Daniel Daniel - 27 days ago 9
SQL Question

How do SQL implementations internally store duplicate records?

I'm trying to better understand how SQL works in practice.
Does it store duplicate records as an additional parameter (denoting multiplicity) to a record? Or, do we simply add another duplicate record to the table? Or, some other method? I'm curious as to what most SQL distributions do.

For example, say we had relation R(a, b, c) with integer attributes we could store it internally as follows:

Integer-a Integer-b Integer-c Multiplicity
1 2 3 2


Integer-a Integer-b Integer-c
1 2 3
1 2 3

Answer Source

SQL does not store duplicate records with a multiplicity (and least no database engine that I'm familiar with).

Each record occupies its own space. Typically, a record is stored on a page, along with other records. These are the units of I/O that the database uses.

That said, there are some caveats. For instance, columnar databases store each row independently. There are compression techniques where adjacent values would be stored with a multiplicity factor. This is not exactly what you are suggesting, but it is along the same lines.

If you want to understand how databases work, I would suggest that you choose one and study up on it. MySQL has a pretty simple model (but is complicated by the fact that the storage layer is separate from the processing layer). Postgres and SQL Server also have reasonable documentation on their page layouts and storage engines.