Evan Evan - 1 month ago 20
SQL Question

SQL Overhead of a column

Question on Database design. If I have 28 boolean values and have the capability to add them as eithier 28 booleans per row or an integer, which one will be faster? Which approach will keep the table size on disk lowest?

This is working under the assumption that what I need can be done with bitmasking and other binary operations in queries.

Update



Presumably the queries would be much longer if it were an integer. It would be something like this to get the number of occurrences of that bit (in a given range):

SELECT
sum(FirstBit) as CountofFirst,
sum(SecondBit)/2 as CountofSecond,
sum(SecondBit)/4 as CountofThird,
...
FROM
(SELECT
intField & 1 as FirstBit,
intField & 2 as SecondBit,
intField & 4 as ThirdBit
...
FROM myTable
WHERE
intField <> 0)
WHERE
<within given range>

Answer

This is too long for a comment.

First, any row about optimization and data representation needs to mention the database. Each database has its own way of storing such information.

Second, the best representation for data depends on how it is going to be used. Your question gives no indication about that.

Third, database tables do not -- in most databases -- occupy space on disk per se. Tables contain rows. Rows are stored on pages. And pages are stored on disk. The number of pages needed for a table depend on many factors, of which row size is definitely one. But it is not the only one (is the column nullable? is extra space left on each page).

Finally, you ask about minimizing size on disk. That depends on a bunch of factors. However, it is reasonable to assume that in any database, a 4-byte integer is going to have the best scaling properties. In some databases, 28 "bits" (the actual data type might vary) might occupy the same space.