Marko Marko - 3 months ago 25x
SQL Question

Storing sex (gender) in database

I want to store a users gender in a database with as little (size/performance) cost as possible.

So far, 3 scenarios come to mind

  1. Int - aligned with Enum in code (1 = Male, 2 = Female, 3 = Hopefully no need for this?)

  2. char(1) - Store m or f

  3. Bit (boolean) - and call the column isMale (sorry ladies :p)?

No sexist offense intended with option 3 :-)

The reason I ask is because of this answer which mentions that chars are smaller than booleans.


I should clarify that I'm using MS SQL 2008, which DOES in fact have the bit datatype.

Edit #2

A few funny answers around more than 2 genders and using the isMale column. I hope I don't get flagged for writing this but in our current (modern) society, I guess the most appropriate name for a boolean (gender) column would be

Edit #3

Turns out this question is quite subjective since people have suggested all 3 options in their answers below. I'll let the community upvote what they think is the best solution, because I don't know which answer to accept yet.

Edit #4

I'm starting to lean towards @OMG Ponies answer, mainly because he has higher Rep and a gold badge for the sql-server tag - it looks like he knows what he's talking about.

Come on people, vote! :)


I'd call the column "gender".

Data Type   Bytes Taken          Number/Range of Values
TinyINT     1                    255 (zero to 255)
INT         4            -       2,147,483,648 to 2,147,483,647
BIT         1 (2 if 9+ columns)  2 (0 and 1)
CHAR(1)     1                    26 if case insensitive, 52 otherwise

If there's a need to support more than two genders, the BIT data type can be ruled out for the sake it can't support them. While INT supports more than two options, it takes 4 bytes -- performance will be better with a smaller/more narrow data type.

CHAR(1) has the edge over TinyINT - both take the same number of bytes, but CHAR provides a more narrow number of values. Using CHAR(1) would make using "m", "f",etc natural keys, vs the use of numeric data which are referred to as surrogate/artificial keys. CHAR(1) is also supported on any database, should there be a need to port.


I would use Option 2: CHAR(1).


An index on the gender column likely would not help because there's no value in an index on a low cardinality column. Meaning, there's not enough variety in the values for the index to provide any value.