Dogus Ural - 3 months ago
C Question

What is data alignment? Why and when should I be worried when typecasting pointers in C?

I couldn't find a decent document that explains how the alignment system works and why some types are more strictly aligned than the others.

Answer

I'll try to explain briefly.

What is data alignment?

Your computer's architecture consists of a processor and memory. Memory is organized in cells, like this:

 0x00 |   data  |  
 0x01 |   ...   |
 0x02 |   ...   |

Each memory cell has a fixed size, i.e. the number of bits it can store. This is architecture dependent (on most modern machines the smallest addressable cell is an 8-bit byte).

When you define a variable in your C/C++ program, it occupies one or more memory cells.

For example

int variable = 12;

Suppose each cell holds 32 bits and an int is 32 bits wide. Then, somewhere in your memory:

variable: | 0 0 0 c |  // c is 12 in hexadecimal.

When your CPU has to operate on that variable, it needs to load it into a register. In a single memory access (simplified here as "1 clock"), a CPU can fetch a fixed number of bits from memory; that size is usually called a word. The word size is architecture dependent as well.

Now suppose a variable is stored, because of some offset, across two cells.

For example, suppose I have two pieces of data to store (I'll use a string representation to make this clearer):

data1: "ab"
data2: "cdef"

The memory is then laid out like this (two different cells):

|a b c d|     |e f 0 0|

That is, data1 occupies only half of the first cell, so data2 fills the remaining part and spills into the second cell.

Now suppose your CPU wants to read data2. It needs 2 clocks to access the data: with the first it reads the first cell, and with the second it reads the remaining part from the second cell.

If we align data2 to the cell boundaries of this example, we introduce some padding after data1 and shift data2 entirely into the second cell.

|a b 0 0|     |c d e f|
     ---
   padding

This way, the CPU needs only "1 clock" (one memory access) to read data2.

What an alignment system does

An alignment system simply introduces that padding so that data lines up with the memory boundaries of the system, which, remember, depend on the architecture. When data is aligned in memory, the CPU doesn't waste cycles accessing it.

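This is done mainly for performance (99% of the time). On some architectures, however, a misaligned access is not just slower but raises a hardware fault (such as a bus error), and in C, converting a pointer to a type for which the address is not correctly aligned is undefined behavior. That is when you should worry about typecasting pointers.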