Iterator - 1 year ago 75
R Question

# Sizes of integer vectors in R

I had thought that R had a standard overhead for storing objects (24 bytes, it seems, at least for integer vectors), but a simple test revealed that it's more complex than I realized. For instance, taking integer vectors up to length 100 (using random sampling, hoping to avoid any sneaky sequence compression tricks that might be out there), I found that different length vectors could have the same size, as follows:

``````
> N   = 100
> V   = vector(length = 100)
> for(L in 1:N){
+     z = sample(N, L, replace = TRUE)
+     V[L]    = object.size(z)
+ }
>
> options('width'=88)
> V
[1]  48  48  56  56  72  72  72  72  88  88  88  88 104 104 104 104 168 168 168 168
[21] 168 168 168 168 168 168 168 168 168 168 168 168 176 176 184 184 192 192 200 200
[41] 208 208 216 216 224 224 232 232 240 240 248 248 256 256 264 264 272 272 280 280
[61] 288 288 296 296 304 304 312 312 320 320 328 328 336 336 344 344 352 352 360 360
[81] 368 368 376 376 384 384 392 392 400 400 408 408 416 416 424 424 432 432 440 440
``````

I'm very impressed by the `152` value that shows up (observation: 152 = 128 + 24, though 280 = 256 + 24 isn't as prominent). Can someone explain how these allocations arise? I have been unable to find a clear definition in the documentation, though Vcells come up.

Even if you try `N <- 10000`, all values occur exactly twice, except for vectors of length:

• 5 to 8 (56 bytes)
• 9 to 12 (72 bytes)
• 13 to 16 (88 bytes)
• 17 to 32 (152 bytes)
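The run pattern can be checked empirically. The sketch below is a minimal illustration, not part of the original answer: it uses `integer(len)` instead of random sampling (an assumption that the contents do not affect the size), and the names `sizes`, `runs` and `run_table` are made up for this example. Absolute byte counts will vary with R version and platform; only the run lengths are of interest.

```r
# Tabulate how many consecutive lengths share one object.size() value.
# Absolute byte counts depend on the R version and platform; only the
# run pattern matters here.
N <- 10000L
sizes <- vapply(seq_len(N),
                function(len) as.numeric(object.size(integer(len))),
                numeric(1))

runs <- rle(sizes)                       # consecutive equal sizes
ends <- cumsum(runs$lengths)
run_table <- data.frame(
  from  = ends - runs$lengths + 1L,      # first length in the run
  to    = ends,                          # last length in the run
  count = runs$lengths,                  # how many lengths share this size
  bytes = runs$values
)

# Runs covering more than two lengths are the exceptions to "exactly twice".
run_table[run_table$count > 2, ]
```

Every row not printed by the last line is a size shared by exactly two (or fewer) consecutive lengths.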

Each byte count occurs twice because memory is allocated in units of 8 bytes (referred to as Vcells in `?gc`) while an integer takes only 4 bytes, so two consecutive lengths need the same number of units.

In addition, R's internals distinguish between small and large vectors when allocating memory. Small vectors are allocated in larger blocks of about 2 KB, whereas larger vectors are allocated individually. Small vectors fall into 6 defined classes, based on length, which can store up to 8, 16, 32, 48, 64 and 128 bytes of vector data. As an integer takes only 4 bytes, these 6 classes hold up to 2, 4, 8, 12, 16 and 32 integers. This explains the pattern you see.
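The two rules (small-vector classes, then 8-byte units) can be written down as a small model. This is a sketch under stated assumptions, not R's actual allocator code: `predict_int_size` is a hypothetical helper, the `header` argument defaults to the 24 bytes mentioned in the question, and it only covers lengths of 1 or more.

```r
# Predicted object.size() of an integer vector of length n (n >= 1),
# following the size classes described above. `header` is an assumption:
# 24 bytes as in the question; other builds use a different header.
predict_int_size <- function(n, header = 24) {
  small_classes <- c(8, 16, 32, 48, 64, 128)  # data bytes per small class
  raw <- 4 * n                                # 4 bytes per integer
  data_bytes <- vapply(raw, function(b) {
    if (b <= 128) {
      small_classes[small_classes >= b][1]    # smallest class that fits
    } else {
      8 * ceiling(b / 8)                      # large vectors: whole 8-byte units
    }
  }, numeric(1))
  header + data_bytes
}

predict_int_size(c(5, 8, 9, 12, 13, 16, 17, 32))
# 56 56 72 72 88 88 152 152
```

Calling `predict_int_size(1:100, header = 40)` gives the same 100 values as the output printed in the question, which suggests that run was made on a build with a 40-byte header.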

The extra bytes are the header (which forms the Ncells in `?gc`); as you guessed, that is where the 24 extra bytes come from. The byte counts listed above follow that 24-byte header; the output printed in the question is 16 bytes larger at every length, which is consistent with a 40-byte header such as a 64-bit build would use. It's in fact a bit more complicated than that, but the exact details can be found in the R Internals manual.
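To see which header size applies on a given installation, one option is to measure a zero-length vector and subtract it from the other sizes. This is an empirical check rather than a documented guarantee, and the variable names are illustrative only.

```r
# Estimate the per-vector header on this R build: a zero-length vector
# carries no data, so its reported size is taken as the header here.
header <- as.numeric(object.size(integer(0)))

# Data bytes actually reserved for a few lengths around the class boundaries.
lens <- c(1, 2, 3, 4, 5, 8, 9, 16, 17, 32, 33)
payload <- vapply(lens,
                  function(len) as.numeric(object.size(integer(len))) - header,
                  numeric(1))
data.frame(length = lens, payload_bytes = payload)
```

The `payload_bytes` column should step through the small-vector classes and then grow in 8-byte units, whatever the header turns out to be.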
