Ricol - 1 year ago 193
R Question

# Split and unsplit a dataframe in four parts

I'd like to split a dataframe into 4 equal parts, because I'd like to use the 4 cores of my computer.

I tried this:

``````
df2 <- split(df, 1:4)
unsplit(df2, f = 1:4)
``````

and this:

``````
df2 <- split(df, 1:4)
unsplit(df2, f = c('1', '2', '3', '4'))
``````

But the `unsplit` function did not work, and I get these warning messages:

``````
1: In split.default(seq_along(x), f, drop = drop, ...) :
  data length is not a multiple of split variable
...
``````

Do you have any idea why?

How many rows are in `df`? You will get that warning whenever the number of rows in your table is not divisible by 4, the length of your split factor. I think you are using the split factor `f` incorrectly, unless what you want is to deal successive rows out to the four data frames in turn (row 1 to the first, row 2 to the second, and so on).
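If you would rather have four contiguous blocks of rows, which is often what is wanted when handing chunks to separate cores, a minimal sketch could look like the one below. The example data frame, its `id` and `x` columns, and the `ceiling(seq_len(n) * 4 / n)` labelling are assumptions made up for illustration, not something taken from your data.

```r
# Hypothetical 10-row data frame, used only for illustration
df <- data.frame(id = 1:10, x = (1:10) / 10)

n <- nrow(df)
# One block label (1..4) per row, in contiguous runs; assumes n >= 4
f <- ceiling(seq_len(n) * 4 / n)

# Four data frames holding consecutive rows
parts <- split(df, f)
sapply(parts, nrow)  # block sizes differ by at most one row

# The same full-length factor puts the rows back in their original order
restored <- unsplit(parts, f)
stopifnot(identical(restored$id, df$id))
```

Because `f` has exactly one value per row, neither `split` nor `unsplit` needs to recycle it, so no warning is raised.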

If you really want to split your data into 4 data frames, one row after the other, then make your splitting factor the same length as the number of rows in your data frame using `rep_len`, and pass that same factor to `unsplit` (which uses the length of `f` as the number of rows to rebuild):

``````
## Build a splitting factor with one value per row
f <- rep_len(1:4, nrow(df))
## Split like this:
parts <- split(df, f = f)
## Unsplit like this:
unsplit(parts, f = f)
``````

Hopefully this example illustrates why the warning occurs and how to avoid it (i.e. use a proper splitting factor!).

``````
## Want to split our data.frame into two halves, but the number of rows is not divisible by 2
df <- data.frame( x = runif(5) )
df

## Splitting still works, but we get a warning: 'f' (length 2) is recycled
## across 5 rows, and 5 is not a multiple of its length
split( df , f = 1:2 )
#$`1`
#         x
#1 0.6970968
#3 0.5614762
#5 0.5910995

#$`2`
#         x
#2 0.6206521
#4 0.1798006

# Warning message:
# In split.default(x = seq_len(nrow(x)), f = f, drop = drop, ...) :
#   data length is not a multiple of split variable

## Instead let's use the same split levels (1:2)...
## but repeat them so there is one value per row of the table:
splt <- rep_len( 1:2 , nrow(df) )
splt
#[1] 1 2 1 2 1

## Split works, and f is not recycled because there are
## the same number of values in 'f' as rows in the table
split( df , f = splt )
#$`1`
#         x
#1 0.6970968
#3 0.5614762
#5 0.5910995

#$`2`
#         x
#2 0.6206521
#4 0.1798006

## And unsplitting then works as expected and reconstructs our original data.frame
unsplit( split( df , f = splt ) , f = splt )
#         x
#1 0.6970968
#2 0.6206521
#3 0.5614762
#4 0.1798006
#5 0.5910995
``````
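
Since the goal was to use 4 cores, here is a rough sketch of how the split/unsplit round trip could be combined with the `parallel` package that ships with R. The `add_square` function and the example columns are hypothetical placeholders for your real per-chunk work. Note that `mclapply` relies on forking, so on Windows it only runs with `mc.cores = 1`.

```r
library(parallel)

# Hypothetical data, used only for illustration
df <- data.frame(id = 1:10, x = (1:10) / 10)

# One factor value per row, so nothing has to be recycled
f <- rep_len(1:4, nrow(df))
chunks <- split(df, f)

# Hypothetical per-chunk work: add a derived column, keep the rows as they are
add_square <- function(chunk) {
  chunk$x_sq <- chunk$x^2
  chunk
}

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 4L
results <- mclapply(chunks, add_square, mc.cores = n_cores)

# Reassemble the processed chunks in the original row order
restored <- unsplit(results, f)
stopifnot(identical(restored$id, df$id))
```

Each worker has to return its chunk with the same rows in the same order; otherwise `unsplit` cannot put them back where they came from.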