ale19 ale19 - 4 months ago 12
R Question

R: split concatenated dataframe by row

I have a big text file that is the result of another program appending multiple tables together (including headers). I'd like to read that file into R and split it apart.

This is sort of similar to this problem, except I don't know the row number, and I need to split it on the header row, not a particular row number. I do know that each table starts with a header, though, and that the Wvlgth values always start with 337.0 and end with 823.0.

Here is the text file. It looks identical to this, except with 550 rows.

Wvlgth Global
337.0 .4345
337.5 .1256
338.0 .8754
<...>
821.0 .9923
822.0 .7124
823.0 .2999
Wvlgth Global
337.0 .5632
337.5 .1245
338.0 .0012
<...>
821.0 .1987
822.0 .6743
823.0 .2045


Here is the code to generate something similar in R:

df = data.frame("Wvlgth" = c('337.0','337.5','338.0','821.0','822.0','823.0'),
"Global" = c(.4345, .1256, .8754, .9923, .7124, .2999))


I want this to turn into multiple dataframes, like so:

Dataframe 1



Wvlgth Global
1 337.0 .4345
2 337.5 .1256
3 338.0 .8754
<...>
548 821.0 .9923
549 822.0 .7124
550 823.0 .2999


Dataframe 2



Wvlgth Global
1 337.0 .5632
2 337.5 .1245
3 338.0 .0012
<...>
548 821.0 .1987
549 822.0 .6743
550 823.0 .2045


Not sure if there is a way to do this through read.csv or if i need to read the whole thing in and split it after the fact.

Answer Source

Here's how to do that using split from data.table. Basically, you need to create a group column using cumsum on "Wvlgth" in column 1. Then you can split the result into a list. You can then access the elements of that list like so: df_list[[1]]

df <- read.table(text="Wvlgth Global
337.0  .4345
                337.5  .1256
                338.0  .8754
                821.0  .9923
                822.0  .7124
                823.0  .2999
                Wvlgth Global
                337.0  .5632
                337.5  .1245
                338.0  .0012
                821.0  .1987
                822.0  .6743
                823.0  .2045",header=FALSE,stringsAsFactors=FALSE)
df$group <- cumsum(df[,1]=="Wvlgth")
df_list <- split(df, list(df$group))

$`1`
      V1     V2 group
1 Wvlgth Global     1
2  337.0  .4345     1
3  337.5  .1256     1
4  338.0  .8754     1
5  821.0  .9923     1
6  822.0  .7124     1
7  823.0  .2999     1

$`2`
       V1     V2 group
8  Wvlgth Global     2
9   337.0  .5632     2
10  337.5  .1245     2
11  338.0  .0012     2
12  821.0  .1987     2
13  822.0  .6743     2
14  823.0  .2045     2

To access a single data.frame:

df_list[[1]]
      V1     V2 group
1 Wvlgth Global     1
2  337.0  .4345     1
3  337.5  .1256     1
4  338.0  .8754     1
5  821.0  .9923     1
6  822.0  .7124     1
7  823.0  .2999     1

Additionally, if you want to set up the column names of the data.frames, you can use lapply:

new_col_name <- c("Wvlgth", "Global","group")
df_list <- lapply(df_list, setNames, nm = new_col_name) #set names
df_list <- lapply(df_list, function(x) x[-1,]) #remove first row

> df_list
$`1`
  Wvlgth Global group
2  337.0  .4345     1
3  337.5  .1256     1
4  338.0  .8754     1
5  821.0  .9923     1
6  822.0  .7124     1
7  823.0  .2999     1

$`2`
   Wvlgth Global group
9   337.0  .5632     2
10  337.5  .1245     2
11  338.0  .0012     2
12  821.0  .1987     2
13  822.0  .6743     2
14  823.0  .2045     2