Francesco - 2 months ago
R Question

New dataframe containing the sum of columns of another, R

I have 32 dataframes, and for each of them I need a new dataframe containing sums of some of its columns.

Here is an example with 2 dataframes to make this clearer:

df1 <- data.frame(1:5,2:6,3:7, 4:8)
colnames(df1) <- c("one", "two", "three", "four")
df2 <- data.frame(4:8, 5:9, 6:10, 7:11)
colnames(df2) <- c("one", "two", "three", "four")


What I would like to obtain is a dataframe df1a in which one column is the sum of columns 1 and 3 of df1, column 2 is kept unchanged, and column 4 is placed first.

I know I can write this code:

df1a <- data.frame(df1$four, df1$one+df1$three, df1$two )
colnames(df1a) <- c("four", "1+3", "two")


But this is very long to write out for every dataframe, since my real data has 32 dataframes with 20 columns each.
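With 20 columns per dataframe, writing out every `+` gets long. A minimal sketch using base R's rowSums() on a subset of columns; which columns to sum is an assumption you would adapt to the real data:

```r
df1 <- data.frame(one = 1:5, two = 2:6, three = 3:7, four = 4:8)

# rowSums() adds the selected columns row by row, so any number of columns
# can be summed without writing out every "+"
sum13 <- rowSums(df1[, c("one", "three")])
sum13

# Selecting the columns by position (1 and 3) gives the same result
identical(sum13, rowSums(df1[, c(1, 3)]))
```

Note that rowSums() returns a double vector even when the input columns are integers.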

I put them in a list:

listdf <- list(df1, df2)
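Typing list(df1, df2, ..., df32) by hand is error-prone. If the dataframes are named df1 to df32 (an assumption about your naming), base R's mget() can collect them into a named list in one call. A sketch with the two example dataframes:

```r
df1 <- data.frame(one = 1:5, two = 2:6, three = 3:7, four = 4:8)
df2 <- data.frame(one = 4:8, two = 5:9, three = 6:10, four = 7:11)

# paste0() builds the names "df1", "df2"; mget() fetches those objects
# into a list named after them. With the real data: paste0("df", 1:32)
listdf <- mget(paste0("df", 1:2))
names(listdf)
```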


I think I need a loop or one of the apply functions, but I can't figure out how.
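For comparison, a plain for loop over the list also works. A minimal sketch, assuming the list listdf from above; the results list listdfa is a hypothetical name:

```r
df1 <- data.frame(one = 1:5, two = 2:6, three = 3:7, four = 4:8)
df2 <- data.frame(one = 4:8, two = 5:9, three = 6:10, four = 7:11)
listdf <- list(df1, df2)

# Preallocate a list with one slot per input dataframe
listdfa <- vector("list", length(listdf))

for (i in seq_along(listdf)) {
  x <- listdf[[i]]
  # Same transformation as df1a: four first, then one + three, then two;
  # check.names = FALSE keeps the column name "1+3" unchanged
  listdfa[[i]] <- data.frame(four = x$four, "1+3" = x$one + x$three,
                             two = x$two, check.names = FALSE)
}
listdfa[[2]]
```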

Here is what I would like to obtain, going from df1 to df1a:

df1
  one two three four
1   1   2     3    4
2   2   3     4    5
3   3   4     5    6
4   4   5     6    7
5   5   6     7    8

df1a <- data.frame(df1$four, df1$one+df1$three, df1$two )
colnames(df1a) <- c("four", "1+3", "two")
df1a
  four 1+3 two
1    4   4   2
2    5   6   3
3    6   8   4
4    7  10   5
5    8  12   6

Answer

See the comments in the code. In essence, you write a function that performs the operations on a single data.frame, then use lapply or sapply to apply it to each data.frame. Since your data.frames are already in a list, lapply or sapply is very convenient.

df1 <- data.frame(1:5,2:6,3:7, 4:8)
colnames(df1) <- c("one", "two", "three", "four")
df2 <- data.frame(4:8, 5:9, 6:10, 7:11)
colnames(df2) <- c("one", "two", "three", "four")

# Create a function which holds commands to be used on a single data.frame
operationsPerDF <- function(x) {
  data.frame(four = x$four, onepthree = x$one + x$three, two = x$two)
}

# You can manually gather data.frames into a list.
lapply(list(df1, df2), FUN = operationsPerDF)

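# Or find data.frames by a name pattern ("df" followed by digits, so that
# objects such as listdf are not picked up) and collect them into a named list...
list.dfs <- sapply(ls(pattern = "^df[0-9]+$"), get, simplify = FALSE)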

# ... and perform the above operation, one data.frame at a time
lapply(list.dfs, FUN = operationsPerDF)

$df1
  four onepthree two
1    4         4   2
2    5         6   3
3    6         8   4
4    7        10   5
5    8        12   6

$df2
  four onepthree two
1    7        10   5
2    8        12   6
3    9        14   7
4   10        16   8
5   11        18   9
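To end up with separate objects named df1a, df2a and so on, as in the question, one option is to rename the results and copy them into the workspace with base R's list2env(). A minimal sketch; the "a" suffix simply mirrors the question's df1a naming:

```r
df1 <- data.frame(one = 1:5, two = 2:6, three = 3:7, four = 4:8)
df2 <- data.frame(one = 4:8, two = 5:9, three = 6:10, four = 7:11)

operationsPerDF <- function(x) {
  data.frame(four = x$four, onepthree = x$one + x$three, two = x$two)
}

# Named input list, so the results carry the names "df1" and "df2"
result <- lapply(list(df1 = df1, df2 = df2), FUN = operationsPerDF)

# Append "a" to each name ("df1a", "df2a") and assign each element
# as a separate object in the global environment
names(result) <- paste0(names(result), "a")
invisible(list2env(result, envir = globalenv()))
df2a
```

Keeping the results in a single list is often easier to work with than 32 separate objects; list2env() is only needed if you really want individual variables.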