Jim G. - 2 months ago
R Question

How can I add a column which is a sum of another comma-delimited chr column?

Given this data frame:

> seq <- as.character(c("1, 2, 3", "4, 5", NA, "6"))
> my.df <- data.frame(seq, stringsAsFactors = FALSE)
> str(my.df)
'data.frame': 4 obs. of 1 variable:
$ seq: chr "1, 2, 3" "4, 5" NA "6"
> my.df
      seq
1 1, 2, 3
2    4, 5
3    <NA>
4       6


How can I add a second column that holds, for each row, the sum of the comma-separated numbers in the first column? The desired result:

      seq my.sum
1 1, 2, 3      6
2    4, 5      9
3    <NA>     NA
4       6      6
> str(my.df)
'data.frame': 4 obs. of 2 variables:
$ seq : chr "1, 2, 3" "4, 5" NA "6"
$ my.sum: num 6 9 NA 6

Answer

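Here is a base R solution: strsplit() splits each element of the seq column on ", ", giving a list of character vectors, and sapply() converts each vector to numeric and sums it. An NA input stays NA, because strsplit() returns NA for it and sum() of NA is NA: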

my.df$my.sum <- sapply(strsplit(my.df$seq, ", "), function(x) sum(as.numeric(x)))

my.df
#      seq my.sum
#1 1, 2, 3      6
#2    4, 5      9
#3    <NA>     NA
#4       6      6
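
If the spacing after the commas is not consistent (for example "4,5" next to "1, 2, 3"), splitting on the literal ", " would leave "4,5" unsplit and as.numeric() would return NA with a warning. A minimal sketch of one way to tolerate that, assuming only base R: split on the regular expression ",\\s*" and use vapply() so the result is guaranteed to be a numeric vector. The helper name sum_delimited and its pattern argument are illustrative, not part of the original answer.

```r
# Hypothetical helper (the name and the `pattern` argument are illustrative).
# Splits each string on a comma followed by optional whitespace, then sums
# the pieces. vapply() enforces one numeric value per input element.
sum_delimited <- function(x, pattern = ",\\s*") {
  vapply(
    strsplit(x, pattern),
    function(parts) sum(as.numeric(parts)),
    numeric(1)
  )
}

# Same data as in the question, except "4,5" has no space after the comma.
seq <- c("1, 2, 3", "4,5", NA, "6")
my.df <- data.frame(seq, stringsAsFactors = FALSE)

my.df$my.sum <- sum_delimited(my.df$seq)
my.df$my.sum
# [1]  6  9 NA  6
```

As with the sapply() version, an NA in seq yields NA in my.sum rather than an error.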