MFR - 29 days ago
R Question

Loop over strings in R

I'd like to know what is wrong with my code rather than just get a solution. I wish to loop over some strings. My data is as follows:

id  source  transaction
1   a > b   6 > 0
2   J > k   5
3   b > c   4 > 0


I have a list and wish to go over it, find the rows that contain each element, and compute the average of their transactions.

mylist <- c("a", "b")


So my desired output for the elements in the list is

source avg
a 6
b 2


I do not know how to loop over the list and send the results to a csv file. I tried this:

mylist <- c( "a", "b" )

for(i in mylist)
{

KeepData <- df [grepl(i, df$source), ]
KeepData <- cSplit(KeepData, "transaction", ">", "long")

avg<- mean(KeepData$transactions)
result <- list(i,avg )

write.table(result ,file="C:/Users.csv", append=TRUE,sep=",",col.names=FALSE,row.names=FALSE)

}


But it gives me an "NA" result with the following warnings:


Warning messages:
1: In mean.default(KeepData$transactions) :
  argument is not numeric or logical: returning NA
2: In mean.default(KeepData$transactions) :
  argument is not numeric or logical: returning NA

Answer

We can use cSplit to split the 'source' column and convert the dataset to 'long' format. Then, using data.table methods, we subset the rows in 'i' (source %in% mylist) and, grouped by 'source', get the mean of 'transaction'.

library(splitstackshape)
cSplit(df1, "source", " > ", "long")[source %in% mylist, .(avg = mean(transaction)), source]
#   source avg
#1:      a   6
#2:      b   5

Another option is separate_rows from tidyr to convert to 'long' format, followed by the dplyr methods to summarise after grouping by 'source'.

library(tidyr)
library(dplyr)
separate_rows(df1, source) %>%
        filter(source %in% mylist) %>%
        group_by(source) %>% 
        summarise(avg  = mean(transaction))
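
If neither package is available, the same idea (split 'source' into 'long' format, filter, then average by group) can be sketched in base R. This is only an illustration, not part of the original answer; the intermediate names 'parts', 'long' and 'res' are made up for the example.

```r
# Same data as 'df1' in the data section
df1 <- data.frame(id = 1:3,
                  source = c("a > b", "J > k", "b > c"),
                  transaction = c(6L, 5L, 4L),
                  stringsAsFactors = FALSE)
mylist <- c("a", "b")

# Split each 'source' string on the literal separator " > "
parts <- strsplit(df1$source, " > ", fixed = TRUE)

# 'long' format: one row per source element, repeating its transaction
long <- data.frame(source = unlist(parts),
                   transaction = rep(df1$transaction, lengths(parts)),
                   stringsAsFactors = FALSE)

# Keep only the elements of interest and average per source
long <- long[long$source %in% mylist, ]
res <- aggregate(transaction ~ source, data = long, FUN = mean)
names(res)[2] <- "avg"
print(res)
#   source avg
# 1      a   6
# 2      b   5
```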

Update

For the new dataset ('df2'), we need to split both columns ('source' and 'transaction') to 'long' format, and then get the mean of 'transaction' grouped by 'source'

cSplit(df2, 2:3,  " > ", "long")[source %in% mylist, .(avg = mean(transaction)), source]
#   source avg
#1:      a   6
#2:      b   2
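
To make the pairing of the two split columns explicit, here is a rough base R sketch of what splitting both columns to 'long' format amounts to. It is an approximation written for illustration, not cSplit's actual implementation; the helper 'pad' and the names 'src', 'trn', 'n', 'long2' and 'res2' are hypothetical.

```r
# Same data as 'df2' in the data section
df2 <- data.frame(id = 1:3,
                  source = c("a > b", "J > k", "b > c"),
                  transaction = c("6 > 0", "5", "4 > 0"),
                  stringsAsFactors = FALSE)
mylist <- c("a", "b")

# Split both columns on ">"
src <- strsplit(df2$source, ">", fixed = TRUE)
trn <- strsplit(df2$transaction, ">", fixed = TRUE)

# Rows can have different numbers of pieces ("5" has one, "J > k" has two),
# so pad the shorter side with NA to keep the pieces aligned
n <- pmax(lengths(src), lengths(trn))
pad <- function(x, k) { length(x) <- k; x }

long2 <- data.frame(
  source = trimws(unlist(Map(pad, src, n))),
  transaction = as.numeric(trimws(unlist(Map(pad, trn, n)))),
  stringsAsFactors = FALSE)

# Filter and average per source
long2 <- long2[long2$source %in% mylist, ]
res2 <- aggregate(transaction ~ source, data = long2, FUN = mean)
names(res2)[2] <- "avg"
print(res2)
#   source avg
# 1      a   6
# 2      b   2
```

Here "b" is paired with 0 in the first row and with 4 in the third, which is where the average of 2 comes from.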

The for loop can be modified to

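for(i in mylist) {
   # split both 'source' and 'transaction' to 'long' format
   KeepData <- cSplit(df2, 2:3, ">", "long")
   # keep the rows whose 'source' matches the current element
   KeepData <- KeepData[grepl(i, source)]
   # the column is named 'transaction', not 'transactions'
   avg <- mean(KeepData$transaction)
   result <- list(i, avg)
   print(result)
   write.table(result, file="C:/Users.csv",
             append=TRUE, sep=",", col.names=FALSE, row.names=FALSE)
 }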
#[[1]]
#[1] "a"

#[[2]]
#[1] 6

#[[1]]
#[1] "b"

#[[2]]
#[1] 2
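
As for what went wrong in the original loop: judging from the warning text, the likely cause is that it refers to KeepData$transactions, while the column is named 'transaction'. A column that does not exist comes back as NULL, and mean(NULL) returns NA with exactly that warning. A second issue, assuming the data looks like 'df2', is that only 'transaction' was split, so for "a" the loop would have averaged 6 and 0. The base R sketch below reproduces both points without cSplit; the variable names are made up for the example.

```r
# Same data as 'df2' in the data section
df2 <- data.frame(id = 1:3,
                  source = c("a > b", "J > k", "b > c"),
                  transaction = c("6 > 0", "5", "4 > 0"),
                  stringsAsFactors = FALSE)

# 'transactions' (with a trailing 's') is not a column, so `$` returns NULL
missing_col <- df2$transactions
is.null(missing_col)
# [1] TRUE

# mean() of NULL returns NA and warns
# "argument is not numeric or logical: returning NA"
avg_null <- suppressWarnings(mean(missing_col))
is.na(avg_null)
# [1] TRUE

# Splitting only 'transaction' for the row matching "a" averages 6 and 0
row_a <- df2[grepl("a", df2$source), ]
pieces <- as.numeric(trimws(strsplit(row_a$transaction, ">", fixed = TRUE)[[1]]))
avg_a <- mean(pieces)
avg_a
# [1] 3
```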

data

df1 <- structure(list(id = 1:3, source = c("a > b", "J > k", "b > c"
 ), transaction = c(6L, 5L, 4L)), .Names = c("id", "source", "transaction"
), class = "data.frame", row.names = c(NA, -3L))


df2 <- structure(list(id = 1:3, source = c("a > b", "J > k", "b > c"
), transaction = c("6 > 0", "5", "4 > 0")), .Names = c("id", 
"source", "transaction"), class = "data.frame", row.names = c(NA, 
-3L))