Mehdi Farhangian - 3 months ago
R Question

Count strings with a certain condition

I have the following dataset

#mydata
Factors Transactions
a,c 2
b 0
c 0
d,a 0
a 1
a 0
b 1


I'd like to count, for each factor, the number of rows in which it appears with a positive number of transactions. For example, "a" appears twice in rows that had transactions. I can write code that gives me the desired result for one factor at a time. The following is for "a":

nrow(subset(mydata, Transactions > 0 & grepl("a", Factors, fixed = TRUE)))


But I have too many factors and do not want to repeat the code for each of them. There should be a way to write code that gives the results for all of the factors at once. I would like to get the following output:

#Output
a 2
b 1
c 1
d 0

Answer
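
The snippets in this answer operate on a data frame called df, which is never constructed in the thread. A minimal sketch for reproducing the results, assuming the question's table is typed in by hand and Factors is stored as character strings (the construction itself is an assumption, not part of the original post):

```r
# Assumed reconstruction of the question's table; `df` is the name the answer uses.
df <- data.frame(
  Factors      = c("a,c", "b", "c", "d,a", "a", "a", "b"),
  Transactions = c(2, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE  # keep Factors as character (already the default since R 4.0)
)

# Inspect the column types before running the snippets that follow.
str(df)
```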

With tidyverse packages, assuming Factors holds strings or factors and Transactions holds numbers:

library(tidyr)
library(dplyr)

df %>% 
  # split rows whose Factors entry lists several elements into one row per element
  separate_rows(Factors) %>% 
  # set grouping for aggregation
  group_by(Factors) %>% 
  # for each group, count how many transactions are greater than 0
  summarise(Transactions = sum(Transactions > 0))

## # A tibble: 4 x 2
##   Factors Transactions
##     <chr>        <int>
## 1       a            2
## 2       b            1
## 3       c            1
## 4       d            0

You could also avoid dplyr by using xtabs, though some cleaning is necessary to get to the same arrangement:

library(tidyr)

df %>% separate_rows(Factors) %>% 
  xtabs(Transactions > 0 ~ Factors, data = .) %>% 
  as.data.frame() %>% 
  setNames(names(df))

##   Factors Transactions
## 1       a            2
## 2       b            1
## 3       c            1
## 4       d            0

A full base R equivalent:

df2 <- do.call(rbind, 
               Map(function(f, t){data.frame(Factors = strsplit(as.character(f), ',')[[1]], 
                                             Transactions = t)}, 
                   df$Factors, df$Transactions))

df3 <- as.data.frame(xtabs(Transactions > 0 ~ Factors, data = df2))
names(df3) <- names(df)

df3
##   Factors Transactions
## 1       a            2
## 2       b            1
## 3       c            1
## 4       d            0
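
The same count can probably be reached in base R without building an intermediate long data frame. The sketch below is one possible variant, not the answerer's method: it assumes the data frame from the question (rebuilt here by hand so the block is self-contained) and uses strsplit, lengths, and tapply; the names parts, had_txn, and counts are illustrative.

```r
# Assumed reconstruction of the question's data.
df <- data.frame(
  Factors      = c("a,c", "b", "c", "d,a", "a", "a", "b"),
  Transactions = c(2, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Split each Factors entry on commas: one character vector per row.
# as.character() guards against Factors being stored as a factor.
parts <- strsplit(as.character(df$Factors), ",", fixed = TRUE)

# Repeat each row's "had a transaction" flag once per factor listed in that row.
had_txn <- rep(df$Transactions > 0, lengths(parts))

# Sum the flags per factor; TRUE counts as 1, FALSE as 0.
counts <- tapply(had_txn, unlist(parts), sum)
counts
```

The result is a named vector rather than a data frame; if the two-column layout is needed, something like data.frame(Factors = names(counts), Transactions = as.vector(counts)) should convert it.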