Tarek Khedr - 1 month ago
R Question

Split a column of multiple variables into 2 columns (not key and value)

I have this data frame, tk, which is a subset of my original data:

tk

> ##    document   term count    sentiment
> ## 1       111 happen     1 anticipation
> ## 2       111   time     1 anticipation
> ## 3       112 mother     1 anticipation
> ## 4       112 mother     1          joy
> ## 5       112 mother     1     negative
> ## 6       112 mother     1     positive
> ## 7       112 mother     1      sadness
> ## 8       112 mother     1        trust
> ## 9       112    sue     1        anger
> ## 10      112    sue     1     negative
> ## 11      112    sue     1      sadness
> ## 12      112  wrong     1     negative
> ## 13      113   suck     1     negative
> ## 14      114   gate     1        trust


I need to


  • add a new column (tk$positive_negative) that contains only the values "positive" and "negative" from the sentiment variable.

  • add another new column (tk$emotions) that contains every other value from the sentiment variable, i.e. anything except "positive" and "negative".



I tried a for loop, but it did not give the result I wanted:

for (i in tk$sentiment){
ifelse(i=="positive",tk$positive_negative<-"positive",ifelse(i=="negative",tk$positive_negative<-"negative",tk$emotions<-paste(print(i))))
}

> ## [1] "anticipation"
> ## [1] "anticipation"
> ## [1] "anticipation"
> ## [1] "joy"
> ## [1] "sadness"
> ## [1] "trust"
> ## [1] "anger"
> ## [1] "sadness"
> ## [1] "trust"

tk

> ##    document   term count    sentiment emotions positive_negative
> ## 1       111 happen     1 anticipation    trust          negative
> ## 2       111   time     1 anticipation    trust          negative
> ## 3       112 mother     1 anticipation    trust          negative
> ## 4       112 mother     1          joy    trust          negative
> ## 5       112 mother     1     negative    trust          negative
> ## 6       112 mother     1     positive    trust          negative
> ## 7       112 mother     1      sadness    trust          negative
> ## 8       112 mother     1        trust    trust          negative
> ## 9       112    sue     1        anger    trust          negative
> ## 10      112    sue     1     negative    trust          negative
> ## 11      112    sue     1      sadness    trust          negative
> ## 12      112  wrong     1     negative    trust          negative
> ## 13      113   suck     1     negative    trust          negative
> ## 14      114   gate     1        trust    trust          negative


Please advise, thank you.

Answer

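See the comment by @Sotos. ifelse is already vectorized: it evaluates the test for every element of the vector and returns a vector of the same length, so there is no need for a loop. The loop also fails for a second reason: an assignment such as tk$emotions <- "trust" writes that single value into every row of the column, so after the last iteration only the final values remain. Vectorized functions are also typically much faster than an explicit loop.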
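To illustrate what "vectorized" means here, the following is a minimal sketch. The vector sentiment and the data frame df are toy stand-ins made up for this example (the values are borrowed from the question); they are not the asker's full data.

```r
# Toy vector standing in for tk$sentiment (hypothetical example data).
sentiment <- c("anticipation", "joy", "negative", "positive", "trust")

# ifelse() tests every element at once and returns a vector of the same length.
is_polarity <- sentiment %in% c("positive", "negative")
polarity <- ifelse(is_polarity, sentiment, "")
print(polarity)

# By contrast, assigning a single value to a column recycles it to every row.
# This is what each iteration of the loop in the question did.
df <- data.frame(sentiment = sentiment, stringsAsFactors = FALSE)
df$emotions <- "trust"
print(df$emotions)
```

The first print shows one result per input element, while the second shows the same value repeated in all five rows, which matches the all-"trust" column in the question's output.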

With that said, I think all you need to do to solve your problem is:

tk$positive_negative <- ifelse(tk$sentiment %in% c("positive","negative"),tk$sentiment,"")
tk$emotions <- ifelse(tk$sentiment %in% c("positive","negative"),"",tk$sentiment)

tk
   document   term count    sentiment positive_negative     emotions
1       111 happen     1 anticipation                   anticipation
2       111   time     1 anticipation                   anticipation
3       112 mother     1 anticipation                   anticipation
4       112 mother     1          joy                            joy
5       112 mother     1     negative          negative             
6       112 mother     1     positive          positive             
7       112 mother     1      sadness                        sadness
8       112 mother     1        trust                          trust
9       112    sue     1        anger                          anger
10      112    sue     1     negative          negative             
11      112    sue     1      sadness                        sadness
12      112  wrong     1     negative          negative             
13      113   suck     1     negative          negative             
14      114   gate     1        trust                          trust
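If you would rather mark the non-matching rows as missing instead of using empty strings, a variant along the following lines might work. This is only a sketch: tk_small is a hypothetical four-row stand-in for tk, and the choice of NA_character_ is my assumption about what may be convenient downstream, not something the question asks for.

```r
# Hypothetical small stand-in for tk, with sentiment stored as character.
tk_small <- data.frame(
  document  = c(111L, 112L, 112L, 112L),
  term      = c("happen", "mother", "mother", "sue"),
  sentiment = c("anticipation", "negative", "positive", "anger"),
  stringsAsFactors = FALSE
)

# Compute the test once and reuse it for both new columns.
is_polarity <- tk_small$sentiment %in% c("positive", "negative")

# NA_character_ keeps both new columns of type character.
tk_small$positive_negative <- ifelse(is_polarity, tk_small$sentiment, NA_character_)
tk_small$emotions          <- ifelse(is_polarity, NA_character_, tk_small$sentiment)

print(tk_small)
```

Note that ifelse drops factor attributes, so if sentiment were a factor rather than a character vector you would get integer codes back; wrapping it in as.character() first avoids that.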

Data:

    tk <- structure(list(document = c(111L, 111L, 112L, 112L, 112L, 112L, 
112L, 112L, 112L, 112L, 112L, 112L, 113L, 114L), term = structure(c(2L, 
6L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 7L, 4L, 1L), .Label = c("gate", 
"happen", "mother", "suck", "sue", "time", "wrong"), class = "factor"), 
    count = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L), sentiment = c("anticipation", "anticipation", "anticipation", 
    "joy", "negative", "positive", "sadness", "trust", "anger", 
    "negative", "sadness", "negative", "negative", "trust")), .Names = c("document", 
"term", "count", "sentiment"), row.names = c(NA, -14L), class = "data.frame")