hyeri hyeri - 5 days ago 4
R Question

Wordcloud of a column in R based on another column

I'm working on word clouds in R, and so far I've managed the basics. What I want now is a word cloud for a specific location. For example, if I have text like this:

TEXT | LOCATION
True or false? link(#Addition, #Classification) | NewYork,USA
Gene deFuser: detecting gene fusion events from protein sequences #bmc #bioinformatics | Norwich,UK
Biologists do have a sense of humor, especially computational bio people | France
Semantic Inference using #Chemogenomics Data for Drug Discovery | London,UK


Here is the basic word cloud code I'm using:

library(tm)
library(SnowballC)
library(wordcloud)

DATA <- c('True or false? link(#Addition, #Classification) ',
          'Gene deFuser: detecting gene fusion events from protein sequences #bmc #bioinformatics',
          ' Biologists do have a sense of humor, especially computational bio people',
          'Semantic Inference using #Chemogenomics Data for Drug Discovery')
Location <- c('NewYork,USA', 'Norwich,UK', ' France', 'London,UK')

jeopQ <- data.frame(DATA, Location, stringsAsFactors = FALSE)

# Build the corpus and normalise the text
jeopCorpus <- Corpus(VectorSource(jeopQ$DATA))
jeopCorpus <- tm_map(jeopCorpus, content_transformer(tolower))
jeopCorpus <- tm_map(jeopCorpus, removePunctuation)
# The PlainTextDocument step is dropped: with content_transformer(tolower) it is
# not needed, and it can break the corpus in recent versions of tm.
jeopCorpus <- tm_map(jeopCorpus, removeNumbers)
jeopCorpus <- tm_map(jeopCorpus, removeWords, stopwords('english'))
jeopCorpus <- tm_map(jeopCorpus, stemDocument)

# wordLengths replaces the old minWordLength control option
myDTM <- TermDocumentMatrix(jeopCorpus, control = list(wordLengths = c(1, Inf)))

m <- as.matrix(myDTM)

# Term frequencies across all documents, most frequent first
v <- sort(rowSums(m), decreasing = TRUE)
set.seed(4363)
wordcloud(names(v), v, max.words = 100, min.freq = 3, scale = c(4, 0.1),
          random.order = FALSE, rot.per = .5,
          vfont = c("sans serif", "plain"), colors = palette())


I want a separate word cloud for the locations containing "USA", another for the locations containing "UK", and another for France. Is this possible?

Answer
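# Assumes DATA, Location and the libraries from the question are already loaded
jeopQ <- data.frame(DATA, Location, stringsAsFactors = FALSE)

# Clean Location: keep only the part after the last comma (the country)
# and trim stray whitespace such as the leading space in ' France'
jeopQ$Location <- trimws(sub('.*,\\s*', '', jeopQ$Location))

# Loop: build one corpus and draw one word cloud per country
for (i in unique(jeopQ$Location)) {
  jeopCorpus <- Corpus(VectorSource(jeopQ$DATA[jeopQ$Location == i]))
  jeopCorpus <- tm_map(jeopCorpus, content_transformer(tolower))
  jeopCorpus <- tm_map(jeopCorpus, removePunctuation)
  # PlainTextDocument step dropped: not needed with content_transformer(tolower),
  # and it can break the corpus in recent versions of tm
  jeopCorpus <- tm_map(jeopCorpus, removeNumbers)
  jeopCorpus <- tm_map(jeopCorpus, removeWords, stopwords('english'))
  jeopCorpus <- tm_map(jeopCorpus, stemDocument)

  # wordLengths replaces the old minWordLength control option
  myDTM <- TermDocumentMatrix(jeopCorpus, control = list(wordLengths = c(1, Inf)))

  m <- as.matrix(myDTM)

  # Term frequencies for this country only, most frequent first
  v <- sort(rowSums(m), decreasing = TRUE)
  set.seed(4363)
  wordcloud(names(v), v, max.words = 100, min.freq = 3, scale = c(4, 0.1),
            random.order = FALSE, rot.per = .5,
            vfont = c("sans serif", "plain"), colors = palette())
}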
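
If you want to check the grouping before drawing anything, here is a minimal base R sketch of the same idea: reduce each location to its country, then split the texts by country. The names `texts`, `locations`, `texts_by_country` and `word_freq` are made up for this illustration, and the word counting is deliberately crude (no stemming or stopword removal), so the counts will not match what tm produces.

```r
# Sample data copied from the question
texts <- c("True or false? link(#Addition, #Classification) ",
           "Gene deFuser: detecting gene fusion events from protein sequences #bmc #bioinformatics",
           " Biologists do have a sense of humor, especially computational bio people",
           "Semantic Inference using #Chemogenomics Data for Drug Discovery")
locations <- c("NewYork,USA", "Norwich,UK", " France", "London,UK")

# Keep only the part after the last comma, then strip surrounding whitespace
country <- trimws(sub(".*,", "", locations))

# One character vector of texts per country
texts_by_country <- split(texts, country)

# Hypothetical helper: lower-case the texts, split on anything that is not a
# letter, and count how often each word occurs
word_freq <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z]+"))
  words <- words[nchar(words) > 0]
  sort(table(words), decreasing = TRUE)
}

freq_by_country <- lapply(texts_by_country, word_freq)
```

Each element of `freq_by_country` is a named frequency table, so something like `wordcloud(names(f), as.numeric(f))` on one element `f` should draw the cloud for that country, assuming the wordcloud package from the question is loaded.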
