Jaimin Soni - 11 days ago
R Question

Can't generate word cloud by cluster number using R

I am trying to generate a word cloud for each cluster, but I get the error "x must be an array of at least two dimensions". I am using Twitter data with this pipeline: corpus -> text mining -> document-term matrix -> k-means clustering -> word cloud for each cluster.

library(tm)
library(SnowballC)
library(XML)
library(streamR)
library(wordcloud)
library(NLP)
library(fpc)
library(cluster)

tweetsDF <- parseTweets('tweetsStream.txt', simplify = FALSE)
names(tweetsDF)

corp = Corpus(VectorSource(tweetsDF$text))
inspect(corp[1:1])

corp = Corpus(VectorSource(corp))
dtm = DocumentTermMatrix(corp)
inspect(dtm)

tdm = TermDocumentMatrix(corp)

freq = colSums(as.matrix(dtm))
length(freq)

freq= sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(freq, 14)

d= dist(t(dtm), method="euclidian")
kfit <- kmeans(d, 2)
clusplot(as.matrix(d), kfit$cluster, color=T, shade=T, labels=2, lines=0)

docs1 = names(which(kfit$cluster ==2))
docs1 = as.matrix(docs1)
v1= sort(colSums((docs1)), decreasing= TRUE)


Error: x must be an array of at least two dimensions

myNames1 = names(v1)
d1 = data.frame(word=myNames1, freq=v1)
wordcloud(d1$word, d1$freq, min.freq=2)


(image: output of dput)

Answer

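You are not collecting the term counts after clustering, so there is nothing numeric to build the word clouds from: names(which(kfit$cluster == 2)) returns only the labels of the cluster members, not their frequencies.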
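To make that concrete, here is a small base-R sketch on made-up data (the matrix `m`, its labels, and the variable `members` are hypothetical and not taken from the question). It clusters the rows of a plain count matrix directly, which is a simplification of the dist-based approach in the question, and shows that the cluster labels have to be used to index back into the counts before summing.

```r
# Toy document-term counts (hypothetical data): 4 documents x 3 terms.
m <- matrix(c(2, 0, 1,
              3, 0, 0,
              0, 4, 1,
              0, 5, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("doc", 1:4),
                            c("merger", "stock", "offer")))

set.seed(1)
kfit <- kmeans(m, centers = 2, nstart = 10)

# names(which(...)) returns only the labels of the cluster members,
# i.e. a character vector with no counts attached.
members <- names(which(kfit$cluster == kfit$cluster[["doc1"]]))
is.character(members)

# The sums have to come from the count matrix, indexed by those labels.
freq <- sort(colSums(m[members, , drop = FALSE]), decreasing = TRUE)
freq
```

The `drop = FALSE` keeps the result two-dimensional even if a cluster has a single member, which is what colSums needs.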

What you want should be something like this:

library(slam)

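# kfit was fitted on dist(t(dtm)), so kfit$cluster has one entry per term
docs1 <- which(kfit$cluster == 2)
head(docs1); length(docs1)
# keep only those rows (terms) of the term-document matrix
docs1 <- tdm[docs1, ]
head(docs1)
# total count of each remaining term across all documents
d1 <- data.frame(word=rownames(docs1), freq=row_sums(docs1))
head(d1)
d1 <- d1[order(d1$freq), ]
wordcloud(d1$word, d1$freq, min.freq=2)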

Minimal example:

Using some built-in data, I have done k-means clustering and generated a word cloud based on one of the clusters.

library(tm)
library(wordcloud)
library(slam)
library(cluster)  # provides clusplot()

data("acq")

dtm = DocumentTermMatrix(acq)
inspect(dtm)

tdm <- TermDocumentMatrix(acq)

freq = colSums(as.matrix(dtm))
length(freq)

freq= sort(colSums(as.matrix(dtm)), decreasing = TRUE)
head(freq, 14)

d= dist(t(dtm), method="euclidian")
kfit <- kmeans(d, 2)
clusplot(as.matrix(d), kfit$cluster, color=T, shade=T, labels=2, lines=0)

docs1 <- which(kfit$cluster ==2)
head(docs1); length(docs1)
docs1 <- tdm[docs1, ]
inspect(docs1)
d1 <- data.frame(word=rownames(docs1), freq=row_sums(docs1))
head(d1)
d1 <- d1[order(d1$freq), ]
wordcloud(d1$word, d1$freq, min.freq=2)
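If you want one cloud per cluster rather than just cluster 2, something along these lines might work. It is only a sketch: `cluster_term_freq` is a hypothetical helper, and the toy `counts` and `term_cluster` objects stand in for as.matrix(tdm) and kfit$cluster from the example above. The frequency part is plain base R; the plotting step runs only if the wordcloud package is installed.

```r
# Hypothetical helper: per-cluster term frequencies from a plain count matrix
# with terms in rows and documents in columns (e.g. as.matrix(tdm)).
cluster_term_freq <- function(counts, term_cluster) {
  stopifnot(length(term_cluster) == nrow(counts))
  lapply(split(seq_len(nrow(counts)), term_cluster), function(idx) {
    freq <- rowSums(counts[idx, , drop = FALSE])
    d <- data.frame(word = rownames(counts)[idx], freq = unname(freq),
                    stringsAsFactors = FALSE)
    # most frequent terms first
    d[order(d$freq, decreasing = TRUE), ]
  })
}

# Toy data standing in for as.matrix(tdm) and kfit$cluster.
counts <- matrix(c(3, 1,
                   0, 2,
                   0, 5,
                   1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("bank", "share", "offer", "group"),
                                 c("doc1", "doc2")))
term_cluster <- c(bank = 1, share = 2, offer = 1, group = 2)

freq_by_cluster <- cluster_term_freq(counts, term_cluster)

# Draw one cloud per cluster, only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  for (d in freq_by_cluster) {
    wordcloud::wordcloud(d$word, d$freq, min.freq = 1)
  }
}
```

Note that as.matrix() on a large term-document matrix can use a lot of memory, so for real data you may prefer to keep the sparse matrix and row_sums from slam, as in the code above.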

As a side note: posting an image of your dput output doesn't help, as we cannot use an image to recreate your data on our machines.
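For illustration, this is roughly why the text output is useful (the data frame `df` here is made up, not your tweets): whatever dput prints can be pasted into a question and evaluated by anyone to rebuild the object.

```r
# Hypothetical small data frame standing in for a few rows of tweetsDF.
df <- data.frame(text = c("first tweet", "second tweet"),
                 retweets = c(3L, 0L),
                 stringsAsFactors = FALSE)

# dput() writes R code that recreates the object; capture it as text,
# which is what should be pasted into the question.
txt <- paste(capture.output(dput(df)), collapse = "\n")
cat(txt, "\n")

# Anyone can then rebuild the object from that text.
restored <- eval(parse(text = txt))
all.equal(df, restored)
```

For a large data set, something like dput(head(tweetsDF$text, 20)) is usually enough to reproduce the problem.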