atomi kise - 3 years ago
R Question

R removeWords() does not work

I am a novice in R, and I am looking for a way to delete some English words using stopwords.

Here is the function I have written:

cleanfunction <- function(test) {
    test <- removeWords(test, stopwords("en"))
    test <- gsub("\\b[A-z]\\b{1}", " ", test)
    test <- gsub("\\W", " ", test)
    test <- gsub("\\d", " ", test)

    return(test)
}

Mdatasub2 <- aggregate(Reviews ~ Product.Name, data = Mdatasub2, FUN = cleanfunction)

The thing is, it does not delete "the", "just", "this", or "got".

Thanks in advance

Answer

You need to make a few changes to your code. Load the tm library and apply removeWords through the tm_map function, as below:


library(tm)

cleanfunction <- function(test) {

    ## You could use tm_map for these steps too, but I am keeping the gsub calls.
    ## [A-Za-z] replaces [A-z], which also matches the symbols between "Z" and "a".
    test <- gsub("\\b[A-Za-z]\\b", " ", test)   # single-letter words
    test <- gsub("\\W", " ", test)              # non-word characters
    test <- gsub("\\d", " ", test)              # digits

    ## Convert the character vector to a corpus
    myCorpus <- Corpus(VectorSource(test))

    ## Add any extra words you want to exclude to myStopwords.
    ## stopwords("english") is a default word list, but it does not cover
    ## every common word, so myStopwords lets you remove the ones it misses.
    myStopwords <- c("got", "just", "this", "the")
    myCorpus <- tm_map(myCorpus, removeWords, c(myStopwords, stopwords("english")))

    ## Strip the extra white space left behind by the removals
    test <- tm_map(myCorpus, stripWhitespace)

    ## Note: test is now a tm corpus, not a character vector
    return(test)
}


For more information on tm_map, run ?tm_map and read the documentation.
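
If you would rather keep working with plain character vectors, here is a minimal sketch of the same idea without a corpus. It is a different technique from the tm_map version above, not a replacement for it. It assumes the missed words are capitalized in your reviews ("The", "This"), which the question does not show, and it relies on removeWords matching case-sensitively, as far as I know. The `reviews` and `custom_stopwords` names are made up for illustration.

```r
library(tm)

# Hypothetical sample data; the real Reviews column is not shown in the question
reviews <- c("The phone just works", "This one got lost in the mail")

# Extra words to drop on top of the default English list
custom_stopwords <- c("got", "just", "this", "the")

# Lower-case first so that "The" and "This" can match the lower-case word list
cleaned <- removeWords(tolower(reviews), c(custom_stopwords, stopwords("english")))

# Collapse the gaps left by removed words, using base R only
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(cleaned)
```

The result is still a character vector of the same length as the input, so it should slot into a function passed to aggregate more easily than a corpus would.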

