I need to remove certain terms from a text:
texts
[1] "Lorem ipsum dolor sit amet"
[2] "consectetur adipiscing elit"
stopwords=read.csv("stopwords.txt", encoding = "UTF-8")
stopwords
[1] Lorem
[2] elit
[3] a
The result I want:
texts
[1] "ipsum dolor sit amet"
[2] "consectetur adipiscing"
You mean removeWords from the tm package?
It works in my case:
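library(tm)  # provides removeWords()
texts <- c("Lorem ipsum dolor sit amet", "consectetur adipiscing elit")
stopwords <- c("Lorem", "elit", "a")
# removeWords() drops whole-word matches; trimws() strips the leftover edge spaces
trimws(removeWords(texts, stopwords))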
Output:
[1] "ipsum dolor sit amet"
[2] "consectetur adipiscing"
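One thing to watch in the question's code: read.csv returns a data frame and, by default, treats the first line of the file as a header, so the first term can end up as a column name instead of a stop word. Here is a minimal sketch that reads the terms into a plain character vector instead. It assumes stopwords.txt holds one term per line; the temporary file is only a stand-in so the snippet runs on its own.

```r
# Stand-in for the question's stopwords.txt (assumed layout: one term per line)
path <- tempfile(fileext = ".txt")
writeLines(c("Lorem", "elit", "a"), path)

# readLines() returns a character vector with one element per line,
# and does not treat the first line as a header
stopwords <- readLines(path, encoding = "UTF-8")

is.character(stopwords)  # TRUE
length(stopwords)        # 3
```

A vector read this way can be passed straight to removeWords in place of the hard-coded one above.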
A base R alternative with gsub, borrowing trimws from @rajnim's answer:
trimws(gsub(paste0("\\b(",paste(stopwords, collapse="|"),")\\b"), "", texts))
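Note that trimws only cleans the ends of each string. If a stop word sits in the middle of a sentence, removing it leaves two spaces in a row. A possible extension, sketched here with a modified first sentence (the extra "a" is my own addition to show the effect), collapses runs of whitespace before trimming:

```r
# Example input; the standalone "a" in the first string is added for illustration
texts <- c("Lorem ipsum a dolor sit amet", "consectetur adipiscing elit")
stopwords <- c("Lorem", "elit", "a")

# Same whole-word pattern as above: \b(Lorem|elit|a)\b
pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
removed <- gsub(pattern, "", texts)

# Collapse any run of whitespace to a single space, then trim the ends
cleaned <- trimws(gsub("[[:space:]]+", " ", removed))
cleaned
```

The pattern is built by pasting the terms together, so this assumes the stop words contain no regex metacharacters such as "." or "(".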