Learner Learner - 2 months ago 7
R Question

How to find trailing and leading words of a word using R?

I have a text document containing about a million words, and I need to find the words that come immediately before and after a given word using R.

For example, suppose I want to find the words that come before and after the word "error". The leading words could be anything, such as:

"typo error"
"manual error"
"system error"


and the trailing words could be anything, such as:

"error corrected"
"error found"
"error occurred"


Any idea how to do this? Thanks in advance for your inputs.

Answer

For words coming before error:

x <- "no error and no error and some error" # input

library(gsubfn)
rx <- "(\\w+) error" # capture the word immediately before "error"
table(strapplyc(x, rx)[[1]]) # count how often each captured word occurs

giving:

  no some 
   2    1

Replace rx with the following for words after error:

rx <- "error (\\w+)" # capture the word immediately after "error"
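
If you would rather avoid the extra package, here is a minimal base-R sketch that collects both the leading and the trailing words in one pass. It is an alternative to the regex approach above, not the same method: it splits the text into words first and then looks at the neighbours of each hit. The helper name `neighbour_words` is made up for illustration, and the sketch assumes that lower-casing the text and treating any run of non-word characters as a separator is acceptable, so a word after "error." in the next sentence would still count as trailing.

```r
# Hypothetical helper (not part of gsubfn or base R): tabulate the words that
# appear immediately before and immediately after `target`.
neighbour_words <- function(text, target) {
  # Lower-case, split on runs of non-word characters, drop empty tokens
  words <- unlist(strsplit(tolower(text), "\\W+"))
  words <- words[nzchar(words)]

  # Positions where the target word occurs
  hits <- which(words == target)

  # Neighbours, skipping hits at the very start or very end of the text
  before <- words[hits[hits > 1] - 1]
  after  <- words[hits[hits < length(words)] + 1]

  list(before = table(before), after = table(after))
}

x <- "no error and no error and some error"
res <- neighbour_words(x, "error")
res$before  # counts of words seen just before "error"
res$after   # counts of words seen just after "error"
```

For a file, you could pass something like `paste(readLines("yourfile.txt"), collapse = " ")` as `text` (the file name here is a placeholder).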