Imran Ali - 1 month ago
R Question

Is it possible to maintain order of ngrams in the output of textcnt function in R?

I am using the textcnt() function from the tau package to obtain bigrams as follows:

library(tau)
sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)


The bigrams are returned in alphabetical order, like this:

"A sample", "English for", "for testing", "in English", "sample sentence", "sentence in", "testing purpose"


However, I am looking for a solution that returns the bigrams in the order in which they appear in the sentence. To be exact, the desired output is:

"A sample", "sample sentence", "sentence in", "in English", "English for", "for testing", "testing purpose"


If this is not possible with textcnt(), is there an alternative way to achieve the desired output?

Answer

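Try tokenize_ngrams() from the tokenizers package, which returns the n-grams in the order in which they appear: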

library(tokenizers)
tokenize_ngrams(sentence, n = 2L)
# [[1]]
# [1] "a sample"        "sample sentence" "sentence in"     "in english"      "english for"     "for testing"     "testing purpose"
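
Note that the output above is lowercased, because tokenize_ngrams() lowercases its input by default; passing lowercase = FALSE should keep the original case, matching the tolower = FALSE used in the question. If adding a package is not an option, the same ordering can be sketched in base R. The helper name ordered_ngrams below is hypothetical (it is not part of tau or tokenizers), and it assumes that splitting on whitespace is good enough, so punctuation is not stripped the way textcnt() does by default:

```r
# Hypothetical helper (not from tau or tokenizers):
# build word n-grams in the order in which they appear in the text.
ordered_ngrams <- function(text, n = 2L) {
  # Assumption: words are separated by whitespace only; punctuation is kept.
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  # Too few words to form a single n-gram.
  if (length(words) < n) return(character(0))
  # One n-gram starts at each position from 1 to length(words) - n + 1.
  starts <- seq_len(length(words) - n + 1L)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

sentence <- "A sample sentence in English for testing purpose"
bigrams <- ordered_ngrams(sentence, n = 2L)

# Counts in order of first appearance rather than alphabetical order.
bigram_counts <- table(factor(bigrams, levels = unique(bigrams)))
print(bigrams)
```

Here bigrams starts with "A sample" and ends with "testing purpose", with the case preserved, and bigram_counts keeps that order while still giving a count per bigram, which is roughly what textcnt() reports.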