Imran Ali - 10 months ago
R Question

Is it possible to maintain the order of n-grams in the output of the textcnt function in R?

I am using the textcnt function from the tau package to obtain bigrams as follows:

library(tau)

sentence <- "A sample sentence in English for testing purpose"
english <- textcnt(sentence, method = "string", n = 2, tolower = FALSE)

The bigrams returned are in alphabetical order, like this:

A sample English for for testing in English sample sentence sentence in testing purpose

However, I am looking for a solution that returns the bigrams in the order in which they appear in the sentence. To be exact, the desired output is as follows:

A sample sample sentence sentence in in English English for for testing testing purpose

If this is not possible with textcnt, is there an alternative way to achieve the desired output?

Answer Source


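The tokenize_ngrams function from the tokenizers package returns n-grams in the order in which they appear in the text:

library(tokenizers)
tokenize_ngrams(sentence, n = 2L)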
# [[1]]
# [1] "a sample"        "sample sentence" "sentence in"     "in english"      "english for"     "for testing"     "testing purpose"
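
Note that the output above is lowercased, whereas the desired output in the question keeps the original case. tokenize_ngrams lowercases by default; passing lowercase = FALSE should preserve case. If adding a package is not an option, the same ordered bigrams can be built in base R. The following is only a minimal sketch, assuming words are separated by whitespace and punctuation needs no special handling; ordered_ngrams is a hypothetical helper name, not part of tau or tokenizers.

```r
# Hypothetical helper (not from tau or tokenizers): returns the n-grams of
# `text` in the order they occur, keeping the original case.
# Assumption: tokens are separated by whitespace only.
ordered_ngrams <- function(text, n = 2L) {
  # Split on runs of whitespace after trimming the ends
  words <- strsplit(trimws(text), "[[:space:]]+")[[1]]

  # Fewer words than n means there are no n-grams to return
  if (length(words) < n) {
    return(character(0))
  }

  # Each n-gram starts at position i and spans n consecutive words
  starts <- seq_len(length(words) - n + 1L)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1L)], collapse = " "),
    character(1)
  )
}

sentence <- "A sample sentence in English for testing purpose"
bigrams <- ordered_ngrams(sentence, n = 2L)
print(bigrams)
```

Because the helper walks the word vector from left to right, the result follows sentence order rather than alphabetical order, and changing n gives trigrams or longer n-grams in the same way.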