Hardik Hardik - 1 year ago 79
R Question

text2vec: Iterate over the vocabulary after using function create_vocabulary

Using the text2vec package, I created a vocabulary:

vocab = create_vocabulary(it_0, ngram = c(2L, 2L))

vocab looks something like this:

> vocab
Number of docs: 120
0 stopwords: ...
ngram_min = 2; ngram_max = 2
                     terms terms_counts doc_counts
    1:     knight_severely            1          1
    2:        movie_expect            1          1
    3:  recommend_watching            1          1
    4:         nuke_entire            1          1
    5:       sense_keeping            1          1
14467:          stand_idly            1          1
14468:     officer_loyalty            1          1
14469:     willingness_die            1          1
14470:          fight_bane            3          3
14471:      bane_beginning            1          1

How can I check the range of the terms_counts column? I need it to choose a threshold for pruning, which is my next step:

pruned_vocab = prune_vocabulary(vocab, term_count_min = <BLANK>)

The code below is reproducible:


text <- c(" huge fan superhero movies expectations batman begins viewing christopher
nolan production pleasantly shocked huge expectations dark knight christopher
nolan blew expectations dust happen film dark knight rises simply big expectations
blown production true cinematic experience behold movie exceeded expectations terms
action entertainment",
"christopher nolan outdone morning tired awake set film films genuine emotional
eartbeat felt flaw nolan films vision emotion hollow bought felt hero villain
alike christian bale typically brilliant batman felt bruce wayne heavily embraced
final installment bale added emotional depth character plot point astray dark knight")

library(text2vec)

# Build a token iterator over the two documents
it_0 = itoken(text,
              tokenizer = word_tokenizer,
              progressbar = TRUE)

vocab = create_vocabulary(it_0, ngram = c(2L, 2L))

Answer

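Try range(vocab$vocab$terms_counts). The object returned by create_vocabulary here is a list, and its vocab element is the table printed in the question, so terms_counts is a column of vocab$vocab.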
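To see how that plays out before pruning, here is a minimal sketch in base R. It does not call text2vec at all: the vocab list below is a hand-built stand-in that only mimics the structure shown in the question (a list with a vocab table), with a few rows copied from the printed output. The names counts, count_range and count_table are just illustrative.

```r
# Hypothetical stand-in for the object printed in the question:
# a list whose `vocab` element has one row per term.
vocab <- list(
  vocab = data.frame(
    terms        = c("knight_severely", "movie_expect",
                     "fight_bane", "bane_beginning"),
    terms_counts = c(1L, 1L, 3L, 1L),
    doc_counts   = c(1L, 1L, 3L, 1L),
    stringsAsFactors = FALSE
  )
)

counts <- vocab$vocab$terms_counts

# Smallest and largest term count in the vocabulary
count_range <- range(counts)

# How many terms occur exactly n times; useful for picking a cutoff
count_table <- table(counts)

print(count_range)
print(count_table)
```

With the real object, the same two calls on vocab$vocab$terms_counts show which values of term_count_min in prune_vocabulary would actually drop anything. One caveat: as far as I know, more recent text2vec releases return the vocabulary as a plain data.frame with a term_count column, in which case the equivalent would be range(vocab$term_count). Checking names(vocab) will tell you which layout you have.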
