user3655888 - 2 months ago
R Question

How do I set the TF weight of terms in a corpus using the ‘tm’ package in R?

How can I get term frequency (TF) weights in the tm package, where tf = (count of the term in the document) / (total terms in the document)?

MyMatrix <- DocumentTermMatrix(a, control = list(weight= weightTf))


After I apply this weighting, the matrix shows the raw term counts rather than the TF weights, like this:

Doc(1) 1 0 0 3 0 0 2
Doc(2) 0 0 0 0 0 0 0
Doc(3) 0 5 0 0 0 0 1
Doc(4) 0 0 0 2 2 0 0
Doc(5) 0 4 0 0 0 0 1
Doc(6) 5 0 0 0 1 0 0
Doc(7) 0 5 0 0 0 0 0
Doc(8) 0 0 0 1 0 0 7

Answer

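For example, you can define a custom weighting function that divides each term count by the total term count of its document: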

library(tm)
corp <- Corpus(VectorSource(c(doc1="hello world", doc2="hello new world")))
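myfun <- WeightFunction(function(m) {
  # m arrives here with terms in rows and documents in columns,
  # so the column sums are the total term counts per document
  cs <- slam::col_sums(m)
  # m$v holds the non-zero counts, m$j the document index of each one
  m$v <- m$v / cs[m$j]
  return(m)
}, "Term Frequency by Total Document Term Frequency", "termbytot")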
dtm <- DocumentTermMatrix(corp, control = list(weighting = myfun))
inspect(dtm)
# <<DocumentTermMatrix (documents: 2, terms: 3)>>
# Non-/sparse entries: 5/1
# Sparsity           : 17%
# Maximal term length: 5
# 
#     Terms
# Docs     hello       new     world
#    1 0.5000000 0.0000000 0.5000000
#    2 0.3333333 0.3333333 0.3333333
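
If you already have a matrix of raw counts, the same normalization can be done after the fact. Here is a minimal base-R sketch, not part of tm itself: the helper name `tf_weight` and the term labels `t1` to `t7` are made up for illustration, and the counts are the first three rows of the output in the question. It assumes documents are in rows, as in a DocumentTermMatrix.

```r
# Raw term counts: rows are documents, columns are terms.
# Values are the first three rows of the question's output;
# the term names t1..t7 are placeholders (assumption).
counts <- matrix(
  c(1, 0, 0, 3, 0, 0, 2,
    0, 0, 0, 0, 0, 0, 0,
    0, 5, 0, 0, 0, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(Docs = c("1", "2", "3"), Terms = paste0("t", 1:7))
)

# Hypothetical helper: tf = term count / total terms in the document.
tf_weight <- function(m) {
  totals <- rowSums(m)
  # An empty document would give 0/0 = NaN; dividing by 1 keeps its row at 0.
  totals[totals == 0] <- 1
  # A vector of length nrow(m) recycles down the columns,
  # so every row is divided by its own total.
  m / totals
}

tf <- tf_weight(counts)
print(round(tf, 3))
```

For a real DocumentTermMatrix built with the default weighting, `tf_weight(as.matrix(dtm))` should give the same kind of result, but it converts to a dense matrix, so the custom weighting function above is the better fit for large corpora.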