Azrael - 2 months ago
R Question

How to show corpus text in R tm package?

I'm completely new to R and the tm package, so please excuse my stupid question ;-)
How can I show the text of a plain-text corpus in the R tm package?

I've loaded 323 plain text files into a corpus:

library(tm)
src <- DirSource("Korpora/technologie")  # one document per file in the directory
corpus <- Corpus(src)

But when I call the corpus with:


I always get some output like this instead of the corpus text itself:

Metadata: 7
Content: chars: 144
Content: chars: 141
Content: chars: 224
Content: chars: 75
Content: chars: 105

How can I show the text of the corpus?


Reproducible example: I've tried the same with the built-in sample data:

> data("crude")
> crude
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 20
> crude[1]
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 1
> crude[[1]]
Metadata: 15
Content: chars: 527

How can I print the text of the documents?

UPDATE 2: Session Info:

> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] tm_0.6-1 NLP_0.1-7

loaded via a namespace (and not attached):
[1] parallel_3.1.3 slam_0.1-32 tools_3.1.3


One option is to convert the corpus text into a data frame and then access the text you need from the data frame. As an example, I've used the built-in sample data "crude" from the tm package:

# Extract each document's "content" element into a one-column data frame
dataframe <- data.frame(text = unlist(sapply(crude, `[`, "content")), stringsAsFactors = FALSE)

[1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n    The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n    \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n    Diamond is the latest in a line of U.S. oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
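If you only want to look at the text rather than build a data frame, a minimal sketch using tm's `content()` accessor might look like this. It assumes tm 0.6 or later with the bundled `crude` data; the variable names `texts` and `docs` are just illustrative. For your own corpus, replace `crude` with `corpus`. Documents read through `DirSource` usually hold one character element per line of the file, which is why the lines are joined with `paste(..., collapse = "\n")`.

```r
library(tm)

data("crude")  # 20 Reuters news articles shipped with tm

# Print the full text of one document. writeLines() shows real line
# breaks and no quotes, unlike the default print of a character vector.
writeLines(content(crude[[1]]))

# Collect the text of every document as one string per document.
# Multi-line documents are joined into a single string with "\n".
texts <- vapply(seq_len(length(crude)),
                function(i) paste(content(crude[[i]]), collapse = "\n"),
                character(1))

# Pair each text with its document ID (names() of a corpus gives the IDs).
docs <- data.frame(doc_id = names(crude), text = texts,
                   stringsAsFactors = FALSE)

# Quick overview: first 200 characters of each document.
writeLines(substr(docs$text, 1, 200))
```

Going through `content()` rather than indexing each document with `` `[` `` and `"content"` means the code does not depend on how `PlainTextDocument` stores its fields internally, which may differ between tm versions.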