Doug Fir - 1 month ago
R Question

Split a list into smaller parts for transformations (hoping to work around memory limitations)

There are several posts on splitting a data frame into pieces, e.g. here and here.

I have a corpus of text data, and as I understand it a corpus is a list. I'm struggling to run transformations on the whole corpus at once, so I wanted to try splitting it into pieces and looping the transformations over each piece instead.
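For reference, here is a minimal sketch of the "loop over pieces" idea without `split()` at all: walk the corpus in consecutive index blocks and transform one block at a time. It assumes the corpus can be treated as a plain R list of strings; the stand-in `corpus`, the hypothetical `transform_doc()` and `chunk_size` are illustrative, and a tm corpus object may need its own transformation functions. Whether this actually lowers peak memory depends on what the real transformation allocates.

```r
# Stand-in corpus: a plain list of 1000 strings (assumption; a tm corpus
# object may need different indexing/transformation functions).
corpus <- as.list(sprintf("Document Number %d", 1:1000))

# Hypothetical per-document transformation.
transform_doc <- function(doc) tolower(doc)

chunk_size <- 100
result <- vector("list", length(corpus))

# Process the corpus in consecutive blocks of chunk_size documents.
for (start in seq(1, length(corpus), by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, length(corpus))
  result[idx] <- lapply(corpus[idx], transform_doc)
}

result[[1]]
#> [1] "document number 1"
```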

> length(corpus)
[1] 1000 # sample small 1k corpus

> typeof(corpus)
[1] "list"

pieces <- split(corpus, 10)

My goal is to get a list of 10 lists of length 100 each, but after running the line above, pieces has length one and appears to have retained only the first document in the original corpus.
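The length-one result can be reproduced with a plain list (a stand-in, assuming the real corpus behaves like an ordinary list here; a classed tm corpus may print its contents differently). The second argument of `split()` is recycled to the length of the list, so the single value `10` labels every element with the same group:

```r
# Stand-in corpus: a plain list of 1000 elements (assumption).
corpus <- as.list(1:1000)

# f = 10 is recycled to length(corpus), so every element falls in group "10".
pieces <- split(corpus, 10)

length(pieces)          # 1
names(pieces)           # "10"
length(pieces[["10"]])  # 1000: all documents are in that single group
```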

How can I split my corpus into 10 parts, as in the linked SO posts, using split or another method?

aku
Answer

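It looks like the problem is the second argument of split(): it is a grouping factor that gets recycled to the length of the list, so a single value like 10 puts every element in the same group. Have you tried pieces <- split(corpus, 1:10)? Because 1:10 is recycled too, that assigns the documents to 10 groups in round-robin order (documents 1, 11, 21, ... go to the first piece), giving 10 pieces of 100 documents each.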
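A quick sketch comparing the round-robin grouping from `split(corpus, 1:10)` with contiguous blocks built from `ceiling(seq_along(corpus) / chunk_size)`, then transforming each piece and flattening the result back into one list. It uses a plain list of strings as a stand-in for the corpus and `toupper` as a placeholder transformation (both assumptions, not tm-specific code).

```r
corpus <- as.list(sprintf("doc%d", 1:1000))  # stand-in corpus (assumption)

# Round-robin: 1:10 is recycled, so piece "1" holds docs 1, 11, 21, ...
round_robin <- split(corpus, 1:10)
lengths(round_robin)             # ten pieces of 100
unlist(round_robin[["1"]][1:3])  # "doc1" "doc11" "doc21"

# Contiguous blocks: piece "1" holds docs 1..100, piece "2" docs 101..200, ...
chunk_size <- 100
blocks <- split(corpus, ceiling(seq_along(corpus) / chunk_size))
unlist(blocks[["2"]][1:3])       # "doc101" "doc102" "doc103"

# Transform each piece (toupper is a placeholder), then flatten back
# into a single list of documents in the original order.
processed <- lapply(blocks, function(piece) lapply(piece, toupper))
combined <- unlist(processed, recursive = FALSE, use.names = FALSE)
length(combined)                 # 1000
```

Contiguous blocks keep the original document order when the pieces are recombined, which the round-robin split does not.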