I'm currently working on a project that reads a large file, or rather multiple files, each with millions of lines. To do so, I use a StreamReader to read each line.
Every line is checked for whether it contains a certain string. When the condition is true, I add a row to a table. I have to reproduce the code from memory, since I don't have it in front of me:
Table table = new Table();
string str;
using (StreamReader sr = new StreamReader(file))
{
    while ((str = sr.ReadLine()) != null)
    {
        if (str.Contains(searchString))
        {
            Row row = table.AddRow();
            Cell cell = row.Cells[0]; // actually I use a counter variable, since my table consistently has 6 cells
            cell.AddParagraph(str);
        }
    }
}
You're doing the right thing by reading your input files from streams line-by-line. That means only the current line of each input file needs to be present in your RAM.
But you're doing the wrong thing by adding a row to your Table object for each line that matches the marker. Those Table objects live in RAM, and a Table holding millions upon millions of Row objects will exhaust it, as you have discovered.
The .NET collection classes do a good job of supporting vast collections. But there's no magic around the use of RAM.
You need to figure out a way to limit the number of Row objects in a Table object. Can you keep track of the row count, and when it reaches a certain number (who knows how big? 10K? 100K?) write the table to disk and create a new one?
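A rough sketch of that chunking idea, assuming MigraDoc: flush the table to a PDF on disk every `maxRowsPerTable` matching lines, then start a fresh table so only one chunk of rows is ever held in RAM. The names `maxRowsPerTable`, `marker`, `NewTable`, and `RenderToPdf` are illustrative placeholders, not from your code:

```csharp
// Sketch: bound RAM use by writing out a chunk of rows, then starting over.
const int maxRowsPerTable = 10000; // tune this; 10K? 100K?
int rowCount = 0, part = 0;
Table table = NewTable(); // hypothetical helper that creates a Table and adds your 6 columns

using (StreamReader sr = new StreamReader(file))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (!line.Contains(marker)) continue;

        Row row = table.AddRow();
        row.Cells[0].AddParagraph(line); // fill the 6 cells with your counter logic

        if (++rowCount >= maxRowsPerTable)
        {
            RenderToPdf(table, $"output_{part++}.pdf"); // placeholder: add the table to a Document and render it
            table = NewTable();                          // the old rows become garbage-collectable
            rowCount = 0;
        }
    }
}
if (rowCount > 0)
    RenderToPdf(table, $"output_{part}.pdf"); // flush the final partial chunk
```

The key point is that reassigning `table` releases the old Table (and all its Row objects) for garbage collection once the chunk has been rendered to disk.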
Also, it seems that MigraDoc generates PDF files. Is a million-page PDF file a useful object? It seems unlikely.