TimR - 1 month ago
C# Question

How to find and remove a specific line together with its next or previous line in a large text document

I'm trying to figure out how to remove a specific line from a large text document with 500,000 lines. I need to find the line by its content and, at the same time, get its index in the document (the line order must not be disturbed), so that I can also remove the next or previous line, in other words the closest neighbor by index, and remove both. Every method I've tried that uses File.WriteAllLines makes the program hang on a file of this size. The file is also being actively requested, so it seems I need to find some other way. For example, the file content is:

line 1
line 2
line 3
line 4
line 5


and the line to find and remove is:

string input = "line 3";


The found line and the next line (index + 1) should be removed if the found line's index is odd, giving this result:

line 1
line 2
line 5


At the same time, if the found line's index is even, the found line and the previous line (index - 1) should be removed. For the search string:

string input = "line 4";


the result should be:

line 1
line 2
line 5


I also need to know if the line does not exist in the text document.

The result must be written back to the same single file.

Answer

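If you want to process a very large file, then you should use FileStream (read through a StreamReader and written through a StreamWriter) to avoid loading all of the contents into memory.

To meet your last requirement, you can read the lines two at a time: an odd-numbered line and the even-numbered line that follows it. If either line of the pair matches, you skip both. It actually makes your code simpler.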

using System.IO;

var inputFileName = @"D:\test-input.txt";
var outputFileName = Path.GetTempFileName();

var search = "line 4";

using (var strInp = File.Open(inputFileName, FileMode.Open))
using (var strOtp = File.Open(outputFileName, FileMode.Create))
using (var reader = new StreamReader(strInp))
using (var writer = new StreamWriter(strOtp))
{
    while (reader.Peek() >= 0)
    {
        // read one pair: an odd-numbered line and the even-numbered line after it
        var lineOdd = reader.ReadLine();
        string lineEven = null; // stays null if the file ends on an odd line
        if (reader.Peek() >= 0)
            lineEven = reader.ReadLine();

        // copy the pair only if neither line matches; otherwise drop both lines
        if (lineOdd != search && lineEven != search)
        {
            writer.WriteLine(lineOdd);

            if (lineEven != null)
                writer.WriteLine(lineEven);
        }
    }
}

// at this point, the operation is successful
// replace the original file with the temp file
File.Delete(inputFileName);
File.Move(outputFileName, inputFileName);
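
The question also asks how to know when the line does not exist. One possible variation on the code above, as a rough sketch rather than a definitive implementation, is to track a flag while streaming and only replace the original file if a match was seen. The class name PairedLineRemover, the method name RemovePair, and the ".tmp" suffix are hypothetical names chosen for illustration; the temp file is placed next to the original on the assumption that the directory is writable.

```csharp
using System;
using System.IO;

public static class PairedLineRemover
{
    // Hypothetical helper: streams the file in (odd, even) line pairs and drops
    // every pair in which either line equals `search`.
    // Returns true if at least one pair was removed and the file was rewritten;
    // returns false (leaving the original file untouched) if no line matched.
    public static bool RemovePair(string path, string search)
    {
        // Assumption: a sibling temp file keeps the final move on the same volume.
        var tempPath = path + ".tmp";
        var found = false;

        using (var reader = new StreamReader(path))
        using (var writer = new StreamWriter(tempPath))
        {
            string lineOdd;
            while ((lineOdd = reader.ReadLine()) != null)
            {
                // ReadLine returns null at end of file, so a trailing
                // unpaired odd line simply gets a null partner.
                var lineEven = reader.ReadLine();

                if (lineOdd == search || lineEven == search)
                {
                    found = true; // skip both lines of this pair
                    continue;
                }

                writer.WriteLine(lineOdd);
                if (lineEven != null)
                    writer.WriteLine(lineEven);
            }
        }

        if (!found)
        {
            File.Delete(tempPath); // nothing changed, discard the copy
            return false;
        }

        File.Delete(path);
        File.Move(tempPath, path);
        return true;
    }
}
```

Calling PairedLineRemover.RemovePair(@"D:\test-input.txt", "line 4") would then return false when the line is missing, so the caller can report it. Like the original code, this removes every matching pair, not only the first one.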