zena retha - 12 days ago
C# Question

Delete stopwords from text file in C#

I read two text files: the first contains Arabic text, which I split into words; the second contains the stop-words.
I want to delete every stop-word listed in the second file from the first file, but I don't know how to do this:

FileStream fs = new FileStream(@"H:\arabictext.txt", FileMode.Open);
StreamReader arab = new StreamReader(fs,Encoding.Default,true);
string artx = arab.ReadToEnd();
richTextBox1.Text = artx;
arab.Close();
char[] dele = {' ', ',', '.', '\t', ';','#','!' };

string[] words = richTextBox1.Text.Split(dele);

FileStream fsw = new FileStream("H:\\arab.txt", FileMode.Create);
StreamWriter arabw = new StreamWriter(fsw,Encoding.Default);

foreach (string s in words)
{
arabw.WriteLine(s);
}
arabw.Close(); // flush buffered output and release the file

Ali
Answer

If I understand you correctly, you want to read the stop-words from the first file and remove them from the second file.

Here is my approach:

  1. Extract the stop-words from the first file with the Split method.
  2. Iterate over the extracted words and replace each one with String.Empty in the content of the second file.
  3. Save the second file.

I simplified your code to the following:

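        // requires: using System; using System.Linq;

        // read file contents
        var fileContent1 = System.IO.File.ReadAllText("file1.txt");
        var fileContent2 = System.IO.File.ReadAllText("file2.txt");

        // extract stop-words from first file; line breaks are added as separators
        // (assuming the file may list one word per line), and RemoveEmptyEntries
        // skips empty strings, which would make Replace throw an ArgumentException
        var words = fileContent1.Split(new char[] { ' ', ',', '.', '\t', ';', '#', '!', '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries)
                                .Distinct();

        // remove stop-words from file2; strings are immutable, so the
        // result of Replace must be assigned back
        foreach (var word in words)
            fileContent2 = fileContent2.Replace(word, string.Empty);

        System.IO.File.WriteAllText("file2.txt", fileContent2);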
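
One caveat: Replace matches substrings, so a short stop-word is also removed from inside longer words. If that is not what you want, a token-based variant may fit better. The following is only a sketch: the class and method names (StopWordFilter, RemoveStopWords) are hypothetical, the separator list is the one from the question plus line breaks (an assumption about how the files are laid out), and the comparison is exact, so spelling variants of the same word are not matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Separators from the question, plus '\r' and '\n' (an assumption).
    private static readonly char[] Separators =
        { ' ', ',', '.', '\t', ';', '#', '!', '\r', '\n' };

    // Hypothetical helper: splits both inputs into tokens and returns the
    // tokens of `text` that do not appear in `stopWordText`.
    public static IEnumerable<string> RemoveStopWords(string text, string stopWordText)
    {
        // A HashSet gives fast lookups; Ordinal compares exact characters.
        var stopWords = new HashSet<string>(
            stopWordText.Split(Separators, StringSplitOptions.RemoveEmptyEntries),
            StringComparer.Ordinal);

        // Keep only whole tokens that are not stop-words.
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                   .Where(token => !stopWords.Contains(token));
    }
}
```

You would pass in the contents of the two files and write the returned tokens with System.IO.File.WriteAllLines, which gives one word per line like the loop in the question. The separators themselves are dropped from the output, so this suits a word list rather than a copy of the original text with its punctuation preserved.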