gulmaily gulmaily - 3 months ago 9
VB.NET Question

Remove Repeated Words from Text file

I have a text file containing nearly 45,000 words, one word per line. Thousands of these words appear more than 10 times. I want to create a new file in which no word is repeated. I used a StreamReader, but it reads the file only once. How can I get rid of the repeated words? Please help me. Thanks.
My code was like this:

Try
    File.OpenText(TextBox1.Text)
Catch ex As Exception
    MsgBox(ex.Message)
    Exit Sub
End Try

Dim line As String = String.Empty
Dim OldLine As String = String.Empty
Dim sr = File.OpenText(TextBox1.Text)

line = sr.ReadLine
OldLine = line

Do While sr.Peek <> -1
    Application.DoEvents()
    line = sr.ReadLine
    If OldLine <> line Then
        My.Computer.FileSystem.WriteAllText(My.Computer.FileSystem.SpecialDirectories.Desktop & "\Splitted File without Repeats.txt", line & vbCrLf, True)
    End If

    OldLine = line
Loop

sr.Close()
System.Diagnostics.Process.Start(My.Computer.FileSystem.SpecialDirectories.Desktop & "\Splitted File without Repeats.txt")
MsgBox("Loop terminated. Stream Reader Closed." & vbCrLf)

Answer

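You can use LINQ's Distinct() method for this. Your loop only compares each line with the one read just before it, so it removes only duplicates that sit on consecutive lines; Distinct() checks each line against every line seen so far.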

This will work for smaller files, since File.ReadAllLines loads the whole file into memory at once:

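' Needs Imports System.IO and Imports System.Linq at the top of the file.
' Note: this overwrites yourfile.txt; pass a different path to WriteAllLines to create a new file.
Dim lines As String() = File.ReadAllLines("yourfile.txt")
File.WriteAllLines("yourfile.txt", lines.Distinct().ToArray())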
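For larger files, one option is to read line by line and remember the words already written in a HashSet, so the input is never held in memory as a whole (only the unique words are). This is a minimal sketch, not the only way to do it; the module name DedupeSketch, the function name WriteUniqueLines and both path parameters are hypothetical names chosen for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module DedupeSketch
    ' Hypothetical helper: copies inputPath to outputPath, keeping only the
    ' first occurrence of each line. Returns the number of lines written.
    Function WriteUniqueLines(inputPath As String, outputPath As String) As Integer
        Dim seen As New HashSet(Of String)()
        Dim written As Integer = 0

        ' Using closes both files even if an exception is thrown.
        Using reader As New StreamReader(inputPath),
              writer As New StreamWriter(outputPath, False)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                ' HashSet.Add returns False when the value is already present.
                If seen.Add(line) Then
                    writer.WriteLine(line)
                    written += 1
                End If
                line = reader.ReadLine()
            Loop
        End Using

        Return written
    End Function
End Module
```

It could be called as WriteUniqueLines(TextBox1.Text, outputPath), where outputPath is whatever new file you want to create. The comparison is case-sensitive by default, so "Word" and "word" count as different; passing StringComparer.OrdinalIgnoreCase to the HashSet constructor would treat them as the same word.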