Python Question

How can I completely erase duplicated lines using Linux tools such as grep, sort, sed, or uniq?

This question is hard to phrase precisely, but the example makes it clear. If I have a file like this:

1
2
2
3
4


After parsing the file and erasing the duplicated lines, it should become this:

1
3
4


I know some Python, so here is a Python script I wrote to do it. Create a file called clean_duplicates.py and run it as:

#
# To run it use:
# python clean_duplicates.py < input.txt > clean.txt
#
import sys

def main():
    lines = sys.stdin.readlines()

    # print( lines )
    clean_duplicates( lines )

def clean_duplicates( lines ):
    lastLine    = lines[ 0 ]
    nextLine    = None
    currentLine = None
    linesCount  = len( lines )

    # If it is a one-line file, print it and stop the algorithm
    if linesCount == 1:
        sys.stdout.write( lines[ 0 ] )
        sys.exit()

    # Print the first line only when it differs from the second
    if linesCount > 1 and lines[ 0 ] != lines[ 1 ]:
        sys.stdout.write( lines[ 0 ] )

    # Print the middle lines; range( 1, linesCount - 1 ) skips the first and last indexes
    for index in range( 1, linesCount - 1 ):
        currentLine = lines[ index ]
        nextLine    = lines[ index + 1 ]

        # Skip the current line when it repeats the previous one
        if currentLine == lastLine:
            continue

        lastLine = lines[ index ]

        # Skip the current line when it repeats the next one
        if currentLine == nextLine:
            continue

        sys.stdout.write( currentLine )

    # Print the last line only when it differs from the one before it
    if linesCount > 1 and lines[ linesCount - 2 ] != lines[ linesCount - 1 ]:
        sys.stdout.write( lines[ linesCount - 1 ] )

if __name__ == "__main__":
    main()
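
For comparison, a much shorter Python version is to count how many times each line occurs and keep only the lines that occur exactly once. This is only a minimal sketch (the file name keep_uniques.py is just an example); unlike the script above, it does not require the duplicated lines to be adjacent:

#
# To run it use:
# python keep_uniques.py < input.txt > clean.txt
#
import sys
from collections import Counter

def main():
    lines = sys.stdin.readlines()

    # Count how many times each line appears in the input
    counts = Counter( lines )

    # Keep only the lines that appear exactly once, preserving their original order
    for line in lines:
        if counts[ line ] == 1:
            sys.stdout.write( line )

if __name__ == "__main__":
    main()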


However, judging by these related questions, removing duplicate lines seems to be easier with tools such as grep, sort, sed, or uniq:


  1. How to remove duplicate lines inside a text file?

  2. removing line from list using sort, grep LINUX

  3. Find duplicate lines in a file and count how many time each line was duplicated?

  4. Remove duplicate entries using a Bash script

  5. How can I delete duplicate lines in a file in Unix?

  6. How to delete duplicate lines in a file...AWK, SED, UNIQ not working on my file


Answer

You may use uniq with the -u/--unique option. As per the uniq man page:

-u / --unique

   Don't output lines that are repeated in the input.
   Print only lines that are unique in the INPUT.

For example:

cat /tmp/uniques.txt | uniq -u

Or, to avoid a useless use of cat (UUOC), the better way is:

uniq -u < /tmp/uniques.txt

Both of these commands will return:

1
3
4

where /tmp/uniques.txt holds the numbers from the question, i.e.

1
2
2
3
4
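
Note that uniq only compares adjacent lines, so this works here because the duplicated 2s are next to each other. If the duplicates may not be adjacent, sort the file first, for example:

sort /tmp/uniques.txt | uniq -u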