Robottinosino - 5 years ago

Unix: removing duplicate lines without sorting

I have a utility script in Python:

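```python
#!/usr/bin/env python
import sys

unique_lines = set()   # lines seen so far; set lookup is O(1), a list is O(n)
duplicate_lines = []   # every repeated occurrence, in input order

for line in sys.stdin:
    if line in unique_lines:
        duplicate_lines.append(line)
    else:
        unique_lines.add(line)
        sys.stdout.write(line)  # first occurrence: echo it, keeping input order

# optionally do something with duplicate_lines
```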


This simple functionality (uniq without needing to sort first, stable ordering) must be available as a simple UNIX utility, mustn't it? Maybe a combination of filters in a pipe?
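For context, a minimal sketch of why plain `uniq` does not fit: it only collapses adjacent duplicate lines, and sorting first makes duplicates adjacent at the cost of the original order. The sample input and the `sample` helper function are made up for illustration.

```sh
# Hypothetical helper: sample input where "b" and "a" each repeat, never adjacently.
sample() { printf 'b\na\nb\nc\na\n'; }

# uniq only drops a line that equals the line directly before it,
# so nothing is removed here: prints b, a, b, c, a (one per line).
sample | uniq

# Sorting makes repeats adjacent, so they are removed,
# but the original order is lost: both print a, b, c.
sample | sort | uniq
sample | sort -u
```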

Reason for asking: I need this functionality on a system where I cannot run Python at all.

Answer

The UNIX Bash Scripting blog suggests:

```sh
awk '!x[$0]++'
```

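This command simply tells awk which lines to print. An awk program consisting of only a pattern prints every line for which the pattern is true (the default action is `print`). The variable `$0` holds the entire current line, and `x[$0]` accesses the associative array `x` using that line as the key. `x[$0]++` is a post-increment: it evaluates to the old value (0, i.e. false, the first time a line is seen) and then increments it. The `!` negates that, so a line is printed only if its array element was not (!) previously set, i.e. on its first occurrence.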
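A small sketch of the one-liner on made-up input (the `sample` helper is hypothetical). The first awk command only exposes the counter, to show that the post-increment yields 0 the first time a line is seen. The inverted pattern `x[$0]++` prints only the repeated occurrences, roughly what `duplicate_lines` collects in the Python script from the question.

```sh
# Hypothetical helper: "b" and "a" each appear twice.
sample() { printf 'b\na\nb\nc\na\n'; }

# Show the value x[$0]++ returns before incrementing:
# prints "b 0", "a 0", "b 1", "c 0", "a 1" (one per line).
sample | awk '{ old = x[$0]++; print $0, old }'

# Keep first occurrences in their original order: prints b, a, c.
sample | awk '!x[$0]++'

# Inverted pattern: only the repeats (2nd, 3rd, ... copies): prints b, a.
sample | awk 'x[$0]++'
```

Because awk keeps every distinct line as an array key, memory use grows with the number of distinct lines, much like the `unique_lines` collection in the Python script.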
