Alex Skl - 6 months ago

How to display duplicates from a text file using awk

I'm trying to find out how to use the "awk" command to display the words that appear more than once in a text file. In addition, how can I display the name of the file (or files) that contain them?

Example input:
first sentence first file.
Second sentence followed by the second word.

This should display: "first" and "second"

Answer

I assume that by -i you mean the comparison/counting should ignore case.

If I understand your requirements correctly, a command like this should work:

awk '{ for (i = 1; i <= NF; i++) { w = tolower($i); cnt[w]++; if (cnt[w] > 1) print w } }' yourfile | sort -u

It prints these words for your example:

  • first
  • second
  • sentence
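To try this on the sample text, here is a small self-contained sketch. The function name dup_words and the use of mktemp for a scratch file are illustrative choices, not part of the original answer; the awk program is the case-insensitive one-liner, with the count checked on the lowercased word.

```shell
#!/bin/sh
# dup_words FILE...: print each word that occurs more than once across the
# given files, ignoring case. Words are whitespace-separated fields, so
# trailing punctuation stays attached ("file." is not the same as "file").
dup_words() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = tolower($i)        # count case-insensitively
      cnt[w]++
      if (cnt[w] > 1)        # 2nd, 3rd, ... occurrence of this word
        print w
    }
  }' "$@" | sort -u          # sort and drop the repeated lines
}

# Recreate the example text from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'first sentence first file.' \
  'Second sentence followed by the second word.' > "$tmp"

dup_words "$tmp"
# prints:
# first
# second
# sentence

rm -f "$tmp"
```

Note that "the" does not appear in the sample text more than once, so it is not reported.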

If you need case-sensitive counting, just remove the tolower() call.
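For comparison, a sketch of the case-sensitive variant (the function name dup_words_cs is hypothetical). On the sample text, "Second" and "second" are now counted as different words, so "second" is no longer reported.

```shell
#!/bin/sh
# dup_words_cs FILE...: case-sensitive variant; no tolower(), so
# "Second" and "second" are counted under different keys.
dup_words_cs() {
  awk '{
    for (i = 1; i <= NF; i++) {
      cnt[$i]++
      if (cnt[$i] > 1)       # repeated with exactly the same spelling
        print $i
    }
  }' "$@" | sort -u
}

tmp=$(mktemp)
printf '%s\n' 'first sentence first file.' \
  'Second sentence followed by the second word.' > "$tmp"

dup_words_cs "$tmp"
# prints:
# first
# sentence

rm -f "$tmp"
```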

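For each line in the file, the script loops over every word, i.e. every whitespace-separated field (the for (i = 1; i <= NF; i++) loop), and:

  • increments a counter for the word, keyed on its lowercase form (tolower($i)), so "Second" and "second" share one count
  • prints the word if its count is now greater than 1 (so a word that appears three times is printed twice)
  • finally, piping the output to sort -u sorts it and removes those repeated lines.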
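To also show the file name (the second part of the question), awk's built-in FILENAME variable can be made part of the counter key. This is a minimal sketch under two assumptions: "duplicate" means repeated within the same file, and the function name dup_words_by_file is hypothetical. Printing only when the count reaches exactly 2 reports each word once, so sort -u is not needed.

```shell
#!/bin/sh
# dup_words_by_file FILE...: for each file, print "file: word" once for
# every word that occurs at least twice in that file (case-insensitive).
dup_words_by_file() {
  awk '{
    for (i = 1; i <= NF; i++) {
      w = tolower($i)
      # Key on (file name, word) so each file is counted separately.
      if (++cnt[FILENAME, w] == 2)
        print FILENAME ": " w
    }
  }' "$@"
}

dir=$(mktemp -d)
printf '%s\n' 'first sentence first file.' \
  'Second sentence followed by the second word.' > "$dir/a.txt"
printf '%s\n' 'no repeats here' 'hello Hello' > "$dir/b.txt"

# Run from inside the directory so FILENAME is just the base name.
(cd "$dir" && dup_words_by_file a.txt b.txt)
# prints:
# a.txt: first
# a.txt: sentence
# a.txt: second
# b.txt: hello

rm -rf "$dir"
```

Output follows the order in which words reach their second occurrence, not alphabetical order. A word that appears once in each of two files is not reported; that follows from the per-file assumption.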