Avi Avi - 7 months ago 14
Bash Question

Differences between two .dat files using unix scripts

I need a UNIX script for the following requirements.

Input files:



1) Script should have input fields as

2) The script should compare the two files and write the resulting lists of lines to two output files

To be added.txt
– this should have the list of lines which are available in file1_today but not in file2_prevday.

To be removed.txt
– this should have the list of lines which are available in file2_prevday but not in file1_today.


I'll show you how to build at least half of what you need using a few simple commands (teach a man to fish, and all that...)

You could do this with a scripting language like perl or ruby. If you've ever wanted to learn one of those languages, then a program like this would be the perfect opportunity.

You can also do this by chaining commands together.

To start, the unix command 'diff' gives you the info you want, just not in the format you want. If you run 'diff file2_prevday file1_today', it will show lines that exist only in file1_today with '> ' at the front (your 'To_be_added.txt'), and lines that exist only in file2_prevday with '< ' at the front. I suggest trying that now with some sample files.
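One way to try this is sketched here. The fruit names are made-up sample data, and the scratch directory from mktemp is only there so that no real files get overwritten.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"

# Made-up sample data: yesterday's file and today's file.
printf '%s\n' apple banana cherry > file2_prevday
printf '%s\n' apple cherry date > file1_today

# diff exits with status 1 when the files differ, which is not an error here.
diff file2_prevday file1_today || true
```

With these two files, diff reports 'banana' behind a '< ' marker (only in file2_prevday) and 'date' behind a '> ' marker (only in file1_today), along with short position lines such as '2d1' and '3a3'.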

Now we can pick out just those lines with grep, which prints only the input lines that match a pattern, for example:

% diff file2_prevday file1_today | grep '^> '

Here we search for lines that match the pattern '^> '. The '^' is a special character for grep (and comparable tools) that matches the beginning of a line.
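A small experiment shows what the '^' changes. The file contents below are invented for illustration; the point is that one of the removed lines happens to contain '> ' in the middle.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"

# Made-up sample data: the old file has a line with '> ' inside it.
printf '%s\n' 'keep' 'old: a > b' > file2_prevday
printf '%s\n' 'keep' 'new line' > file1_today

# Count matching lines without the anchor, then with it.
diff file2_prevday file1_today | grep -c '> '
diff file2_prevday file1_today | grep -c '^> '
```

The first count is 2, because the removed line '< old: a > b' also contains '> '. The second count is 1, because only the line that starts with '> ' matches.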

Unfortunately this leaves the '> ' at the beginning of all our output.

We can modify the lines that go through the pipe with sed, which will let us do a search and replace. We search for the same pattern and replace it with nothing:

% diff file2_prevday file1_today | grep '^> ' | sed -e 's/^> //'

This gives us our output for one of our files, which we can save:

% diff file2_prevday file1_today | grep '^> ' | sed -e 's/^> //' > To_be_added.txt
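Here is the same pipeline run end to end on made-up sample data, as a rough sketch. It also shows an optional variation: sed can do the filtering by itself with -n and the p flag, so the grep step is a convenience rather than a requirement. The name To_be_added_sed.txt is just a placeholder for comparing the two results.

```shell
# Work in a throwaway directory so nothing real is touched.
cd "$(mktemp -d)"

# Made-up sample data.
printf '%s\n' apple banana cherry > file2_prevday
printf '%s\n' apple cherry date > file1_today

# The pipeline from above: keep the '> ' lines, then strip the marker.
diff file2_prevday file1_today | grep '^> ' | sed -e 's/^> //' > To_be_added.txt

# Variation: -n turns off sed's default printing, and the trailing p
# prints only the lines on which the substitution actually happened.
diff file2_prevday file1_today | sed -n 's/^> //p' > To_be_added_sed.txt

cat To_be_added.txt
cmp To_be_added.txt To_be_added_sed.txt && echo "same result"
```

Both output files end up containing the single line 'date', which is the only line present in file1_today but not in file2_prevday.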

I'll leave the creation of the other file up to you.

Some questions you would probably benefit from answering for yourself:

  1. Why do we need the '^' in the grep and sed?
  2. How could I make a single alias that would run both commands?
  3. How could I write this as a script in a language such as perl/ruby/python?
  4. How could you generate the filenames using the date command and backquotes?
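As a starting point for the last question, here is one possible sketch. The data_YYYYMMDD.dat naming scheme is purely an assumption, since the real file names are not given. Asking date for yesterday is not standardized either: the sketch tries the GNU form first and falls back to the BSD/macOS form.

```shell
# Backquotes run a command and substitute its output into the line.
today=`date +%Y%m%d`

# Yesterday's date: 'date -d yesterday' is GNU, 'date -v-1d' is BSD/macOS.
prevday=`date -d yesterday +%Y%m%d 2>/dev/null || date -v-1d +%Y%m%d`

# Hypothetical naming scheme, adjust to match your real files.
file1_today="data_${today}.dat"
file2_prevday="data_${prevday}.dat"

echo "$file1_today $file2_prevday"
```

The two variables can then be used in place of the literal file names in the diff commands above.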