dam4l10 - 7 months ago
Bash Question

Create a column equal to the value of an existing column in the same file, plus 1?

I have a tab-delimited text file with several million rows and 2 columns that looks like this:

1 693731
1 729679
1 730087
1 731718
1 734349


I want to add an additional column to the file that is equal to the value of column 2 + 1. So for the above example it would look like this:

1 693731 693732
1 729679 729680
1 730087 730088
1 731718 731719
1 734349 734350


What would be the best way to do this using a Unix shell? Any input would be greatly appreciated.

Answer

When you're dealing with columnar data, "awk" is a good tool. It also has math capabilities built-in, so it's a natural for this:

awk '{ print $1"\t"$2"\t"($2+1); }' < data.tsv

awk runs this code for every line in the input. For each of those lines, the $ notation indicates a column: $1 is the first column in the current row, $2 is the second, and so on.
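To make the per-line, per-column behaviour concrete, here is a small sketch that labels each field as awk sees it. The temporary file and its two rows are stand-ins for the real data, and -F'\t' is added here only to make the tab delimiter explicit:

```shell
# Build a two-line, tab-delimited sample (a stand-in for data.tsv).
sample=$(mktemp)
printf '1\t693731\n1\t729679\n' > "$sample"

# The block runs once per input line; NR is awk's current line number.
fields=$(awk -F'\t' '{ print "line " NR ": $1=" $1 ", $2=" $2 }' "$sample")
printf '%s\n' "$fields"

rm -f "$sample"
```

Each output line shows the line number followed by the values awk assigned to $1 and $2 for that row.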

Though I prefer explicit column enumeration, you may use ghoti's optimization, where $0 represents all data on the current row:

awk '{ print $0"\t"($2+1); }' < data.tsv
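A variation on the same idea, offered as a sketch rather than a required change: setting awk's FS and OFS variables to a tab in a BEGIN block lets print join its arguments with tabs, so the "\t" literals disappear. The sample file below is again a stand-in for the real data:

```shell
# Two-row, tab-delimited sample (a stand-in for data.tsv).
sample2=$(mktemp)
printf '1\t693731\n1\t729679\n' > "$sample2"

# FS splits input on tabs; OFS is placed between the comma-separated
# arguments of print, so the new column is tab-separated as well.
result=$(awk 'BEGIN { FS = OFS = "\t" } { print $0, $2 + 1 }' "$sample2")
printf '%s\n' "$result"

rm -f "$sample2"
```

For the real file, the output can be redirected to a new file, for example by appending > data_plus1.tsv (a hypothetical name) to the awk command.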

Because of the UNIX toolbox approach, there are many ways to solve this problem. Whether or not this is "best" depends on many factors: speed, maintainability, portability, etc.
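As one illustration of "many ways", here is a sketch that uses only the shell's own read loop and arithmetic expansion, with no awk at all. It is shown for comparison only: a shell loop like this is typically far slower than awk on a file with millions of rows. The sample file is a stand-in for the real data:

```shell
# Two-row, tab-delimited sample (a stand-in for data.tsv).
sample3=$(mktemp)
printf '1\t693731\n1\t729679\n' > "$sample3"

# A literal tab for IFS, written portably for POSIX sh.
tab=$(printf '\t')

# Read the two columns of each line, then print them plus column 2 + 1.
loop_out=$(while IFS="$tab" read -r c1 c2; do
  printf '%s\t%s\t%s\n' "$c1" "$c2" "$((c2 + 1))"
done < "$sample3")
printf '%s\n' "$loop_out"

rm -f "$sample3"
```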