KillBill - 3 months ago
Bash Question

count the number of words between two lines in a text file

As the title says, I'm wondering whether there is an easy way to get the number of words between two lines of a text file using the text-processing tools available on *nix.

For example, given the following text file:

a bc ae
a b
ae we wke wew


The word count for lines 1-2 should be 5, and for lines 2-3 it should be 6.

Answer

You can do this with a simple awk command:

awk -v start='1' -v end='2' 'NR>=start && NR <=end{sum+=NF}END{print sum}' file

With the sample file you provided:

$ cat file
a bc ae
a b
ae we wke wew

$ awk -v start='1' -v end='2' 'NR>=start && NR <=end{sum+=NF}END{print sum}' file
5

$ awk -v start='2' -v end='3' 'NR>=start && NR <=end{sum+=NF}END{print sum}' file
6

$ awk -v start='1' -v end='3' 'NR>=start && NR <=end{sum+=NF}END{print sum}' file
9

The logic is simple:

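  1. start and end are awk variables, set on the command line with -v, that give the first and last line of the range to count.
  2. NR>=start && NR <=end is the pattern that selects only the lines inside that range; NR is the current record (line) number.
  3. sum+=NF does the word-count arithmetic. NF is a built-in awk variable holding the number of fields in the current line, split according to awk's FS (not the shell's IFS). With the default FS, fields are separated by runs of spaces and tabs and leading/trailing whitespace is ignored, so each field is one word.
  4. END{print sum} prints the final count once all input has been read. If no line falls in the range, sum is never set and an empty line is printed; print sum+0 would print 0 instead.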
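The four steps above can be collected into a small reusable shell function. This is a sketch, not part of the original answer: the name `countwords` is hypothetical (borrowed from the question's wording), the early `exit` once NR passes `end` is an optional addition that avoids reading the rest of a large file, and `sum + 0` makes it print 0 instead of an empty line when no line falls in the range.

```shell
# countwords START END FILE
# Hypothetical helper (not from the original answer): prints the number of
# whitespace-separated words on lines START..END, inclusive, of FILE.
countwords() {
    awk -v start="$1" -v end="$2" '
        NR > end    { exit }           # past the range: stop reading; END still runs
        NR >= start { sum += NF }      # inside the range: add the field count of this line
        END         { print sum + 0 }  # +0 prints 0 rather than "" when nothing matched
    ' "$3"
}

# Recreate the sample file from the question.
printf 'a bc ae\na b\nae we wke wew\n' > file

countwords 1 2 file   # 5
countwords 2 3 file   # 6
countwords 1 3 file   # 9
countwords 5 6 file   # 0 (range lies past the end of the file)

# NF with the default FS: runs of blanks and tabs separate fields,
# and leading/trailing whitespace is ignored.
printf '  a\tb   c  \n' | awk '{ print NF }'   # 3
```

Because `exit` in a main rule still runs the END block, the early exit does not change the printed total.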

Worked fine on GNU Awk 3.1.7
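If awk is not a requirement, `sed` and `wc` can do the same job. This is a different technique from the answer above, offered as a sketch assuming a POSIX `sed` and `wc`: `sed -n 'START,ENDp'` prints only the chosen lines and `wc -w` counts the whitespace-separated words in them. For ordinary text it should agree with the awk version, though the two tools' definitions of a word may differ on unusual input.

```shell
# Alternative sketch: select lines with sed, count words with wc.
# start/end are illustrative shell variables, not part of the original answer.
start=2
end=3

# Recreate the sample file from the question.
printf 'a bc ae\na b\nae we wke wew\n' > file

# -n turns off automatic printing; "${start},${end}p" prints only that range.
# tr -d strips the leading padding some wc implementations (e.g. BSD) add.
sed -n "${start},${end}p" file | wc -w | tr -d ' '   # 6
```

This form always prints a number, including 0 for an empty range, because `wc -w` still reports a count for empty input.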
