lacobus - 7 months ago
Bash Question

Bash routine to return the page number of a given line number from a text file

Consider a plain text file containing the page-breaking ASCII control character "Form Feed" ($'\f'):

01. alpha\n
02. beta\n
03. gamma\n\f
04. one\n
05. two\n
06. three\n
07. four\n
08. five\n\f
09. earth\n
10. wind\n
11. fire\n
12. water\n\f


Note that each page has an arbitrary number of lines.
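To reproduce the example, the sample file can be written with printf. This is only a convenience sketch; the file name pages.txt is an assumption, not something from the question.

```shell
#!/usr/bin/env bash
# Write the sample above to pages.txt (a hypothetical file name):
# three pages, each followed by a form feed.
{
    printf '%s\n' '01. alpha' '02. beta' '03. gamma'
    printf '\f'
    printf '%s\n' '04. one' '05. two' '06. three' '07. four' '08. five'
    printf '\f'
    printf '%s\n' '09. earth' '10. wind' '11. fire' '12. water'
    printf '\f'
} > pages.txt
```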

I need a bash routine that returns the page number of a given line number from a text file containing page-breaking ASCII control characters.

After a long time researching, I finally came across this piece of code:

function get_page_from_line
{
    local nline="$1"
    local input_file="$2"

    local npag=0
    local ln=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( ++npag ))

        ln=$(echo -n "$page" | wc -l)

        total=$(( total + ln ))

        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi
    done < "$input_file"

    echo "0"

    return
}


Unfortunately, this solution turned out to be very slow in some cases.

Is there a better solution?

Thanks!

Answer

The idea of using read -d $'\f' and then counting the lines is good.
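To illustrate what read -d $'\f' does, here is a small sketch that reads form-feed-separated pages from standard input and prints each page's number and first line. The helper name list_pages is hypothetical and only serves the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each page's number and its first line,
# reading form-feed-separated pages from stdin.
list_pages() {
    local npag=0 page
    # -d $'\f' makes read stop at each form feed instead of at a newline;
    # IFS= and -r keep the page text (including its newlines) intact.
    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        # ${page%%$'\n'*} removes everything from the first newline onward.
        printf 'page %d starts with: %s\n' "$npag" "${page%%$'\n'*}"
    done
}

printf '01. alpha\n02. beta\n03. gamma\n\f04. one\n05. two\n\f' | list_pages
# prints:
# page 1 starts with: 01. alpha
# page 2 starts with: 04. one
```

Each loop iteration therefore sees a whole page, which is why counting lines per page is enough to find the answer.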

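To speed up the function, use as few external commands as possible and replace them with shell built-ins. In your version, every page runs $(echo -n "$page" | wc -l), which forks a subshell and starts a separate wc process; that per-page overhead is what makes it slow on files with many pages.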

Give this tested version a try. It is your version with the line count replaced by a for loop that does only one addition, one assignment and one test per line.

In my tests it was much faster (roughly 20 times) than the original version:

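function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"

    local npag=0
    local total=0
    local page line

    # Keep IFS and shell options local so the caller's settings are untouched.
    # 'local -' needs bash >= 4.4; set -f stops a line such as "*" from globbing.
    local IFS=$'\n'
    local -
    set -f

    # '|| [ -n "$page" ]' also processes a last page without a trailing \f.
    while IFS= read -d $'\f' -r page || [ -n "$page" ]; do
        npag=$(( npag + 1 ))
        # Word splitting on newlines gives one iteration per non-empty line
        # (consecutive newlines collapse, so blank lines are not counted).
        for line in ${page}
        do
            total=$(( total + 1 ))
            if [ "${total}" -eq "${nline}" ] ; then
                printf "%d\n" "${npag}"
                return
            fi
        done
    done < "$input_file"
    printf "0\n"
    return
}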
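A possible further step, sketched here under assumptions and not part of the tested answer: count each page's newlines with bash parameter expansion instead of looping. ${page//[^$'\n']/} deletes every character that is not a newline, so the length of the result equals the page's line count (the same number wc -l reports), and unlike the word-splitting loop it also counts blank lines. The function name get_page_from_line_nl is hypothetical, and whether it beats the for loop depends on the bash version and the data, so time it on your own files.

```shell
#!/usr/bin/env bash
# Hypothetical builtin-only variant: count each page's newlines with
# parameter expansion instead of forking 'wc -l' or looping per line.
get_page_from_line_nl ()
{
    local nline="$1" input_file="$2"
    local npag=0 total=0 page nl

    # '|| [ -n "$page" ]' also handles a last page with no trailing \f.
    while IFS= read -d $'\f' -r page || [ -n "$page" ]; do
        npag=$(( npag + 1 ))
        # Delete every character that is not a newline; what remains is one
        # newline per line, so its length is the page's line count (as wc -l).
        nl=${page//[^$'\n']/}
        total=$(( total + ${#nl} ))
        if [ "$total" -ge "$nline" ]; then
            printf '%d\n' "$npag"
            return
        fi
    done < "$input_file"
    printf '0\n'
}
```

Usage is the same as for the other versions, for example get_page_from_line_nl 5 pages.txt.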

awk performs far better than the pure bash version above; it was designed for exactly this kind of text processing.

Give this tested version a try:

function get_page_from_line ()
{
  awk -v nline="${1}" '
    BEGIN {
      npag=1;
    }
    {
      if (index($0,"\f")>0) {
        npag++;
      }
      if (NR==nline) {
        print npag;
        linefound=1;
        exit;
      }
    }
    END {
      if (!linefound) {
        print 0;
      }
    }' "${2}"
}

Whenever a line contains \f, the page number is incremented before the line number is checked.

NR is the current line (record) number.
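Another option, again only a sketch and not the tested answer: let awk split the input on form feeds by setting RS to "\f". Each record is then a whole page, NR becomes the page number, and gsub(/\n/, "&") returns the number of newlines in the page without changing it. This mirrors the line counting of the question's original function. The name get_page_from_line_rs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical awk variant: make each form-feed-separated page one record
# (RS = "\f"), so NR is the page number, then count the page's newlines.
get_page_from_line_rs ()
{
    awk -v nline="$1" '
        BEGIN { RS = "\f" }
        {
            # gsub() returns the number of replacements; replacing each
            # newline with itself ("&") counts newlines without changing $0.
            total += gsub(/\n/, "&")
            if (total >= nline) { print NR; found = 1; exit }
        }
        END { if (!found) print 0 }' "$2"
}
```

A single-character RS is portable across POSIX awks. For lines 1 to 12 of the sample both awk functions return the same pages; they diverge only on edge cases such as a form feed in the middle of a line, or the fragment after the final \f, which the index() version counts as an extra line.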
