HoHoHo HoHoHo - 6 months ago
Bash Question

Possible modification of nested for loop

I am a newbie, and I am trying to modify the code below so that it takes less time to run (right now it takes ages). Any help or suggestions would be appreciated. Thank you in advance.

#!/bin/sh
for pheno in `cat /wrk/abc/composition/results/list.txt`; do
    header=`head -1 /wrk/abc/composition/results/"$pheno"/meta_"$pheno".out`
    echo "pheno $header" > results.txt
    for pheno in `cat /wrk/abc/composition/results/list.txt`; do
        awk -v p="$pheno" \
            'NR == FNR{a[$1]; next}($3) in a{print p, $0}' \
            list.txt \
            /wrk/abc/composition/results/"$pheno"/meta_"$pheno".out \
            >> results.txt
    done
done

agc agc
Answer

Assuming list.txt has one entry per line, here's the same code simplified: no useless cats (the for loops were swapped for while read loops), and a cd to shorten the unreadable long paths, followed by some notes. It should be only a little faster, and should work the same as before, such as it was:

#!/bin/sh
cd /wrk/abc/composition/results/ || exit 1
while IFS= read -r pheno ; do
    # "pheno " followed by the first line of this pheno's .out file
    { printf 'pheno '; head -n 1 "$pheno"/meta_"$pheno".out ; } \
        > results.txt
    while IFS= read -r pheno ; do
        awk -v p="$pheno" \
            'NR == FNR{a[$1]; next}($3) in a{print p, $0}' \
            list.txt \
            "$pheno"/meta_"$pheno".out \
            >> results.txt
    done < list.txt
done < list.txt
cd -
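The "one entry per line" assumption matters because a for loop over cat output splits on any whitespace, while "while IFS= read -r" keeps each line whole. Here is a small sketch showing the difference; the function names for_cat and while_read and the sample entries are made up for illustration:

```shell
#!/bin/sh
# Print each entry in brackets, splitting the file like the question's
# for loop over cat output: unquoted, so any whitespace separates
# entries (and glob characters would also be expanded).
for_cat() {
    for pheno in $(cat "$1"); do
        printf '[%s]\n' "$pheno"
    done
}

# Same, but read whole lines; IFS= keeps leading/trailing blanks and
# -r keeps backslashes literal.
while_read() {
    while IFS= read -r pheno; do
        printf '[%s]\n' "$pheno"
    done < "$1"
}

# Hypothetical list whose second entry contains a space.
tmp=$(mktemp)
printf 'alpha\nbeta gamma\n' > "$tmp"
echo "for + cat:";  for_cat "$tmp"      # [alpha] [beta] [gamma], one per line
echo "while read:"; while_read "$tmp"   # [alpha] [beta gamma], one per line
rm -f "$tmp"
```

If every entry in list.txt is a single word, both loops see the same values; they only diverge when a line contains blanks or glob characters.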

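The most glaring error is that there are two loops, one nested in the other; both use the same variable name ($pheno), and both read the same file (list.txt). Surprisingly, that sort of code may function correctly, despite being confusing. But it is almost certainly what causes the slowdown: for every line of the outer loop, the inner loop runs awk once per line of list.txt, and each awk run rereads list.txt. So if there were 100 lines in list.txt, awk would be started 100 × 100 = 10,000 times, list.txt would be read about 10,000 times (roughly 1,000,000 lines), and each meta_*.out file would be scanned 100 times instead of once.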
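The arithmetic can be checked by replacing the awk call with a counter. This is a small sketch; the function name count_inner_runs and the generated 100-line list are made up for illustration:

```shell
#!/bin/sh
# Count how often the inner loop body (the awk call in the question)
# would run, given a list file with one entry per line.
count_inner_runs() {
    runs=0
    while IFS= read -r pheno; do          # outer loop, as in the question
        while IFS= read -r pheno; do      # inner loop reuses $pheno
            runs=$((runs + 1))            # stands in for one awk run
        done < "$1"
    done < "$1"
    echo "$runs"
}

# Hypothetical 100-line list: pheno1 ... pheno100
tmp=$(mktemp)
i=1
while [ "$i" -le 100 ]; do
    echo "pheno$i"
    i=$((i + 1))
done > "$tmp"

count_inner_runs "$tmp"   # prints 10000 (100 * 100)
rm -f "$tmp"
```

The count grows with the square of the list length, so doubling list.txt to 200 lines would mean 40,000 awk runs.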

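Then there's results.txt, which the inner loop appends data to and the outer loop overwrites on every cycle. results.txt therefore winds up holding only the data from the very last outer cycle. Since each cycle's inner loop already covers every line of list.txt, that last cycle's output is complete (the header from the last pheno's file plus the matches for every pheno), so all the earlier cycles are wasted work.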
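Because the last outer cycle alone produces the whole result, the outer loop can be dropped. The sketch below is one way to do that, not the original poster's code. The function name collect_results is made up, and it assumes that list.txt has one pheno per line and that the list.txt awk reads is the same file as /wrk/abc/composition/results/list.txt. It writes to stdout, so the caller redirects once instead of truncating a file on every cycle:

```shell
#!/bin/sh
# Single-loop sketch (hypothetical function name collect_results).
# Assumptions, not taken from the question:
#   - $1 is the results directory; list.txt there has one pheno per line;
#   - each listed pheno has $1/<pheno>/meta_<pheno>.out;
#   - output goes to stdout, so the caller redirects it once.
# The body runs in a subshell ( ... ) so the cd does not leak out.
collect_results() (
    cd "$1" || exit 1
    # Header line: the original script's final results.txt ends up with
    # the header of the LAST listed pheno, so take that one here too.
    last=$(tail -n 1 list.txt)
    printf 'pheno %s\n' "$(head -n 1 "$last"/meta_"$last".out)"
    # One awk run per pheno: each meta file is scanned once.
    while IFS= read -r pheno; do
        awk -v p="$pheno" \
            'NR == FNR{a[$1]; next}($3) in a{print p, $0}' \
            list.txt \
            "$pheno"/meta_"$pheno".out
    done < list.txt
)
```

For example: collect_results /wrk/abc/composition/results > results.txt. With 100 phenos this starts awk 100 times rather than 10,000, and under the assumptions above it should leave the same contents in results.txt as the original script's final cycle.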