Raziel - 1 year ago
Linux Question

Verifying if entries are found in ranges

I have two files: one (fileA) contains a list of individual entries, and the other (fileB) contains a list of ranges.

I want to find out which entries in fileA fall within any of the ranges in fileB.

Sample entries from both files:

fileA

00100500000000
00100600000000
00100700000000
00100800000000
00100900000000
00101000000000
00101300000000
00101500000000
00101600000000
00101700000000
00101710000000
00101800000000
35014080000000
35014088000000
35067373000000


fileB

00100200000000,00100200999999
00100300000000,00100300999999
00100100000000,00100100999999
00100400000000,00100400999999
00100500000000,00100500999999
00100600000000,00100600999999
00100700000000,00100700999999
00100800000000,00100800999999
00100900000000,00100900999999
00101000000000,00101000999999
00101300000000,00101300999999
00101500000000,00101500999999
00101600000000,00101600999999
35048702000000,35048702999999
35048802000000,35048802999999
35077160000000,35077160999999
35077820000000,35077820999999
35085600000000,35085600999999


I used the script below, but it takes about 6 days to process 140k entries in fileA against 50k ranges in fileB. Is there a way to make it much faster?

list=`cat fileB`
for mobno in $list
do
    LowVal="$(echo $mobno | cut -d, -f1)"
    HighVal="$(echo $mobno | cut -d, -f2)"

    while read ThisLine
    do
        [ ${ThisLine} -ge ${LowVal} ] && [ ${ThisLine} -le ${HighVal} ] && echo "${ThisLine}"
    done < fileA
done

Answer

You would have to test it for performance, but the following awk script is one option:

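# First line of the first file (fileB): start collecting ranges.
NR == 1 && FNR == 1 {
        strt = 1
        }
# First line of any later file (fileA): switch to checking entries.
FNR == 1 && NR != 1 {
        strt = 0
        }
# fileA: print every range that contains the current entry.
strt == 0 {
        pos = $0
        for (i in ranges) {
                split(i, arry, ",")
                # All three values look numeric, so awk compares them as numbers.
                if (pos >= arry[1] && pos <= arry[2]) {
                        print i " - " $0
                        }
                }
        }
# fileB: store each "low,high" line as an array key.
strt == 1 {
        ranges[$0] = ""
        }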

Run with:

 awk -f awkfile fileB fileA

Output:

00100500000000,00100500999999 - 00100500000000
00100600000000,00100600999999 - 00100600000000
00100700000000,00100700999999 - 00100700000000
00100800000000,00100800999999 - 00100800000000
00100900000000,00100900999999 - 00100900000000
00101000000000,00101000999999 - 00101000000000
00101300000000,00101300999999 - 00101300000000
00101500000000,00101500999999 - 00101500000000
00101600000000,00101600999999 - 00101600000000

The script reads both files in a single awk run, using the variable strt to tell where fileB ends and fileA begins. The ranges are stored as keys of an array (ranges), and each value from fileA is then compared against the bounds of every range. Because the values look like numbers, awk compares them numerically, so the leading zeros do not affect the comparison.
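If checking every range for every entry is still too slow, one possible variation is to sort the ranges once and binary-search them for each entry. The following is only a sketch of that idea, not a tested replacement for the script above. The function name find_in_ranges is made up for illustration, and it assumes that the ranges do not overlap (as in the sample data), that every value consists of decimal digits only, and that the numbers are small enough for awk to hold exactly (the 14-digit samples are).

```shell
#!/bin/sh
# find_in_ranges RANGES_FILE ENTRIES_FILE   (hypothetical helper name)
# Prints "low,high - entry" for every entry that lies inside a range.
# Assumes non-overlapping ranges; at most one range is reported per entry.
find_in_ranges() {
    # With no ranges nothing can match (and NR == FNR below would misfire).
    [ -s "$1" ] || return 0

    # Sort ranges numerically by lower bound, then feed them to awk on stdin.
    sort -t, -k1,1n "$1" | awk -F, '
        # First input (stdin): load the sorted ranges into parallel arrays.
        NR == FNR {
            lo[NR] = $1 + 0; hi[NR] = $2 + 0; txt[NR] = $0; n = NR
            next
        }
        # Second input: find the last range whose lower bound is <= entry.
        {
            v = $1 + 0; l = 1; r = n; best = 0
            while (l <= r) {
                m = int((l + r) / 2)
                if (lo[m] <= v) { best = m; l = m + 1 } else { r = m - 1 }
            }
            # Only that range can contain the entry; check its upper bound.
            if (best && v <= hi[best]) print txt[best] " - " $0
        }
    ' - "$2"
}
```

It could be called as find_in_ranges fileB fileA. Each entry then costs a number of comparisons that grows with the logarithm of the number of ranges rather than with the number of ranges itself. If the ranges can overlap or nest, this sketch may miss matches, and the full scan above is the safer choice.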
