Fshamri - 5 months ago
Bash Question

Unix shell script to search for error codes in thousands of files, then print the counts to a text file

I need to find 150+ eventType/errorCode pairs in 1700 files each day. That means I have to loop over the 1700 files, count the occurrences of each of the 150+ eventType/errorCode pairs, and put those counts in a text file as a daily report.

I have placed those eventType/errorCode values in a text file, one comma-separated pair per line:

10008,4569
10008,4568
10003,1200
40000,4006


My initial code:

#!/bin/bash
DT=$(date +%Y%m%d%H)                           # today's date plus hour
fileName=$(date --date="-1 day" +"%Y%m%d")     # file name stamp for yesterday's date
Yesterday=$(date --date="-1 day" +"%Y-%m-%d")  # yesterday's date
cd "/advdata/datashareB/FFFF/continuousDownstream/$Yesterday" || exit 1

### Here I want to loop through the text file that contains the eventType/errorCode pairs and search for them in the 1700 files. Inside the loop I have to execute the following command:
### eventExport -printEvents -file Run_${fileName}*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | wc -l


The output should be written to a text file in the following format:

Date      10008/4569  10008/4568  10003/1200  ...  ...
20160621  100         12800       58
........  ....        .....       ...         ...  ...


where the first row is the header and the second row holds the total count for each eventType/errorCode pair.

Every day the script should append that day's values as a new line in the output text file.

How can I write this loop?

EDIT:
The files are tar files with names like Run_20160622_105700_02of04.tar. eventExport reads those tar files and extracts the error codes and event types given in its arguments. The command looks like:

eventExport -printEvents -file Run_20160526_09*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | head | awk -F, '{OFS =","; print $3, $8,$9, $14}'


The output is:

AccessKey="706385970",EventType=10008,OrigEventTime=2016-06-21 23:29:42.000,ErrorCode=4569


Here, each eventType is associated with an errorCode. I have more than 150 eventType/errorCode pairs that I want to find and count in the tar files. More than 1700 tar files are generated per day.

Answer

Here is a GNU awk script (kept in its own script file, for reusability) that parses the event types and error codes from the log file and reports the number of matching event type/error code pairs for each date.

#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code

    split($0, data, ",");
    keys[data[1]][data[2]] = 0;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file

    if (key[1] in keys && key[2] in keys[key[1]]) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        datecount[date[1]][key[1]][key[2]]++;
    }
}

END {
    for (d in datecount) {
        for (k1 in datecount[d]) {
            for (k2 in datecount[d][k1]) {
                printf("%s\t%s/%s\t%d\n",
                        d, k1, k2, datecount[d][k1][k2]);
            }
        }
    }
}

Running it (note that this requires GNU awk):

$ awk -f script.awk codes.txt run.log

The output is not quite in the format that you wanted, but I'm hoping it's close enough:

2016-06-11  10008/4569  1
2016-06-21  10008/4569  4
2016-06-21  40000/4006  1

(I duplicated the data that you gave us a few times and changed a date and one of the event type/error code pairs.)
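
If the wide layout from the question is needed (one row per date, one column per eventType/errorCode pair), the three-column output above can be pivoted in a second pass. This is only a minimal sketch, assuming the tab-separated date, type/code, count lines shown above. The function name pivot_counts is made up for illustration, and a pair with no matches on a given date is printed as 0:

```shell
#!/bin/bash
# pivot_counts: hypothetical helper, not part of the original answer.
# Reads tab-separated "date<TAB>type/code<TAB>count" lines on stdin and
# prints a header row followed by one row per date.
pivot_counts() {
    awk -F'\t' '
        {
            # remember dates and column labels in first-seen order
            if (!($1 in seen_date)) { seen_date[$1] = 1; dates[++nd] = $1 }
            if (!($2 in seen_col))  { seen_col[$2] = 1;  cols[++nc] = $2 }
            count[$1, $2] += $3
        }
        END {
            printf "Date"
            for (c = 1; c <= nc; c++) printf "\t%s", cols[c]
            printf "\n"
            for (d = 1; d <= nd; d++) {
                printf "%s", dates[d]
                # a missing date/column combination prints as 0
                for (c = 1; c <= nc; c++) printf "\t%d", count[dates[d], cols[c]]
                printf "\n"
            }
        }'
}

# Example run on the sample output shown above
printf '2016-06-11\t10008/4569\t1\n2016-06-21\t10008/4569\t4\n2016-06-21\t40000/4006\t1\n' | pivot_counts
```

This uses only plain awk features (composite keys joined by SUBSEP), so it should not need GNU awk. Columns appear in the order their labels are first seen in the input, which may differ from the order in codes.txt.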

UPDATE: I reworked the script for GNU awk versions older than 4.0 (that do not understand arrays of arrays):

#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code

    split($0, data, ",");
    keys[data[1],data[2]] = 1;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file

    if (keys[key[1],key[2]] == 1) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        count[date[1],key[1],key[2]]++;
    }
}

END {
    for (comb in count) {
        split(comb, field, SUBSEP);
        printf("%s\t%s/%s\t%s\n", field[1], field[2], field[3], count[comb]);
    }
}
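
For the daily job described in the question, another option is to keep the per-pair eventExport loop and append one row per day. The following is a rough sketch, not a tested solution. It assumes codes.txt holds one eventType,errorCode pair per line as in the question, that eventExport accepts the flags exactly as quoted there, and that it prints one line per matching event. The function names and the report path are made up for illustration:

```shell
#!/bin/bash
# Hypothetical helpers; eventExport and its flags are copied from the question.

# Print the header row: "Date" followed by one type/code label per pair.
report_header() {
    local codes_file=$1 type code
    printf 'Date'
    while IFS=, read -r type code || [ -n "$type" ]; do
        [ -n "$type" ] || continue          # skip blank lines
        printf '\t%s/%s' "$type" "$code"
    done < "$codes_file"
    printf '\n'
}

# Print one data row: the day stamp, then the event count for each pair.
report_row() {
    local codes_file=$1 day=$2 type code n
    printf '%s' "$day"
    while IFS=, read -r type code || [ -n "$type" ]; do
        [ -n "$type" ] || continue
        # </dev/null keeps eventExport from consuming the codes file on stdin
        n=$(eventExport -printEvents -file Run_"$day"*_*.tar \
                -filter "ErrorCode=$code;EventType=$type" \
                -names -silent < /dev/null | wc -l)
        printf '\t%d' "$((n))"
    done < "$codes_file"
    printf '\n'
}

# Append one row for the given day, writing the header only for a new report.
append_report() {
    local codes_file=$1 day=$2 report=$3
    [ -s "$report" ] || report_header "$codes_file" > "$report"
    report_row "$codes_file" "$day" >> "$report"
}

# Usage (hypothetical paths), run from the directory holding the tar files:
#   append_report /path/to/codes.txt "$(date --date='-1 day' +%Y%m%d)" /path/to/report.txt
```

This calls eventExport once per pair, so with 150+ pairs it reads the tar files 150+ times. The awk scripts above avoid that by counting every pair in a single pass over the exported events, which is likely to be faster if the events can be exported once.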