tanmay2507 tanmay2507 - 5 months ago 8
Bash Question

Read a log file to extract various fields and count the occurence

I have a logger file like this:

2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 13794017 with status : 201
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 13794017 with status : 201
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 13794017 with status : 201
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 13794017 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 17696532 with status : 500
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 17696532 with status : 500
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.commerce.common.utils.APIUtils - Enrichment data updated successful for partnumber : 17696532 with status : 500
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=1.808 sec
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.542 98459 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute() Scene7 update for 17696532 itemStatus itemsProcessed=1, itemsUpdated=1, timeTaken=
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT
2016-06-11 07:34:01.543 98460 [Thread-23-Job-parser-bolt] INFO JobLoader c.t.c.w.b.JobParserBolt - execute EXIT


There is a repetition of hundreds of such logs with different partnumbers and status codes. I want to store the distinct partnumbers which have status codes other than 201 to a separate file so that we can monitor it easily. Though I would like to have a count of all 201 Success Posts. So, the sample output that I want from this should look like:

No. of partnumbers with Status 201: 1
Partnumbers with Status 500: 17696532, ... , ...
Partnumbers with Status 401: ... ,...


I used awk first, but then parsing isn't so easy. Also note that the same partnumber appears multiple times, how can I add a check so that I do not count a single partnumber more than once.

My Code till now:

awk -F'Enrichment data updated successful for partnumber :' '{print $2}' file.log |rev | cut -c 4- | rev


I wanted to extract the partnumber first like this, but I am not able to apply the check to avoid multiple partnumber problem and relate it's corresponding status code with it.

Answer

Here is the problem solved using awk. See the inline comments for the explanation.

awk '/Enrichment data updated successful for partnumber/ {
    # store the results as a multidimensional array with the first key
    # being the status and the key of the second array being the product
    # number. This removes duplicates because array keys must be unique
    arr[$NF][$16]++
}
END {
    # iterate over the 201 status items and count them
    for (item in arr[201]) {
        count++
    }
    print "No. of partnumbers with Status 201: " count

    # iterate over the status array
    for (status in arr) {
        # skip 201 status
        if (status == 201)
            continue
        # join the array by "," for printing
        # taken from http://stackoverflow.com/a/13648609/1032785
        joined = sep = ""
        for (product in arr[status]) {
            joined = joined sep product
            sep = ","
        }

        print "Partnumbers with Status " status ": " joined
    }
}
' foo.log

This produces the following output with your sample log file that I added some additional lines to:

No. of partnumbers with Status 201: 1
Partnumbers with Status 401: 17623039
Partnumbers with Status 500: 17696532, 17696539