Tsvetan Krastanov - 1 year ago
Bash Question

sort file by number after specific word from bash

I have a large file (more than 1000 rows) and I need to sort it by some criteria.
The file contains rows like:

bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
[2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
[2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
[2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}

I need to sort them by the number after the word "took".
With awk I can extract and sort those numbers via:
awk '{for(i=1;i<=NF;i++) if ($i=="took") print $(i+1)}' test | sort -h

but that prints only the numbers. For the output I need all rows in full, just sorted, without losing anything. Unfortunately, the ms values are not in the same column on every row (otherwise it would be easy).

A solution that needs to call out to another interpreter (perl, python, etc) will be accepted if preferable to (faster/simpler/more correct than) a native bash solution.

Answer Source

The easy way to do this is to extract the data you want to sort on into a leading column, sort on it, and then remove that column in another pipeline element.

Thus, as an intermediate step:

gawk '/.*took [[:digit:]]+ms.*/ { orig=$0; printf("%s\t%s\n", gensub(/.*took ([[:digit:]]+)ms.*/, "\\1", "g", $0), orig); }'

This will make your stream look like:

536 bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
531 bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
47  [2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
41  [2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
39  [2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}

...at which point you can pass it through sort -n to sort on the number at the beginning, and then to a pipeline element that strips that leading value:

gawk '/.*took [[:digit:]]+ms.*/ { orig=$0; printf("%s\t%s\n", gensub(/.*took ([[:digit:]]+)ms.*/, "\\1", "g", $0), orig); }' \
 | sort -n | cut -d $'\t' -f 2-

...and we have our output:

[2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}
[2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
[2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
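
Since gensub is a gawk extension, here is a rough sketch of the same decorate, sort, strip idea using only sed, sort and cut, in case gawk is not available. The function name sort_by_took is made up for illustration. Like the gawk filter above, it drops any line that does not contain "took <number>ms", so treat it as a starting point rather than a drop-in replacement:

```shell
# Hypothetical helper (the name is an assumption, not from the original answer):
# reads lines on stdin and prints them sorted by the number after "took".
sort_by_took() {
  # A literal tab, used as the separator between the sort key and the line.
  tab=$(printf '\t')
  # Decorate: emit "<number><TAB><original line>".
  # -n together with the p flag prints only lines where the substitution
  # matched, so lines without "took <digits>ms" are dropped.
  sed -n "s/^.*took \([0-9][0-9]*\)ms.*\$/\1${tab}&/p" |
    # Sort numerically on the leading number.
    sort -n |
    # Undecorate: keep everything after the first tab.
    cut -f 2-
}
```

Assuming the file is named test, as in the question, this would be called as: sort_by_took < test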