sadiq.ali - 2 months ago
Bash Question

Print first occurrence of each unique regex match with line number

Given a regex, I want to print the first occurrence of each unique match, along with its line number, using bash.

For example, if the regex is

.*Exception

then I want to print:

$ ./script.sh file.log
6255:2016-09-07 10:05:37,886 ERROR some text java.lang.IllegalMonitorStateException
6714:2016-09-07 10:12:09,514 ERROR some text java.lang.NullPointerException
7013:2016-09-07 10:19:19,950 ERROR some text java.lang.IllegalStateException


I came up with a version, but it is very slow :( (on git-bash). Any pointers on how to improve its performance are appreciated.

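FILE_NAME=$1

# Collect each unique match (-o prints only the matched text), then look up
# the first line containing it; the final sort -n restores file order.
while IFS= read -r line
do
    grep -m1 -n -- "$line" "$FILE_NAME"
done < <(grep -o '\b[^ ]*Exception\b' "$FILE_NAME" | sort -u) | sort -n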


Update (adding sample data):

2016-09-07 23:58:55,674 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:56:16,273 WARN [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:26,304 WARN [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR


The above should produce:

2:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
4:2016-09-07 23:56:16,273 WARN [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
5:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();

Answer
$ cat ip.txt 
2016-09-07 23:58:55,674 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) Continuing ...
2016-09-07 23:58:26,304 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR

$ perl -ne '($e)=/(\w+Exception)/; print "$.:$_" if !$seen{$e}++ && /Exception/' ip.txt
2:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.InstantiationException: java.sql.Timestamp
4:2016-09-07 23:56:16,273 WARN  [com.arjuna.ats.jta.logging.loggerI18N] (Thread-12) [com.arjuna.ats.internal.jta.recovery.xarecovery1] Local XARecoveryModule.xaRecovery  got XA exception javax.transaction.xa.XAException, XAException.XAER_RMERR
5:2016-09-07 23:58:55,675 ERROR [STDERR] (pool-18-thread-1) java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
  • ($e)=/(\w+Exception)/ saves the type of exception in the variable $e
  • !$seen{$e}++ is true only the first time a given exception is seen, so only the first line matching each exception is printed
  • && /Exception/ restricts printing to lines that contain Exception
  • print "$.:$_" prints the line number, a colon, and the input line
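
For systems without Perl, the same "seen" logic can be sketched in awk. This is only a minimal sketch, not the answerer's code: the function name first_exceptions is made up for illustration, and the character class [A-Za-z0-9_] stands in for Perl's \w, which assumes ASCII input.

```shell
# first_exceptions FILE (hypothetical helper name)
# Prints "N:line" for the first line on which each distinct exception
# name appears, mirroring the steps of the Perl one-liner.
first_exceptions() {
    awk '
        # match() sets RSTART and RLENGTH to the position and length of the match
        match($0, /[A-Za-z0-9_]+Exception/) {
            e = substr($0, RSTART, RLENGTH)   # exception name, like $e in Perl
            if (!seen[e]++)                   # true only the first time e is seen
                print NR ":" $0               # line number, colon, input line
        }
    ' "$1"
}
```

Calling first_exceptions ip.txt should print the same three lines (2, 4 and 5) as the Perl command.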


Edit:

This should work too, and should be faster:

perl -ne 'if(/(\w+Exception)/){print "$.:$_" if !$seen{$1}++}' ip.txt
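
To get the ./script.sh file.log style of invocation from the question, the one-liner can be wrapped in a shell function that takes the regex as a parameter. This is a rough sketch under a few assumptions: the function name first_unique_matches and the RE environment variable are hypothetical, and the pattern is interpreted as a Perl regex rather than a grep one.

```shell
# first_unique_matches REGEX FILE (hypothetical helper name)
# Prints "N:line" for the first line containing each distinct match of REGEX.
first_unique_matches() {
    # The regex is passed through the environment (RE is an assumed name)
    # so that no shell quoting ends up inside the Perl program text.
    RE="$1" perl -ne '
        BEGIN { $re = qr/$ENV{RE}/ }        # compile the pattern once
        if (/($re)/) {                      # $1 holds the matched text
            print "$.:$_" if !$seen{$1}++;  # first line for each distinct match
        }
    ' "$2"
}
```

With this function placed in script.sh, followed by the line first_unique_matches '\w+Exception' "$1", running ./script.sh ip.txt should reproduce the output shown above.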