Fshamri - 5 months ago
Bash Question

Search for a word and count its occurrences in a file

I want to search for 3 words and count their occurrences in tens of files. Those file names contain a prefix plus a timestamp, like:

FTM.FC102.20160623183001.20160623184500

I want to search them for the following words:
Date
OK
RETRY
DROP
Then write their counts to a new file. The desired output should look like this:

filename OK RETRY DROP
=================================
XXX20160622XXX 221 305 400 // these values are the word counts
....... ... ... ...


I have tried the following:

fileName=$(date --date="-1 day" +"%Y%m%d")
cd /advdata/ticketdatashareA/FTM_Sms/
format=*`echo $fileName`*
for i in $format;
do
    if [[ "$i" == "$format" ]]
    then
        echo "No Files"
    else
        echo -n "file name $i :" | cut -c21-49 ; echo '\t' `grep OK $i | wc -l`; echo '\t' `grep "RETRY" $i | wc -l`; echo '\t' `grep "DROP" $i | wc -l`;
    fi
done


What I got is:

20160623134501.20160623140000
\t 107
\t 0
\t 0
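The output above has two causes: bash's builtin echo prints '\t' literally unless it is given -e, and every echo (as well as the cut) ends with a newline, so each count lands on its own line. Below is a minimal sketch of the same loop using printf, which interprets \t and \n in its format string. The function name report_counts and its two arguments are hypothetical; like the original, it counts matching lines (grep -c behaves like grep | wc -l), and the usage line assumes GNU date for --date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report_counts DIR STAMP prints one tab-separated line per
# file in DIR whose name contains STAMP: name, then the OK, RETRY, DROP line counts.
report_counts() {
    local dir=$1 stamp=$2 f
    local -a files
    shopt -s nullglob                 # unmatched glob -> empty list, not the pattern
    files=( "$dir"/*"$stamp"* )
    shopt -u nullglob
    if (( ${#files[@]} == 0 )); then
        echo "No Files"
        return
    fi
    for f in "${files[@]}"; do
        # printf expands \t and \n itself, unlike bash's builtin echo without -e
        printf '%s\t%s\t%s\t%s\n' "${f##*/}" \
            "$(grep -c OK "$f")" "$(grep -c RETRY "$f")" "$(grep -c DROP "$f")"
    done
}

# Usage with the question's directory and yesterday's date (GNU date assumed):
# report_counts /advdata/ticketdatashareA/FTM_Sms "$(date --date='-1 day' +%Y%m%d)"
```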

Answer

This is a solution for Bash:

declare -a words=( OK RETRY DROP )

for file in FTM.FC102.*; do
    printf '%s ' "$file"    # %s keeps any % in a file name from being read as a format
    for word in "${words[@]}"; do
        grep -o "$word" "$file" | wc -l | tr '\n' ' '
    done
    echo
done | rs 0 $(( ${#words[@]} + 1 )) # alternatively:  | tr -s ' ' '\t'

Explanation:

  • We store the words that we'll look for in the array words.
  • Loop through the files (change the pattern to match your needs).
  • For each file, we construct a line starting with the filename, then...
  • For each word, run grep -o on the file, which prints every match on its own line.
  • Count those lines with wc -l, and use tr to turn wc's trailing newline into a space so the counts stay on one line.
  • After the last word, a bare echo ends this file's line of output.
  • Pipe everything to rs to format the columns nicely. This utility is available at least on BSD systems. If you don't have it, just remove the pipe and live with wonky columns, or use | tr -s ' ' '\t' instead, which does a half-decent job.
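The choice of grep -o piped to wc -l matters: grep -c (or grep | wc -l, as in the question) counts matching lines, while grep -o | wc -l counts every occurrence. Note also that a plain pattern like OK matches inside longer words such as TOKEN; adding -w restricts matches to whole words. The small sample file below is made up purely to show the difference.

```shell
#!/usr/bin/env bash
# Made-up sample: two OKs on one line, one OK hidden inside TOKEN.
sample=$(mktemp)
printf 'OK OK\nTOKEN\nRETRY\n' > "$sample"

lines=$(grep -c OK "$sample")                             # 2 lines contain "OK"
occurrences=$(grep -o OK "$sample" | wc -l | tr -d ' ')   # 3 matches, one inside TOKEN
whole_words=$(grep -ow OK "$sample" | wc -l | tr -d ' ')  # 2 whole-word matches
# (tr -d ' ' strips the padding that BSD wc puts before the number)

echo "lines=$lines occurrences=$occurrences whole_words=$whole_words"
rm -f "$sample"
```

If the log lines can contain the words as parts of longer tokens, switching the answer's grep -o to grep -ow may give the counts you actually want.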

The script does not print the header, though.
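Since the question asks for a header and for the counts to go into a new file, here is one possible sketch that adds both. The function name write_counts and its DIR/OUTFILE arguments are hypothetical; the output is tab-separated instead of rs-aligned, and the "=====" separator line from the question is left out. Piping the result through column -t, where available, can align it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write_counts DIR OUTFILE writes a header plus one
# tab-separated row per FTM.FC102.* file in DIR, counting every occurrence.
write_counts() {
    local dir=$1 out=$2 file word
    local -a words=( OK RETRY DROP )
    {
        # Header: "filename" followed by each word
        printf 'filename'
        printf '\t%s' "${words[@]}"
        printf '\n'
        for file in "$dir"/FTM.FC102.*; do
            [[ -e $file ]] || continue   # glob matched nothing: skip the literal pattern
            printf '%s' "${file##*/}"
            for word in "${words[@]}"; do
                printf '\t%s' "$(grep -o "$word" "$file" | wc -l | tr -d ' ')"
            done
            printf '\n'
        done
    } > "$out"
}

# Usage (paths are placeholders):
# write_counts . counts.txt
```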

Given two files with the following contents:

$ cat text1
Neque porro quisquam est qui dolorem ipsum quia dolor sit amet,
consectetur, adipisci velit...

$ cat text2
There is no one who loves pain itself, who seeks after it and wants to
have it, simply because it is pain...

... and with the "words" a, b and c (and the loop's file pattern changed to match these files), the script prints:

$ bash script.sh
text1  4      0      3
text2  7      1      1
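To check the example yourself without rs, the sketch below recreates text1 and text2 in a scratch directory and runs the answer's loop with words=( a b c ), the pattern text*, and the tr -s ' ' '\t' fallback. The scratch directory and the counts.tsv file name are arbitrary choices for this demo.

```shell
#!/usr/bin/env bash
# Recreate the two example files in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Neque porro quisquam est qui dolorem ipsum quia dolor sit amet,' \
              'consectetur, adipisci velit...' > text1
printf '%s\n' 'There is no one who loves pain itself, who seeks after it and wants to' \
              'have it, simply because it is pain...' > text2

# The answer's loop, with the tr fallback instead of rs.
# Each row ends with a trailing tab, left over from the space after the last count.
words=( a b c )
for file in text*; do
    printf '%s ' "$file"
    for word in "${words[@]}"; do
        grep -o "$word" "$file" | wc -l | tr '\n' ' '
    done
    echo
done | tr -s ' ' '\t' > counts.tsv
cat counts.tsv
```

The counts match the rs output above: 4, 0, 3 for text1 and 7, 1, 1 for text2.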