AB. AB. - 16 days ago
Bash Question

Retrieve last and next pattern after finding a pattern

I've spent the last 2-3 days googling and searching for a solution, but I can't seem to find one.

Basically, I have a text file containing hundreds of thousands of records. Here's an example of what the file contains.


  • Line 01: ^D 23554

  • Line 02: Q 123 325

  • Line 03: Y qwe325

  • Line 04: ^P fiwkkwlds

  • Line 05: Y qrwe

  • Line 06: Y rtewt

  • Line 07: ^A 284274 DFL 2939955 001

  • Line 08: F 2739

  • Line 09: ^D 23556

  • Line 10: ^k 2994

  • Line 11: ^A 284274 DFL 2939966 002

  • Line 12: ^k 29942

  • Line 13: ^k 32423

  • Line 14: ^A 284274 DFL 2939957 003

  • Line 15: F 23425

  • Line 16: ^A 284274 DFL 2939958 004

  • Line 17: F 92823

  • Line 18: and so on...



Basically, there isn't a specific pattern in the data; however, the start of each line (^D, Q, Y, ^P, ^A, F, ^k) represents a simple message type.

I'm looking to create a script (preferably in shell, perl or c++) that will scan the file from the first line to the last and:

1) retrieve all the values in the ^A line
2) insert a delimiter
3) retrieve the last value in the ^D line
4) insert a delimiter
5) retrieve the next value in the F line
6) hopefully, create another text file with the data

Based on the example above, here are the expected results:


  • Line 01: 284274 DFL 2939955 001|23554|2739

  • Line 02: 284274 DFL 2939966 002|23556|23425

  • Line 03: 284274 DFL 2939957 003|23556|23425

  • Line 04: 284274 DFL 2939958 004|23556|92823



In other words:

value from ^A line | value from previous ^D line | value from next F line.

Is there someone who could help me out? I've been reading about hashmaps and hashtables, but I'm not too sure how to use them. I've seen a lot of solutions using grep that find a pattern (e.g. ^A) and print the last x lines before/after that pattern; however, as this data can be quite random, the previous ^D message or the next F message could be on any line.

The solution would kind of have to read the file, always keep the most recent ^D and F line values in memory, and retrieve them whenever the ^A pattern is found.
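That one-pass idea can be sketched with awk from a shell script (a sketch only, not tested on the real data; a.txt and out.txt are placeholder file names): remember the most recent ^D value, queue each ^A record together with it, and flush the queue when the next F line arrives.

```shell
# Build a small sample input (subset of the records from the question);
# a.txt is a placeholder name for the real input file.
cat > a.txt <<'EOF'
^D 23554
Q 123 325
Y qwe325
^P fiwkkwlds
^A 284274 DFL 2939955 001
F 2739
^D 23556
^A 284274 DFL 2939966 002
^A 284274 DFL 2939957 003
F 23425
EOF

awk '
  /^\^D/ { d = $2; next }                           # remember the last ^D value
  /^\^A/ { pend[++n] = substr($0, 4) "|" d; next }  # queue ^A record with its ^D
  /^F/   { for (i = 1; i <= n; i++)                 # flush queued ^A records
             print pend[i] "|" $2
           n = 0 }
' a.txt > out.txt
# Note: ^A records with no following F line stay unflushed at end of file.

cat out.txt
# prints:
# 284274 DFL 2939955 001|23554|2739
# 284274 DFL 2939966 002|23556|23425
# 284274 DFL 2939957 003|23556|23425
```

Because it streams the file once and keeps only the pending ^A records in memory, this should stay fast even on hundreds of thousands of lines.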

Can someone help me out :)

Thank you!!!!

Answer

This one works, but I assume it will be slow for big files:

IFS=$'\n'
# Keep only the ^D, ^A and F lines, stored in an array indexed from 1.
readarray -t -O1 data < <(grep -h -e "\^D" -e "\^A" -e "^F" a.txt)
posA=1
for i in "${data[@]}"; do
    if [[ "$i" = "^A"* ]]; then
        textA="${data[$posA]}"
        posD=$posA
        posF=$posA
        textD=""
        textF=""
        # Walk backwards to the nearest preceding ^D line.
        while [ "$posD" -ge 1 ] && [[ "$textD" != "^D"* ]]; do
            posD=$((posD - 1))
            textD="${data[$posD]}"
        done
        # Walk forwards to the nearest following F line.
        while [ "$posF" -le "${#data[@]}" ] && [[ "$textF" != "F"* ]]; do
            posF=$((posF + 1))
            textF="${data[$posF]}"
        done
        # Strip the message prefixes and join the values with "|".
        textADF="${textA#^A }|${textD#^D }|${textF#F }"
        echo "ADF=$textADF"
    fi
    posA=$((posA + 1))
done
unset IFS
exit

The whole implementation is based on the output of

grep -h -e "\^D" -e "\^A" -e "^F" a.txt

which is stored in an array named "data"; the code then walks this array, relying on the fact that each while loop stops at the first match.
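To see what actually ends up in the array, here is that grep run against a small subset of the sample records (a.txt is the assumed input file name):

```shell
# Small subset of the sample data from the question.
cat > a.txt <<'EOF'
^D 23554
Q 123 325
^A 284274 DFL 2939955 001
F 2739
EOF

# Only the ^D, ^A and F lines survive the filter.
grep -h -e "\^D" -e "\^A" -e "^F" a.txt
# prints:
# ^D 23554
# ^A 284274 DFL 2939955 001
# F 2739
```

Note that "\^D" and "\^A" match a literal caret anywhere on the line, while "^F" is anchored to the start of the line.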

Maybe you could combine the above grep with head and tail to avoid the array manipulation.

PS1: Adding grep's -n switch as well gives interesting output: each match is prefixed with its original line number.

PS2: I was not able to directly grep your file for "^A ^D F" groups so as to avoid the array manipulation in code, but maybe this is possible with a regex.
