TheGirrafish - 6 months ago
Linux Question

Bash - Disable regex in awk statement

I have a text file like so:

tets v1.0
psutil==4.1.0
tclclean==2.4.3

test v2.0
psutil==3.1.1
pyYAML==3.11

not_test
psutil==4.1.0
tclclean==2.8.0


and I'm using awk and the user's input to find the text under the first line of a specific block. The command I use (where user_in is the user's input) is:

awk -v ORS='\n\n' -v RS= -v FS='\n' "\$1 ~ \"^$user_in$\"" myfile.txt


The problem is that if the user inputs ".*", awk treats it as a regex and gives me all three blocks, but I don't want anything to be output, since ".*" doesn't literally match any of the first lines.

In other words, is there a way to disable regex matching in awk and treat every character literally (the way fgrep does)?

Answer

Read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

Now let's clean up your script:

awk -v ORS='\n\n' -v RS= -v FS='\n' "\$1 ~ \"^$user_in$\"" myfile.txt

Don't enclose any script for any tool in double quotes. Always use single quotes, and pass shell values in as awk variables with -v, so you don't end up in backslash-escaping hell. So the above becomes:

awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$user_in" '$1 ~ "^"user_in"$"' myfile.txt
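
Note that this rewrite only fixes the quoting: ~ still interprets user_in as a regular expression, so an input of .* keeps matching every block. A quick way to see that, as a sketch with inline sample data (count_regex_matches is a made-up helper name, not part of awk):

```shell
# count_regex_matches (hypothetical helper): count the blocks whose first line
# matches "^" user_in "$" when user_in is interpreted as a regex.
count_regex_matches() {
    printf 'tets v1.0\npsutil==4.1.0\n\ntest v2.0\npsutil==3.1.1\n\nnot_test\npsutil==4.1.0\n' |
        awk -v RS= -v FS='\n' -v user_in="$1" \
            '$1 ~ "^" user_in "$" { n++ } END { print n + 0 }'
}

count_regex_matches '.*'          # prints 3: every block matches
count_regex_matches 'te.. v..0'   # prints 2: each dot matches any character
count_regex_matches 'test v2.0'   # prints 1
```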

And if you want to test for a string then just test for a string, not a regexp, e.g. to find records where $1 STARTS WITH your target string:

awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$user_in" 'index($1,user_in)==1' myfile.txt

or CONTAINS your target string:

awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$user_in" 'index($1,user_in)>=1' myfile.txt

or ENDS WITH your target string:

awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$user_in" 'substr($1,length($1)-length(user_in)+1)==user_in' myfile.txt
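
A literal ends-with test is easy to get off by one, and index() only reports the first occurrence of the target, so comparing the tail of the string with substr() is the safer route. A minimal sketch that can be tried without a data file (ends_with is a hypothetical helper name introduced here for illustration):

```shell
# ends_with (hypothetical helper): succeed if string $1 ends with the literal string $2.
ends_with() {
    awk -v s="$1" -v t="$2" \
        'BEGIN { exit !(substr(s, length(s) - length(t) + 1) == t) }'
}

ends_with 'test v2.0' '2.0' && echo "literal suffix"
ends_with 'abcabc' 'abc'    && echo "works when the target also occurs earlier"
ends_with 'test v2x0' '2.0' || echo "the dot is not a wildcard"
```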

or if you want to find cases where $1 IS the target string instead of just starting with it (as your script was attempting), it's even simpler:

awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$user_in" '$1 == user_in' myfile.txt
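
To check the exact-match version against the sample data from the question, something like the following should do. It is only a sketch: find_block and the temporary file are names invented here for illustration.

```shell
# Recreate the sample file from the question in a temporary location
# (delete "$tmpfile" when you are done with it).
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
tets v1.0
psutil==4.1.0
tclclean==2.4.3

test v2.0
psutil==3.1.1
pyYAML==3.11

not_test
psutil==4.1.0
tclclean==2.8.0
EOF

# find_block (hypothetical helper): print the block whose first line is
# exactly the string given as the first argument.
find_block() {
    awk -v ORS='\n\n' -v RS= -v FS='\n' -v user_in="$1" '$1 == user_in' "$tmpfile"
}

find_block 'test v2.0'   # prints the second block only
find_block '.*'          # prints nothing: no first line is literally ".*"
```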