TheFrenchGuy - 14 days ago
Linux Question

gawk: extract the string between ([:alnum:][:alnum:][:alnum:] and [:alnum:])

I'm trying to extract the string that starts with "(" followed by three alphanumeric characters ([:alnum:]) and ends with ")", delimiters included. The goal is to clean a file that is polluted with many unwanted characters.

For example, I have many lines like this:

bÖÓÄÉ@@@@ø16/11/2016 15H03'09" (ACTA/BN940-RYR71ND/A4067-LIPH-NILDU/1513F270-LEBL-9/B738/M-80/S-81/W/EQ Y/EQ) ø ZZZZtA$bÖÓÄÉ


And I want this kind of output:

(ACTA/BN940-RYR71ND/A4067-LIPH-NILDU/1513F270-LEBL-9/B738/M-80/S-81/W/EQ Y/EQ)


I tried this gawk command, but it doesn't work at all:

gawk 'NR > 1 {print $1}' RS='([[:alnum:]]*3' FS=')' $INPUT_FILE

Answer

This looks like a standard use of GNU grep:

grep -o '([[:alnum:]]\{3\}.*[[:alnum:]])' file

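There were some problems with your regular expression syntax, which I've corrected. In particular, *3 does not mean "three times": * means "zero or more of the preceding item", and the 3 that follows is just a literal character. A repetition count is written as an interval expression, which is \{3\} in grep's basic regular expressions.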

The -o option prints only the matching part of the line.
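
Since the question asks about gawk, here is a rough awk equivalent, offered as a sketch rather than as part of the answer above. It uses the standard match() function with RSTART and RLENGTH to print only the matched part of each line. The helper name extract_paren and the AWK variable are made up for illustration. The three [[:alnum:]] classes are spelled out instead of using {3} on the assumption that not every awk implementation supports interval expressions.

```shell
# Prefer gawk (as in the question); fall back to whatever awk is installed.
AWK=$(command -v gawk || command -v awk)

# extract_paren is a hypothetical helper name, not part of the original answer.
# It prints the "(XXX...X)" part of each matching line; other lines are skipped.
extract_paren() {
  "$AWK" '
    # match() sets RSTART and RLENGTH to the position and length of the match.
    match($0, /\([[:alnum:]][[:alnum:]][[:alnum:]].*[[:alnum:]]\)/) {
      print substr($0, RSTART, RLENGTH)
    }
  ' "$@"
}
```

It can be called as extract_paren "$INPUT_FILE", or fed from a pipe. As with the grep version, .* is greedy, so if a line contains several closing parentheses the match runs to the last one that follows an alphanumeric character.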