MaMazav - 5 months ago
Perl Question

grep filtering with both pattern and input file

I have an input file which looks like:

$Interesting line
$Interesting line 2
#Also interesting line
Non interesting line - filter out
$another interesting line
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
#Interesting line


I have another file with the patterns I would like to filter by. Note that the pattern file may contain problematic characters; I would like them treated as literal characters, not as wildcards / regex:

FiRsT
sec"o^nd


I would like to have the following result:

$Interesting line
$Interesting line 2
#Also interesting line
$another interesting line
Interesting line contains FiRsT pattern
Interesting line contains sec"o^nd pattern
#Interesting line


That is, the following two lines were filtered out:

Non interesting line - filter out
Another non interesting line


More precisely, I would like the result file to contain every line that contains any string from the pattern file OR starts with # or $ (order is important).

I know how to select the lines containing the strings from the pattern file:

grep -F -f pattern_file.txt input_file.txt


and I know how to select all lines that start with $ or #:

grep '^\$\|^#' input_file.txt


But how can I do both at once? Is the only way to write a short script for that, or can I still use simple grep/sed/other standard Linux commands?

Again, remember that:


  • Order of lines is important and must match the original input file order.

  • The pattern file may contain problematic characters; I would like them treated as literal characters (not wildcards / regex).



Edit: Consider the following case:

The input file also contains:

Interesting line with ^third pattern


The pattern file contains:

^third


Of course, I would like that line to be in the result file. That's why I cannot use the pattern file without the -F flag, and cannot just add ^\$ and ^# lines to it.
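The two grep commands from the question can also be combined using only standard tools: run each one separately with -n so every match carries its line number, then merge both result sets by that number. This is a minimal sketch assuming GNU grep, sort and cut; the sample files reuse the question's data (plus the ^third case from the edit), and result_file.txt is a hypothetical output name.

```shell
#!/bin/sh
# Recreate the question's sample files (quoted 'EOF' keeps $ literal).
cat > pattern_file.txt <<'EOF'
FiRsT
sec"o^nd
^third
EOF

cat > input_file.txt <<'EOF'
$Interesting line
$Interesting line 2
#Also interesting line
Non interesting line - filter out
$another interesting line
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
#Interesting line
Interesting line with ^third pattern
EOF

# Each grep prefixes matches with "LINENO:".
# sort orders numerically on the line number and -u keeps one line per number,
# so a line reported by both greps appears once; cut strips the "LINENO:" prefix.
{
    grep -n -F -f pattern_file.txt input_file.txt   # literal strings (-F)
    grep -n '^[$#]' input_file.txt                  # lines starting with $ or #
} | sort -t: -k1,1n -u | cut -d: -f2- > result_file.txt

cat result_file.txt
```

Because -F is still used for the pattern file, ^third is matched as plain text, and cut -f2- keeps any colons that appear inside the original lines.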

Answer

You could do it with awk:

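# Run as: awk -f filter.awk pattern_file.txt input_file.txt
# (filter.awk is just an example name; the pattern file must come first and be
#  non-empty, otherwise NR==FNR would also be true for the input file.)
NR==FNR { pattern[++count] = $0; next }   # first file: store each pattern as a plain string
/^[$#]/ { print; next }                   # keep lines starting with $ or #
{
    # index() is a plain substring search, so regex characters in patterns stay literal
    for (i = 1; i <= count; i++) {
        if (index($0, pattern[i]) > 0) {
            print
            next
        }
    }
}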
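To try the awk program above end to end, the shell sketch below recreates the question's sample files and runs the same logic inline. The file names come from the question; result_file.txt is a hypothetical name for the output.

```shell
#!/bin/sh
# Sample data from the question (quoted 'EOF' keeps $ literal).
cat > pattern_file.txt <<'EOF'
FiRsT
sec"o^nd
EOF

cat > input_file.txt <<'EOF'
$Interesting line
$Interesting line 2
#Also interesting line
Non interesting line - filter out
$another interesting line
Interesting line contains FiRsT pattern
Another non interesting line
Interesting line contains sec"o^nd pattern
#Interesting line
EOF

# Pattern file first: NR==FNR is only true while reading it.
awk '
NR == FNR { pattern[++count] = $0; next }
/^[$#]/   { print; next }
{
    for (i = 1; i <= count; i++)
        if (index($0, pattern[i]) > 0) { print; next }
}
' pattern_file.txt input_file.txt > result_file.txt

cat result_file.txt
```

Lines are printed as they are read, so the output keeps the order of input_file.txt.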

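Alternatively, you could preprocess your pattern file to escape all regex metacharacters, and then pass the escaped patterns to grep together with a pattern for lines starting with $ or #.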
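A minimal sketch of that escaping approach, assuming GNU sed and GNU grep: sed puts a backslash in front of each basic-regex metacharacter ([, \, ., *, ^, $), and grep then combines the escaped patterns (read from standard input with -f -) with the anchored ^[$#] pattern given via -e. The sample data, including the ^third case from the edit and an extra non-matching line, is illustrative; result_file.txt is a hypothetical output name.

```shell
#!/bin/sh
cat > pattern_file.txt <<'EOF'
FiRsT
sec"o^nd
^third
EOF

cat > input_file.txt <<'EOF'
$Interesting line
Non interesting line - filter out
Interesting line contains sec"o^nd pattern
Interesting line with ^third pattern
Line with third but no caret
#Interesting line
EOF

# Escape BRE metacharacters so each pattern matches literally,
# e.g. ^third -> \^third and sec"o^nd -> sec"o\^nd.
# grep prints every line matching any pattern, in input order.
sed 's/[[\.*^$]/\\&/g' pattern_file.txt |
    grep -e '^[$#]' -f - input_file.txt > result_file.txt

cat result_file.txt
```

Characters such as +, ?, | and { are left unescaped on purpose: in basic regular expressions they are only special when preceded by a backslash (a GNU extension), so unescaped they already match literally.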