J.Carter - 1 month ago
Perl Question

Perl one-liner to delete files with few header lines

This is a follow-up to my earlier question (40256438). I have many ".fa" files in a folder. Take three of them as an example:
"1.fa" "2.fa" "3.fa"
Their contents are as follows:

1.fa

>djhnk_9
abfgdddcfdafaf
ygdugidg
>kjvk.80
jdsfkdbfdkfadf
>jnck_q2
fdgsdfjghsjhsfddf
>7ytiu98
ihdlfwdfjdlfl]ol


2.fa

>cj76
dkjfhkdjcfhdjk
>67q32
nscvsdkvklsflplsad
>kbvbk
cbjfdikjbfadkjfbka


3.fa

>1290.5
mnzmnvjbsdjb


The lines that start with ">" are the 'headers' and the others are the 'feature' lines.
Now I want to delete the files that have 3 or fewer header lines.
Here, 2.fa and 3.fa would be deleted.
As I am working on a Windows system, I would prefer a one-line Perl script like:

for %%F in ("*.fa") do perl ...


Is there a one-liner for that? Thanks.

Answer

I would suggest doing it in Perl like this:

perl -nE "$h{$ARGV}++ if /^>/ }{ unlink grep { $h{$_} <= 3 } keys %h" *.fa

(For the record, I'm using double quotes " as the string delimiter since you are on Windows; anyone who wishes to use this on a Unix system should replace the double quotes " with single quotes '.)

Explanations:

  • -n surrounds the code with while(<>){...}, which reads the files given on the command line line by line, one file after another.
  • With $h{$ARGV}++ if /^>/ we count the number of headers in each file: $ARGV holds the name of the file being read, and /^>/ is true only if the line starts with >, i.e. it's a header line.
  • Finally ( }{ is here roughly equivalent to END { ), we delete (with the function unlink) the files that have 3 headers or fewer: keys %h gives the names of the files in which at least one header was counted, and grep { $h{$_} <= 3 } retains only those with 3 or fewer header lines, which are then passed to unlink.
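
For readers who prefer a script over a one-liner, the steps above can be spelled out roughly as follows. This is only a sketch, not the code from the answer: the sub names count_headers and files_to_delete, and the script name in the usage line, are made up for illustration. It also makes two assumptions explicit. First, cmd.exe typically passes *.fa to the program unexpanded, so the script calls glob itself. Second, a file with no header line at all is treated as having 0 headers and is therefore deleted too.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the lines that start with ">" in one file.
# A file without any header line yields 0.
sub count_headers {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;
    }
    close $fh;
    return $count;
}

# Return the names of the files that have at most $max header lines.
sub files_to_delete {
    my ($max, @files) = @_;
    return grep { count_headers($_) <= $max } @files;
}

# Expand wildcards ourselves (assumption: the shell did not do it).
# Note that glob splits its pattern on whitespace, so this simple
# version does not handle patterns that contain spaces.
my @files  = map { glob($_) } @ARGV;
my @doomed = files_to_delete(3, @files);

print "Deleting: @doomed\n" if @doomed;
unlink @doomed;
```

Saved under a hypothetical name such as delete_few_headers.pl, it would be run as: perl delete_few_headers.pl *.fa. Since unlink is permanent, it may be worth commenting out the last line for a first run and checking the printed list.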