J.Carter - 11 months ago
Perl Question

Perl one-liner to delete files with few header lines

This is a follow-up to my earlier question (40256438). I have many ".fa" files in a folder. Take three of them as an example:
"1.fa" "2.fa" "3.fa"
Their contents are as follows:







The lines that start with ">" are the 'header' lines and the others are the 'feature' lines.
Now I want to delete the files that have 3 or fewer header lines.
Here, files 2.fa and 3.fa would be deleted.
As I am working on a Windows system, I would prefer a one-line Perl script like:

for %%F in ("*.fa") do perl ...

Any one-liner for that? Thanks

Answer

I would suggest doing this with Perl as follows:

perl -nE "$h{$ARGV}++ if /^>/ }{ unlink grep { $h{$_} <= 3 } keys %h" *.fa

(For the record, I'm using double quotes (") as the string delimiter since you are on Windows; anyone who wishes to use this on a Unix system should replace the double quotes (") with single quotes (').)


  • -n surrounds the code with while(<>){...}, which reads the files one by one, line by line.
  • With $h{$ARGV}++ if /^>/ we count the number of headers in each file: $ARGV holds the name of the file being read, and /^>/ is true only if the line starts with >, i.e. it is a header line.
  • Finally (}{ is here roughly equivalent to END {), we delete (with the function unlink) the files that have 3 headers or fewer: keys %h gives the names of all files in which at least one header was counted, and grep { $h{$_} <= 3 } retains only those with 3 or fewer header lines, which are then passed to unlink.