J.Carter - 1 month ago
Perl Question

One-line program to delete files with few header lines

This is a follow-up to my earlier question, "perl one-liner to keep only desired lines". Here I have many *.fa files in a folder.

Suppose there are three files: 1.fa, 2.fa, and 3.fa.

Their contents are as follows:

1.fa



>djhnk_9
abfgdddcfdafaf
ygdugidg
>kjvk.80
jdsfkdbfdkfadf
>jnck_q2
fdgsdfjghsjhsfddf
>7ytiu98
ihdlfwdfjdlfl]ol


2.fa



>cj76
dkjfhkdjcfhdjk
>67q32
nscvsdkvklsflplsad
>kbvbk
cbjfdikjbfadkjfbka


3.fa



>1290.5
mnzmnvjbsdjb


The lines that start with > are the headers, and the rest are the feature lines.

I want to delete the files that have 3 or fewer header lines. Here, 2.fa and 3.fa should be deleted.

As I am working on a Windows system, I would prefer a one-line Perl script like:

for %%F in ("*.fa") do perl ...


Is there a one-line program for that?

Answer

I would suggest doing this with Perl like this:

perl -nE "$count{$ARGV}++ if /^>/; END { unlink grep { $count{$_} <= 3 } keys %count }" *.fa

(For the record, I'm using double quotes " as the string delimiter since you are on Windows. Anyone who wants to use this on a Unix system should just change the double quotes " to single quotes '.)

Explanations:

  • -n surrounds the code with while (<>) { ... }, which reads the files one by one.
  • With $count{$ARGV}++ if /^>/ we count the number of headers in each file: $ARGV holds the name of the file being read, and /^>/ is true only if the line starts with >, i.e. it's a header line.
  • Finally (the END { ... } part), we delete (with the function unlink) the files that have 3 or fewer headers: keys %count gives all the file names, and grep { $count{$_} <= 3 } keeps only the files that have 3 or fewer header lines, so that they can be deleted.
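
For anyone who prefers a script over a one-liner, the same counting-then-deleting logic can be sketched as below. This is only an illustration under some assumptions: the helper names count_headers and files_to_delete are made up for this example, and the threshold of 3 is hard-coded. It differs from the one-liner in two ways. First, it expands wildcards itself with bsd_glob from the core File::Glob module, which may help if the Windows command prompt hands *.fa to Perl unexpanded. Second, it counts every file explicitly, so a file with no header lines at all gets a count of 0 and is deleted too, whereas in the one-liner such a file never gets an entry in %count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Count the lines that start with '>' (the header lines) in one file.
sub count_headers {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^>/;
    }
    close $fh;
    return $n;
}

# Hypothetical helper: return the files that have at most $max header lines.
sub files_to_delete {
    my ($max, @files) = @_;
    return grep { count_headers($_) <= $max } @files;
}

# Expand wildcard arguments ourselves in case the shell did not do it.
my @candidates = map { bsd_glob($_) } @ARGV;

# Delete every file with 3 or fewer header lines.
for my $file (files_to_delete(3, @candidates)) {
    unlink $file or warn "Could not delete $file: $!";
}
```

Saved under a name of your choice (say delete_few_headers.pl, a hypothetical name), it would be run as perl delete_few_headers.pl *.fa. Since it deletes files, it is worth trying it on a copy of the folder first.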