Tom Tom - 1 year ago 90
Perl Question

Write a Perl script that takes in a fasta and reverses all the sequences (without BioPerl)?

I don't know if this is just a quirk with Strawberry Perl, but I can't seem to get it to run. I just need to take a fasta and reverse every sequence in it.

-The problem-

I have a multifasta file:


and the expected output is:


The script is here:

$NUM_COL = 80; ## set the column width of output file
$infile = shift; ## grab input sequence file name from command line
$outfile = "test1.txt"; ## name output file, prepend with “REV”
open (my $IN, $infile);
open (my $OUT, '>', $outfile);
$/ = undef; ## allow entire input sequence file to be read into memory
my $text = <$IN>; ## read input sequence file into memory
print $text; ## output sequence file into new decoy sequence file
my @proteins = split (/>/, $text); ## put all input sequences into an array

for my $protein (@proteins) { ## evaluate each input sequence individually
$protein =~ s/(^.*)\n//m; ## match and remove the first descriptive line of
## the FASTA-formatted protein
my $name = $1; ## remember the name of the input sequence
print $OUT ">REV$name\n"; ## prepend with #REV#; a # will help make the
## protein stand out in a list
$protein =~ s/\n//gm; ## remove newline characters from sequence
$protein = reverse($protein); ## reverse the sequence

while (length ($protein) > $NUM_C0L) { ## loop to print sequence with set number of cols

$protein =~ s/(.{$NUM_C0L})//;
my $line = $1;
print $OUT "$line\n";
}
print $OUT "$protein\n"; ## print last portion of reversed protein
}

close ($IN);
close ($OUT);
print "done\n";

Answer Source

This will do as you ask.

It builds a hash %fasta from the FASTA file, using the array @keys to remember the order in which the sequences appeared, and then prints each element of the hash in that order.

Each line of a sequence is reversed with reverse before it is added to the hash, and unshift puts each new line at the front of the list, so the lines of the sequence also end up in reverse order.

The program expects the input file as a parameter on the command line, and prints the result to STDOUT, which may be redirected on the command line.

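use strict;
use warnings 'all';

my (%fasta, @keys);

{
    my $key;

    while ( <> ) {

        chomp;    # drop the newline so it is not reversed along with the line

        if ( s/^>\K/REV/ ) {    # header line: insert REV after the '>'
            $key = $_;
            push @keys, $key;
        }
        elsif ( $key ) {        # sequence line: reverse it and put it first
            unshift @{ $fasta{$key} }, scalar reverse;
        }
    }
}

for my $key ( @keys ) {
    print $key, "\n";
    print "$_\n" for @{ $fasta{$key} };
}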
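The reverse-and-unshift step described above can be checked in isolation. This is only a minimal sketch: the sub name reverse_lines and the sample lines are made up for illustration and are not part of the answer's program. It shows that reversing each line and unshifting it gives the same characters as reversing the whole sequence in one go.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original answer): reverse a record that is
# given as a list of sequence lines, one line at a time.
sub reverse_lines {
    my @lines = @_;
    my @out;
    for my $line (@lines) {
        # reverse the characters of this line, then put it in front
        unshift @out, scalar reverse $line;
    }
    return @out;
}

my @record   = ('ABCDEFGHI', 'JKLMNOPQRST', 'UVWXYZ');   # made-up sample lines
my @reversed = reverse_lines(@record);

# Joining the reversed lines should equal reversing the joined sequence
my $joined = join '', @reversed;
my $whole  = reverse join '', @record;

print "$_\n" for @reversed;
print $joined eq $whole ? "same sequence\n" : "different sequence\n";
```

Note that the line lengths come out in reverse order too, so a short final line in the input becomes a short first line in the output.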




If you prefer to rewrap the sequence so that any short line comes at the end, then you just need to rewrite the code that dumps the hash.

This alternative uses the length of the longest line in the original file as the limit, and rewraps the reversed sequence to the same length. It's clear that it would be simple to specify an explicit length instead of calculating it.

You will need to add use List::Util 'max' at the top of the program.

# length of the longest sequence line stored in the hash
my $len = max map length, map @$_, values %fasta;

for my $key ( @keys ) {
    print $key, "\n";
    my $seq = join '', @{ $fasta{$key} };
    print "$_\n" for $seq =~ /.{1,$len}/g;
}
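For the explicit-length variant mentioned above, a minimal sketch might look like the following. The sub name rewrap, the width of 10 and the sample sequence are assumptions chosen for illustration; the regex is the same /.{1,N}/g pattern used in the answer's code, with the width passed in rather than computed.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original answer): split a sequence into
# chunks of at most $width characters. $width is assumed to be a positive
# integer and $seq is assumed to contain no newlines.
sub rewrap {
    my ($seq, $width) = @_;
    my @chunks = $seq =~ /.{1,$width}/g;
    return @chunks;
}

my $seq = 'ZYXWVUTSRQPONMLKJIHGFEDCBA';   # made-up, already reversed sequence

# Every line is full width except possibly the last one
print "$_\n" for rewrap($seq, 10);
```

In the dump loop, this would replace the computed $len with whatever fixed width is wanted, for example the 80 columns used in the question's script.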

Given the original data, the output is identical to that of the solution above. I used this as input:


with this result. All lines have been wrapped to eleven characters, the length of the longest line (JKLMNOPQRST) in the original data.
