jnmf jnmf - 2 years ago 80
Perl Question

why I can only write the last line in output?

I have this code but in my output file it's only writing the last result. What did I miss?


use strict;
use warnings;

print "type in the name of a file\n";
my $fasta_file = <STDIN>;
chomp ($fasta_file);

open (FASTA, $fasta_file) or die "error $!";

my $fasta = "";

while (my $line = <FASTA>) {
$fasta .= $line; }

my @seq = split (/\>/s, $fasta);

for (my $i=0; $i<@seq; $i++) {
my $sequence = $seq[$i];
next if $sequence eq '';

my ($header, $dna) = split (/\n/, $sequence, 2);

$dna =~ s/\s//g;
my $size = length $dna;

print "$header\n";

my $A = 0;
my $C= 0;
my $T = 0;
my $G= 0;
my $U = 0;
my $error = 0;

for (my $b=0; $b<$size; $b++){
my $base = substr ($dna, $b, 1);

if ($base eq 'A'){ $A++; }
elsif ($base eq 'C'){ $C++; }
elsif ($base eq 'T'){ $T++; }
elsif ($base eq 'G'){ $G++; }
elsif ($base eq 'U'){ $U++; }
else { print " $base not recognized\n"; };

my $perc_GC = ((($G + $C) / $size) *100);
print "$perc_GC% GC\n";

$fasta_file =~ m/(\S+).fasta/;
$fasta_file =~ s/.fasta/.gc/g;

open(OUTFILE, "> $fasta_file") or die "\nCould not create file ;/";

print OUTFILE "$perc_GC% GC\n";


close (FASTA);


>cel-mir-39 MI0000010 Caenorhabditis


cel-mir-39 MI0000010 Caenorhabditis 50% GC

Also, it's not writing the header on my output file.

Answer Source

The problem is that you are opening the output file within your inner for loop, which is the one that iterates over the bases. That deletes anything that has already been written and opens a fresh empty file each time, which isn't what you want. You need to open your output file once only, before you start processing the input data

Your header isn't printed to the file because you only print it to STDOUT with print "$header\n". The repeated opening of your output file would also erase this even if it was working properly

You also need to move the base calculations outside the inner loop. All that loop should do is scan through the bases and count the Gs and Cs. Nothing should be output until the total has been calculated

You may prefer this rewrite of your program, which is much more concise and better follows best practices. It also fixes the issue with printing the sequence header to the output


use strict;
use warnings qw/ all FATAL /;

print 'Enter the name of the FASTA file: ';
my $fasta_file = <STDIN>;
chomp $fasta_file;
print "\n";

(my $out_file = $fasta_file) =~ s/\.fasta$/.gc/ or die "File name must end in .fasta";
open my $out_fh, '>', $out_file or die qq{Unable to open "$out_file" for output: $!};

my $fasta = do {
    open my $fh, '<', $fasta_file or die qq{Unable to open "$fasta_file" for input: $!};
    local $/;

for my $sequence ( split />/, $fasta ) {

    next unless $sequence =~ /\S/;

    my ( $header, $dna ) = split /\n/, $sequence,  2;
    $dna =~ s/\s//g;

    my %bases;
    ++$bases{$_} for split //, $dna;

    for my $fh ( \*STDOUT, $out_fh ) {
        printf $fh "%s %.1f%% GC\n",
            ( $bases{G} + $bases{C} ) / length($dna) * 100;


Enter the name of the FASTA file: test.fasta

cel-mir-39 MI0000010 Caenorhabditis 50.0% GC
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download