C. Monster - 5 months ago

Why are blank lines appearing in my Perl script's output?



The details of what the script is doing isn't important, but I have put comments on the lines that seem important to me. I'm only concerned with why I am getting blank lines in my output.

When I run the command:

./script.pl temp temp.txt tempF `wc -l temp | awk '{print $1}'`


The temp file contains:

1 27800000 120700000 4
1 27800000 124300000 4
1 154800000 247249719 3
3 32100000 71800000 9
3 32100000 87200000 2
3 54400000 74200000 15
4 76500000 155100000 20
4 76500000 182600000 3
4 76500000 88200000 77
4 88200000 124000000 2
5 58900000 180857866 8
5 58900000 76400000 2
5 58900000 97300000 4
5 76400000 143100000 14
5 97300000 147200000 6
6 7000000 29900000 2
6 63500000 70000000 73
6 63500000 92100000 4
6 70000000 113900000 70
6 70000000 139100000 57
6 92100000 113900000 3


I am getting output of this form:

hs1 27800000 124300000 4


hs3 32100000 87200000 2
hs3 54400000 74200000 15

hs4 76500000 182600000 3
hs4 76500000 88200000 77
hs4 88200000 124000000 2

hs5 58900000 76400000 2
hs5 58900000 97300000 4
hs5 76400000 143100000 14
hs5 97300000 147200000 6


hs6 63500000 92100000 4

hs6 70000000 139100000 57
hs6 92100000 113900000 3


All of this is printed to standard output. About 8 of the lines are also printed to the temp.txt file, but the formatting of those is correct.

This is the script:

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the circos data file (mitelmanAll, mitelmanProstate, etc.)
# ARGV[3] is the number of lines that ARGV[0] will contain

use warnings;

my $file = "./$ARGV[0]";
my @lines = do {
    open my $fh, '<', $file or die "Can't open $file -- $!";
    <$fh>;
};

my $file2 = "./$ARGV[2]/$ARGV[1]";
open( my $files, ">", "$file2" ) or die "Can't open > $file2: $!";

my $i = 0;
while ( $i < $ARGV[3] - 1 ) {

    my @ref_fields = split( '\s+', $lines[$i] );

    print $files
        "$ref_fields[0]", "\t",
        $ref_fields[1], "\t",
        $ref_fields[2], "\t",
        $ref_fields[3], "\n";

    for my $j ( $i + 1 .. $ARGV[3] - 1 ) {

        $i = $j;

        # @curr_fields is initialized here

        my @curr_fields = split /\s+/, $lines[$j];

        if ( $ref_fields[0] eq $curr_fields[0] && $ref_fields[2] > $curr_fields[1] ) {

            if ( defined( $curr_fields[0] ) && $curr_fields[0] !~ /\s+/ ) {

                chomp $curr_fields[3];

                # the line below is the one that is printing to standard output
                print
                    $curr_fields[0], "\t",
                    $curr_fields[1], "\t",
                    $curr_fields[2], "\t",
                    $curr_fields[3], "\n";
            }
        }
        else {
            last;
        }
    }

    print "\n";
}

Answer

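It seems obvious that the blank lines are being printed because you have the line

print "\n";

at the end of the body of your while loop, so it runs once on every pass, whether or not the inner for loop printed anything.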

I can't help much more than that, because you say "The details of what the script is doing isn't important", and so you haven't told us what it's meant to be doing.

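However, what you have written takes a reference line, writes it to the output file, and then prints each following line to STDOUT as long as its first field matches the first field of the reference line and its second field is less than the third field of the reference line. The first line that doesn't qualify becomes the new reference line. Every time the inner loop stops, whether at such a line or at the end of the data, you print a blank line, so you get one blank line per reference line, including reference lines that had no qualifying lines after them.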
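To see this on your sample data, here is a minimal sketch that replays the same loop but records what it would print instead of printing it. trace_groups is a hypothetical helper written for illustration; each reference line is the line your while loop splits into @ref_fields, the test is the same one your script uses, and the sketch assumes the input contains no blank lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): replays the
# question's loop, but instead of printing it returns one hash per
# reference line. 'printed' holds the rows the loop would send to STDOUT
# before its print "\n". Assumes the input contains no blank lines.
sub trace_groups {
    my @rows = @_;    # each row is an array ref of fields
    my @result;
    my $i = 0;
    while ( $i < $#rows ) {    # same bound as $i < $ARGV[3] - 1
        my $ref = $rows[$i];
        my @printed;
        for my $j ( $i + 1 .. $#rows ) {
            $i = $j;
            my $curr = $rows[$j];
            # Same test as the original: same first field, and this line
            # starts before the reference line ends
            last unless $curr->[0] eq $ref->[0] && $curr->[1] < $ref->[2];
            push @printed, $curr;
        }
        push @result, { ref => $ref, printed => \@printed };
    }
    return @result;
}

# The sample data from the question
my @rows = map { [ split ] } grep { /\S/ } split /\n/, <<'END';
1 27800000 120700000 4
1 27800000 124300000 4
1 154800000 247249719 3
3 32100000 71800000 9
3 32100000 87200000 2
3 54400000 74200000 15
4 76500000 155100000 20
4 76500000 182600000 3
4 76500000 88200000 77
4 88200000 124000000 2
5 58900000 180857866 8
5 58900000 76400000 2
5 58900000 97300000 4
5 76400000 143100000 14
5 97300000 147200000 6
6 7000000 29900000 2
6 63500000 70000000 73
6 63500000 92100000 4
6 70000000 113900000 70
6 70000000 139100000 57
6 92100000 113900000 3
END

my @groups = trace_groups(@rows);

# One summary line per reference line, i.e. per blank line in your output
for my $g (@groups) {
    printf "%-25s -> %d line(s) printed, then a blank line\n",
        join( ' ', @{ $g->{ref} } ), scalar @{ $g->{printed} };
}
```

With this data there are eight reference lines, hence eight blank lines. Two of them have no lines printed after them, which is where the two pairs of consecutive blank lines in your output come from.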



You may prefer this refactoring of your code. It behaves identically except that it no longer prints the blank lines and it creates the output folder if it doesn't exist, and I think it is much more readable. It also has the advantage of splitting each line from the input file only once, and there is no need for the fourth parameter, as the number of lines is simply the size of the @lines array. Blank lines are removed from the input as it is read, so there's no longer any need for your check on the definedness of the first field.

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the circos data file (mitelmanAll, mitelmanProstate, etc.)

use strict;
use warnings 'all';

use File::Path 'make_path';
use File::Spec::Functions 'catfile';

my ($file, $newfile, $dir) = @ARGV;
$newfile = catfile($dir, $newfile);

my @lines = do {
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    map { [ split ] } grep /\S/, <$fh>;
};

make_path($dir);
open my $out_fh, '>', $newfile or die qq{Unable to open "$newfile" for output: $!};

for ( my $i = 0; $i < $#lines; ) {

    my $ref_fields = $lines[$i];

    print $out_fh join("\t", @$ref_fields[0..3]), "\n";

    for my $j ( $i + 1 .. $#lines ) {

        $i = $j;

        my $curr_fields = $lines[$j];

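        # Compare the first field as a string, as the original did with eq:
        # with values such as "hs1", == would treat every name as 0
        last unless $curr_fields->[0] eq $ref_fields->[0];
        last unless $curr_fields->[1] <  $ref_fields->[2];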

        print join("\t", @$curr_fields[0..3]), "\n";
    }
}
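If it helps to see the reading step of this refactoring in isolation, here is a small sketch. It uses three lines from your sample data with an empty line and a whitespace-only line added for illustration, and it reads them through an in-memory filehandle instead of a real file. grep /\S/ drops the lines that have no non-space character, split with no arguments splits each remaining line once on whitespace (ignoring the trailing newline), and the line count and last index come from the array itself, which is why the wc -l argument isn't needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three lines from the question's sample data, plus an empty line and a
# whitespace-only line, read through an in-memory filehandle
my $text = "1 27800000 120700000 4\n\n1 27800000 124300000 4\n   \n3 32100000 71800000 9\n";
open my $fh, '<', \$text or die "Unable to open in-memory file: $!";

# Keep only lines containing a non-space character, then split each
# one once on whitespace into an array ref of fields
my @lines = map { [ split ] } grep /\S/, <$fh>;
close $fh;

print scalar(@lines), " lines, last index $#lines\n";    # 3 lines, last index 2
print join( '|', @{ $lines[2] } ), "\n";                  # 3|32100000|71800000|9
```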