Ava Xue - 6 months ago
Perl Question

Recursive subdirectory grep

Sorry for being a pain.

I'm trying to grep the string "Distance:" from each pairsAngles.txt file within over 2,000 subdirectories; the names of the subdirectories are obtained from a CSV file.

Each subdirectory contains one pairsAngles.txt, within which there is only one line that contains "Distance:". However, my current foreach and while loops give me eight Distance values for each subdirectory.

In addition, each subsequent subdirectory gets all the distances from the previous subdirectories.

Like this:

A text version of the picture (row #4, column #2 has 4*8 = 32 entries of Distance)

oligomerAngle-2j8c-003_004-0171_0196_L-0226_0250_L-B011A001

Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.98675
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.95099
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.87554
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417
Distance: 7.69417


But the actual value should just be "Distance: 7.69417".
I'm not sure what went wrong. Here's the code:

use File::Find;
use Text::CSV_XS;

my @pairs = ();
my @result = ();
my $in;
my $out;
my $c1;
my $dist = "";
my $dir = "/home/oligomerAngle";

my $cluster = "clst1.csv";
open( $in, $cluster ) || die "cannot open \"$cluster\": $!";

my $cU = "clst1Updated.csv";
open( $out, ">$cU" ) || die "cannot open '$cU' $!";

my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1, eol => $/ } );

while ( $c1 = <$in> ) {
    chomp $c1;
    @pairs = split( ' ', $c1 );

    foreach my $pair (@pairs) {

        find( \&Matches, "$dir/$c1" );

        sub Matches {
            open( my $subdir, "pairsAngles.txt" ) or die "$!";

            while ( $dist = <$subdir> ) {

                if ( $dist =~ m/Distance:/ ) {
                    push( @result, "$dist" );
                }
            }
        }

        chdir "..";
        $csv->say( $out, [ "@pairs", "@result" ] );
    }
}

if ( not $csv->eof ) {
    $csv->error_diag();
}

close $out or die "$!";

Answer

The posted code seems to greatly overcomplicate matters, given the clarifications.
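
Two things in the posted code can produce the repeated values. File::Find calls its callback once for every file and directory it visits, so pairsAngles.txt is reopened on each call. And @result is declared outside the loops and never emptied, so every row also carries the matches from earlier subdirectories. Whether that accounts for exactly eight values depends on what each subdirectory contains. Here is a minimal sketch of the first effect, assuming a hypothetical temporary directory with three files (a.log and b.log are invented names for illustration):

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical layout: one directory holding pairsAngles.txt plus two other files
my $sub = tempdir( CLEANUP => 1 );
for my $name (qw(pairsAngles.txt a.log b.log)) {
    open my $fh, '>', "$sub/$name" or die "Cannot create '$sub/$name': $!";
    print {$fh} "Distance: 7.98675\n" if $name eq 'pairsAngles.txt';
    close $fh or die "Cannot close '$sub/$name': $!";
}

my @result;       # declared once and never cleared, as in the question
my $calls = 0;    # how many times find() invokes the callback

find(
    sub {
        # find() has chdir'ed into the directory being visited,
        # so the bare file name resolves on every call
        $calls++;
        open my $fh, '<', 'pairsAngles.txt' or die "Cannot open pairsAngles.txt: $!";
        while ( my $line = <$fh> ) {
            push @result, $line if $line =~ /Distance:/;
        }
        close $fh or die "Cannot close pairsAngles.txt: $!";
    },
    $sub
);

# One call for the directory itself plus one per file
print "callback ran $calls times, collected ", scalar(@result), " lines\n";
```

The single Distance line is collected once per callback invocation, which is why reading the file directly, without find, avoids the duplication.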

The code below reads each subdirectory name from the $cluster file through <$in>, then builds the file name from $dir and that name. It iterates over the lines of the file to find the one that matches the pattern. Once it is found, we print the result and move on to the next file (in the next subdirectory).

Note that we don't really need @result unless more processing happens later.

# Iterate over subdirectories that each have the file
while ( $c1 = <$in> ) {
    chomp $c1;

    # Build the full file name in this subdirectory, open the file
    my $filename = "$dir/$c1/pairsAngles.txt";
    open my $fh_in, '<', $filename  or die "Cannot open '$filename': $!";

    # Iterate over lines in the file to find the pattern
    while ( my $line = <$fh_in> ) { 
        if ( $line =~ m/Distance:/ ) { 
            # Found our result, print output
            chomp($line);
            $csv->say($out, [$c1, $line]);
            push @result, $line;
            # No need to continue if we know there is exactly one
            last; 
        }   
    }   
}
# Do something else with @result if needed ...
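
For trying the same approach without the real data, here is a self-contained sketch. It is not the original setup: the temporary directory stands in for /home/oligomerAngle, the names subA and subB and the "Angle: 12.5" line are invented sample content, first_distance_line is a hypothetical helper, and plain print replaces the $csv->say call so that only core modules are needed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the first line containing "Distance:" (chomped), or undef if none.
sub first_distance_line {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open '$filename': $!";
    my $found;
    while ( my $line = <$fh> ) {
        next unless $line =~ /Distance:/;
        chomp $line;
        $found = $line;
        last;    # exactly one match is expected per file
    }
    close $fh or die "Cannot close '$filename': $!";
    return $found;
}

# Hypothetical stand-ins for the base directory and the names read from the CSV
my $dir    = tempdir( CLEANUP => 1 );
my %sample = ( subA => '7.98675', subB => '7.95099' );
for my $name ( sort keys %sample ) {
    my $file = "$dir/$name/pairsAngles.txt";
    mkdir "$dir/$name" or die "Cannot create '$dir/$name': $!";
    open my $fh, '>', $file or die "Cannot create '$file': $!";
    print {$fh} "Angle: 12.5\nDistance: $sample{$name}\n";
    close $fh or die "Cannot close '$file': $!";
}

# One row per subdirectory; nothing is carried over between iterations
my @rows;
for my $name ( sort keys %sample ) {
    my $line = first_distance_line("$dir/$name/pairsAngles.txt");
    push @rows, [ $name, $line ];
}

print join( ',', @$_ ), "\n" for @rows;
```

Because the match is held in a variable local to the helper, each subdirectory yields at most one value and earlier results cannot leak into later rows.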