PerlPingu - 5 months ago
Perl Question

Using Try and Catch to move past errors

This is my first question so I apologise in advance if I format/ask it all wrong.

I am using Perl to extract a string from a file, submit a web form, and download a new file created by the web page. The aim is to have it run for 30,000 files in a loop, which I estimate will take ~8 days. I am using WWW::Selenium and WWW::Mechanize to perform the web automation. The issue I have is that if for some reason a page doesn't load properly or the internet connection drops for a period of time, the script exits with an error message like this (depending on which stage it failed at):

Error requesting http://localhost:4444/selenium-server/driver/:
ERROR: Could not find element attribute: link=Download PDB File@href


I would like the script to continue running, moving on to the next round of the loop, so I don't have to worry if a single round of the loop throws an error. My research suggests that using Try::Tiny may be the best solution. Currently I have the script below using only try{...}, which seems to suppress any error and allow the script to continue through the files. However, I'm concerned that this is a very blunt solution and gives me no insight into which files failed or why.

Ideally I would want to print the filename and error message for each occurrence to another file that could then be reviewed once the script is complete, but I am struggling to understand how to use catch{...} to do this, or whether that is even the correct solution.

use strict;
use warnings;
use WWW::Selenium;
use WWW::Mechanize;
use Try::Tiny;


my @fastas = <*.fasta>;
foreach my $file (@fastas) {
    try {

        ## die on failure so the error is not silently ignored
        open(my $fh, "<", $file) or die "Can't open $file: $!";
        my $sequence;
        my $id = substr($file, 0, -6);
        while (my $line = <$fh>) {

            ## discard fasta header line
            if ($line =~ /^>/) {
                next;

            ## keep line, add to sequence string
            } else {
                $sequence .= $line;
            }
        }
        close($fh);

        my $sel = WWW::Selenium->new(
            host        => "localhost",
            port        => 4444,
            browser     => "*firefox",
            browser_url => "http://www.myurl.com",
        );

        $sel->start;
        $sel->open("http://www.myurl.com");
        $sel->type("chain1", $sequence);
        $sel->type("chain2", "EVQLVESGPGLVQPGKSLRLSCVASGFTFSGYGMHWVRQAPGKGLEWIALIIYDESNKYYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTAVFYCAKVKFYDPTAPNDYWGQGTLVTVSS");
        $sel->click("css=input.btn.btn-success");
        $sel->wait_for_page_to_load("30000");

        ## Wait through the holding page - will timeout after 5 mins
        $sel->wait_for_element_present("link=Download PDB File", "300000");
        ## Get the filename part of link
        $sel->wait_for_page_to_load("30000");
        my $pdbName = $sel->get_attribute("link=Download PDB File\@href");
        ## Concatenate it with the main domain
        my $link = "http://www.myurl.com/" . $pdbName;
        $sel->stop;

        my $mech = WWW::Mechanize->new( autocheck => 1 );
        $mech->get($link);
        #print $mech->content();
        $mech->save_content($id . ".pdb");
    };

}

Answer

You are completely right that you want to see, log, and review the errors. The mechanism and syntax provided by Try::Tiny are meant to be bare-bones and simple to use: the error thrown inside try { } is available in the catch { } block as $_.

use feature qw(say);   # enables say; Try::Tiny and @fastas come from the question's script

my $errlog = 'error_log.txt';
open my $fh_err, '>', $errlog  or die "Can't open $errlog for writing: $!";

foreach my $file (@fastas) {
    try {
        # processing, potentially throwing a die
    }
    catch {
        say $fh_err "Error with $file: $_";   # NOTE, it is $_ (not $! or $@)
    };
}
close $fh_err;

# Remove the log if empty
if (-z $errlog) { 
    say "No errors logged, removing $errlog";
    unlink $errlog or warn "Can't unlink $errlog: $!";
}    

You can also save the names of files that failed processing, with push @failed_files, $file inside the catch { } block. The code could then make another attempt at those files after the main processing, which helps if you know that errors are mostly due to random connection problems. Having the list of failed files is handy in any case.
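A minimal sketch of that collect-and-retry idea follows. The names process_file and run_batch are hypothetical stand-ins, not part of the question's script, and the simulated failure on the first attempt only imitates a dropped connection. It assumes Try::Tiny is installed.

```perl
use strict;
use warnings;
use feature qw(say);
use Try::Tiny;

# Hypothetical stand-in for the per-file work (Selenium + Mechanize in the
# question). It fails on the first attempt for one file only, to imitate a
# transient connection problem.
my %attempts;
sub process_file {
    my ($file) = @_;
    $attempts{$file}++;
    die "connection dropped\n" if $file eq 'b.fasta' && $attempts{$file} == 1;
    return 1;
}

# Process every file in the list and return the names of those that threw.
sub run_batch {
    my (@files) = @_;
    my @failed_files;
    foreach my $file (@files) {
        try {
            process_file($file);
        }
        catch {
            warn "Error with $file: $_";    # the error is in $_
            push @failed_files, $file;
        };                                  # the semicolon is required
    }
    return @failed_files;
}

my @failed       = run_batch('a.fasta', 'b.fasta', 'c.fasta');  # first pass
my @still_failed = run_batch(@failed);                          # one retry pass

say "Failed on first pass: @failed";
say "Still failing after retry: ", scalar @still_failed;
```

A single retry pass is an arbitrary choice here; for a multi-day run it may be worth pausing before the retry so that a longer outage has time to clear.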

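Note that as of Perl v5.14 the problems with eval and $@ that this module addresses were fixed, so plain eval is fine. It is mostly a matter of preference at this point, but note that Try::Tiny has a few twists of its own: for example, a return inside try { } leaves only that block, not the enclosing subroutine, and the statement needs its closing semicolon. See this post for a discussion.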
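For comparison, here is a rough sketch of the same per-file error handling written with plain eval, using only core Perl. The process_file routine and the file names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-file processing.
sub process_file {
    my ($file) = @_;
    die "simulated failure\n" if $file =~ /bad/;
    return 1;
}

my @errors;
foreach my $file ('good.fasta', 'bad.fasta') {
    # The block ends with a true value, so eval returns 1 only when the
    # block ran to completion; on an exception it returns undef.
    my $ok = eval {
        process_file($file);
        1;
    };
    if (!$ok) {
        my $err = $@ || 'unknown error';    # copy $@ right away
        chomp $err;
        push @errors, "Error with $file: $err";
    }
}

print "$_\n" for @errors;
```

Testing the return value of eval rather than the truth of $@ is a common defensive habit; it costs nothing and also works on older Perls.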

This addresses the question of simple exception handling, not the rest of the code.