smandape - 5 months ago
Perl Question

XML parsing issue in Perl

I am trying to parse the abstract out of a PubMed XML file with XML::Simple, and I am using ForceArray. My code works only when the abstract is an array (several AbstractText elements); it does not work when there is no array. This is because each array element has a {content} key, which I use, while in the non-array case the {content} key is missing. The code is as follows:

use LWP::Simple;
use XML::Simple;
use Data::Dumper;

open (FH, ">:utf8","xmlparsed2.txt");

my $db1 = "pubmed";
my $query = "9915366";
my $q = 16404398;
my $xml = new XML::Simple;

$urlxml = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=$db1&id=$q&retmode=xml&rettype=abstract";
$dataxml = get($urlxml);
$data = $xml->XMLin("$dataxml", ForceArray => [qw( MeshHeading Author AbstractText )], ForceContent => 1);
print FH Dumper($data);

print FH "Abstract: ".join "\n", map {join ":",($_->{NlmCategory},$_->{content})} @{$data->{PubmedArticle}->{MedlineCitation}->{Article}->{Abstract}->{AbstractText}};
print FH "\n";
print FH "Title: "."$data->{PubmedArticle}->{MedlineCitation}->{Article}->{ArticleTitle}\n";
print FH "\n";
print FH "MeSH: ".join '$$', map $_->{DescriptorName}{content}, @{$data->{PubmedArticle}->{MedlineCitation}->{MeshHeadingList}->{MeshHeading}};
print FH "\n";
print FH "Authors: ".join '$$', map {join " ",($_->{LastName},$_->{ForeName})} @{$data->{PubmedArticle}{MedlineCitation}{Article}{AuthorList}{Author}};


When the abstract is an array (replace $q in $urlxml with $query), I want each part of the abstract with its NlmCategory, like Objective: To determine if the long....... The code above gives me the desired output, but with a hash reference at the end, as follows:

METHODS:Tertiary care outpatient and inpatient rehabilitation center directly attached to a university hospital.:HASH(0x69d0810).


For an abstract that is not an array ($q in $urlxml), this code doesn't seem to work, probably because there is no content key (I found this in the Data::Dumper output). I played around a bit, and it sort of worked when I used just $_ for each array element, but that also prints the two colons (::). In short, I want my code to work for both $query and $q. Can you help?

Answer Source

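Use ForceContent => 1. With that option, XML::Simple represents every element as a hash with a content key, whether or not the element has attributes.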
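Here is a minimal offline sketch of how ForceArray and ForceContent together give both kinds of abstract the same shape. The two XML snippets are made-up stand-ins for the efetch response (not real PubMed data), abstract_lines is a hypothetical helper name, and KeyAttr => [] is only there to switch off XML::Simple's default key folding:

```perl
use strict;
use warnings;
use XML::Simple qw( XMLin );

# Made-up sample: an abstract split into labelled parts.
my $structured = <<'XML';
<Abstract>
  <AbstractText Label="OBJECTIVE" NlmCategory="OBJECTIVE">To determine the outcome.</AbstractText>
  <AbstractText Label="METHODS" NlmCategory="METHODS">Tertiary care center.</AbstractText>
</Abstract>
XML

# Made-up sample: a single AbstractText without attributes.
my $plain = <<'XML';
<Abstract>
  <AbstractText>A single unlabelled paragraph.</AbstractText>
</Abstract>
XML

# Hypothetical helper: returns one line per AbstractText element.
sub abstract_lines {
    my ($xml_string) = @_;
    my $data = XMLin(
        $xml_string,
        ForceArray   => ['AbstractText'],  # always an array ref, even for one element
        ForceContent => 1,                 # always a hash with a content key
        KeyAttr      => [],                # no folding on name/key/id attributes
    );
    return map {
        my $text = $_->{content} // '';
        # Prefix the category only when the attribute is present.
        defined $_->{NlmCategory} ? "$_->{NlmCategory}:$text" : $text;
    } @{ $data->{AbstractText} // [] };
}

print "$_\n" for abstract_lines($structured), abstract_lines($plain);
```

This prints "OBJECTIVE:To determine the outcome." and "METHODS:Tertiary care center." for the first sample, and just "A single unlabelled paragraph." for the second, with no stray colons. In the real response the same elements sit under PubmedArticle/MedlineCitation/Article/Abstract, as in the question.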

Or:

use strict;
use warnings;
use feature qw( say );

use LWP::Simple qw( get );
use XML::LibXML qw( );
use URI         qw( );

binmode STDOUT, ':encoding(UTF-8)';

my $db = "pubmed";
my $id = $ARGV[0] || '9915366';

my $url = URI->new('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi');
$url->query_form(
   db      => $db,
   id      => $id,
   retmode => 'xml',
   rettype => 'abstract',
);

my $xml = get($url);

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($xml);
my $root = $doc->documentElement();

for my $node ($root->findnodes('PubmedArticle/MedlineCitation/Article/Abstract/AbstractText')) {
   say join ':', $node->getAttribute('NlmCategory') // '', $node->textContent();
}
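
The other fields printed in the question (title, MeSH headings, authors) can be pulled out the same way. Below is a rough sketch that runs against a small made-up document instead of the live efetch response. The sample XML only mimics the element names used in the question's code, and the '$$' separator is taken from the question:

```perl
use strict;
use warnings;
use XML::LibXML qw( );

# Made-up sample that mimics the element names used in the question.
my $sample = <<'XML';
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Sample title.</ArticleTitle>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Rehabilitation</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
XML

my $doc  = XML::LibXML->load_xml( string => $sample );
my $root = $doc->documentElement();

# Take the first citation; the paths below are relative to it.
my ($citation) = $root->findnodes('PubmedArticle/MedlineCitation');

# findvalue returns the text of the matched node, or '' if nothing matches.
my $title = $citation->findvalue('Article/ArticleTitle');

my @mesh = map { $_->textContent() }
    $citation->findnodes('MeshHeadingList/MeshHeading/DescriptorName');

my @authors = map { join ' ', $_->findvalue('LastName'), $_->findvalue('ForeName') }
    $citation->findnodes('Article/AuthorList/Author');

print "Title: $title\n";
print "MeSH: ",    join('$$', @mesh),    "\n";
print "Authors: ", join('$$', @authors), "\n";
```

Because findnodes returns a list whether there is one match or many, there is no array versus non-array distinction to handle here.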