Tomukoe - 1 month ago
Perl Question

Output InnerHTML Perl LibXML

Please see the minimal working example (MWE) below.

use strict;
use warnings;
use XML::LibXML;
my $content = "<tr>
<td class='title'>Synonym(s)</td>
<td>Automobile<br/>Car<br/></td>
</tr>";

my $parser = XML::LibXML->new({suppress_errors=>1, suppress_warnings=>1, recover=>2});
my $document = $parser->parse_html_string($content);
my @node = $document->findnodes('//td[@class="title" and text()="Synonym(s)"]/following-sibling::td');
print $node[0]->toString();


The output is:
<td>Automobile<br/>Car<br/></td>


But I need just the "inner" content:
Automobile<br/>Car<br/>


How do I need to change the XPath, or do I need a different XML::LibXML method?

Thank you,
Tobias

Answer

There is no built-in way to do that in XML::LibXML, and changing the XPath won't help. Your XPath returns a list of td elements. If it selected the contents of every matching td instead (for example by appending /node()), you would get one flat list of nodes and no way to tell where the content of the first td ends and the second td begins.
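To make the point about lost boundaries concrete, here is a minimal sketch. It assumes a made-up input with two matching rows and appends /node() to the XPath so that it selects the children of every matching td directly. The children of both cells come back as one flat list, so nothing in the result marks where the first cell ends.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: two rows, each with a "Synonym(s)" title cell
my $content = "<table>
<tr><td class='title'>Synonym(s)</td><td>Automobile<br/>Car<br/></td></tr>
<tr><td class='title'>Synonym(s)</td><td>Bicycle<br/>Bike<br/></td></tr>
</table>";

my $parser   = XML::LibXML->new(recover => 2, suppress_errors => 1, suppress_warnings => 1);
my $document = $parser->parse_html_string($content);

# Appending /node() selects the children of every matching td at once
my @children = $document->findnodes(
    '//td[@class="title" and text()="Synonym(s)"]/following-sibling::td/node()'
);

# The result is one flat list: nothing marks where the first td ends
my $flat = join '', map { $_->toString } @children;
print scalar(@children), " nodes: $flat\n";
```

Both cells' text and br nodes end up in the same list, which is why selecting the td elements themselves and serializing each one's children separately is the safer route.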

Instead, you need to build the inner content yourself: serialize each child node of the td element to a string and concatenate the results.

print join '', map { $_->toString } $node[0]->childNodes;

Output:

Automobile<br/>Car<br/>
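If several td elements can match, the same idea can be wrapped in a small helper and applied to each element separately, which keeps every cell's content apart. This is a sketch only: inner_html is a hypothetical name, not an XML::LibXML method, and the two-row table is made-up input.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): serializes every
# child node of $node and concatenates the results
sub inner_html {
    my ($node) = @_;
    return join '', map { $_->toString } $node->childNodes;
}

# Hypothetical input with two matching rows
my $content = "<table>
<tr><td class='title'>Synonym(s)</td><td>Automobile<br/>Car<br/></td></tr>
<tr><td class='title'>Synonym(s)</td><td>Bicycle<br/>Bike<br/></td></tr>
</table>";

my $parser   = XML::LibXML->new(recover => 2, suppress_errors => 1, suppress_warnings => 1);
my $document = $parser->parse_html_string($content);

# Select the td elements themselves, then take each one's inner content
my @cells = $document->findnodes(
    '//td[@class="title" and text()="Synonym(s)"]/following-sibling::td'
);
my @inner = map { inner_html($_) } @cells;

print "$_\n" for @inner;
```

Because the XPath still selects whole td elements, each entry in @inner corresponds to exactly one cell, so the boundary problem described above does not arise.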