Lucas Lucas - 1 year ago 90
Perl Question

Stripping CDATA Tags from XML in Perl

In PHP you could simply strip off CDATA tags in XML by doing the following:

simplexml_load_string($string, 'SimpleXMLElement', LIBXML_NOCDATA);

I was wondering how I could do this in Perl, using XML::Bare or any other module?

My client tends to send XML like this:

<msg t='sys'><body action='login' r='0'><login z='w1'><nick><![CDATA[Test]]></nick><pword><![CDATA[4c24a5558542bf35cca54d8749c78de6]]></pword></login></body></msg>

Using XML::Bare I would parse it like this:

use feature 'say';
use XML::Bare;

my $string = "<msg t='sys'><body action='login' r='0'><login z='w1'><nick><![CDATA[Test]]></nick><pword><![CDATA[4c24a5558542bf35cca54d8749c78de6]]></pword></login></body></msg>";
my $strXML = XML::Bare->new('text' => $string)->parse;
say $strXML->{msg}->{body}->{login}->{nick}->{value};

This works, but I'd like to strip the CDATA tags to prevent SQL injection on my server. Does anyone know how I can do this? I've searched all over the web for a solution and haven't been able to find one.

Answer Source

With XML::LibXML you can pass the no_cdata parser option. For example, the following:

use 5.014;
use warnings;
use XML::LibXML;

#the input xml
my $str = q{<msg t='sys'><body action='login' r='0'><login z='w1'><nick><![CDATA[Test]]></nick><pword><![CDATA[4c24a5558542bf35cca54d8749c78de6]]></pword></login></body></msg>};

#the parsing
my $dom = XML::LibXML->load_xml(
    string   => $str,
    no_cdata => 1,  # create plain text nodes instead of CDATA nodes
);

#nice-print the parsed xml
say $dom->toString(2);

#print the "nick" and pword
say "the nick  is ==", $dom->find( '//nick' )->string_value, "==";
say "the pword is ==", $dom->find( '//pword' )->string_value, "==";

prints the original XML without the CDATA sections, followed by the two extracted values, such as:

<?xml version="1.0"?>
<msg t="sys">
  <body action="login" r="0">
    <login z="w1">

the nick  is ==Test==
the pword is ==4c24a5558542bf35cca54d8749c78de6==
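
As a rough sketch of what no_cdata changes, the snippet below parses the same fragment twice and compares the results. The shortened XML string and the variable names are made up for illustration. It assumes XML::LibXML is installed and relies on its textContent, findnodes and firstChild methods. The extracted value should be the same either way, because the parser never includes the <![CDATA[ ... ]]> markers in the text. Only the class of the child node should differ.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened, hypothetical input based on the XML from the question.
my $xml = q{<login><nick><![CDATA[Test]]></nick></login>};

# Parse once with default options and once with no_cdata => 1.
my $plain    = XML::LibXML->load_xml( string => $xml );
my $stripped = XML::LibXML->load_xml( string => $xml, no_cdata => 1 );

for my $pair ( [ default => $plain ], [ no_cdata => $stripped ] ) {
    my ( $label, $dom ) = @$pair;

    # First <nick> element and the single child node that holds its text.
    my ($nick) = $dom->findnodes('//nick');
    my $child  = $nick->firstChild;

    # textContent returns the character data; ref() shows the node class.
    printf "%-8s value=%s child=%s\n", $label, $nick->textContent, ref $child;
}
```

Either way the string is still raw user input, so protection against SQL injection would come from bound placeholders in the database layer (for example DBI's ? parameters) rather than from this parser option.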