Warpin - 24 days ago
Perl Question

Perl: read XML from a file and write it to another file while preserving line breaks

I use this Perl code to read XML from a file and then write it to another file (my full script also has code to add attributes):

#!/usr/bin/perl -w

use strict;
use XML::DOM;
use XML::Simple;

# Expect exactly two arguments: the input and the output file name.
if (@ARGV != 2) {
    print "\nUsage: ModifyXML.pl inputXML outputXML\n";
    exit 1;
}

my $inputPath  = $ARGV[0];
my $outputPath = $ARGV[1];

# parsefile() opens and reads the input file itself,
# so no separate open() of the input file is needed.
my $parser = XML::DOM::Parser->new();
my $data = $parser->parsefile($inputPath) || die "Error parsing XML file";

# Write the (possibly modified) document back out as UTF-8.
open my $fh, '>:utf8', $outputPath or die "Can't open $outputPath for writing: $!\n";
$data->printToFileHandle($fh);
close $fh;


However, this does not preserve encoded characters such as line breaks. For example, this XML:

<?xml version="1.0" encoding="utf-8"?>
<Test>
<Notification Content="test1 testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;" Type="Test1234">
</Notification>
</Test>


becomes this:

<?xml version="1.0" encoding="utf-8"?>
<Test>
<Notification Content="test1 testx

test2

test3

" Type="Test1234">
</Notification>
</Test>


I suspect I'm not writing to the file properly.

Answer

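Use XML::LibXML, for example. The modules mainly involved are XML::LibXML::Parser and XML::LibXML::DOM (among others), and the parser returns an XML::LibXML::Document object. Unlike XML::DOM's printToFileHandle, its serializer writes carriage returns and line feeds inside attribute values back out as character references, so they survive the round trip.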
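Why the serializer matters: under XML 1.0 attribute-value normalization, a parser replaces every literal CR, LF or tab inside an attribute value with a space, but keeps characters that came from character references such as &#xD; and &#xA;. So a writer that wants the line breaks to survive must escape them again. The following is a minimal core-Perl sketch of that escaping, using a hypothetical helper escape_attr (not part of XML::DOM or XML::LibXML), purely to illustrate what a correct serializer does.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: escape a string so it can be
# placed inside a double-quoted XML attribute value.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;    # first, so the entities added below are not re-escaped
    $value =~ s/</&lt;/g;
    $value =~ s/"/&quot;/g;
    # Literal CR/LF/tab would be normalized to spaces by the next parser,
    # so write them out as character references instead.
    $value =~ s/\r/&#xD;/g;
    $value =~ s/\n/&#xA;/g;
    $value =~ s/\t/&#x9;/g;
    return $value;
}

# The attribute value as it looks after parsing the sample file.
my $content = "test1 testx \r\ntest2\r\ntest3\r\n";
print 'Content="', escape_attr($content), '"', "\n";
# prints: Content="test1 testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;"
```

XML::DOM's output in the question skips the CR/LF step, which is why the raw line breaks appear in the written file.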

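use warnings 'all';
use strict;

use XML::LibXML;

my $inputPath  = 'with_encodings.xml';
my $outputPath = 'keep_encodings.xml';

# Create a parser object, then parse the file into an XML::LibXML::Document.
# no_blanks => 1 removes ignorable whitespace-only text nodes;
# it does not change attribute values.
my $reader = XML::LibXML->new();
my $doc = $reader->load_xml(location => $inputPath, no_blanks => 1);

# Show the serialized document on STDOUT ...
print $doc->toString();

# ... and write it to the output file.
my $state = $doc->toFile($outputPath);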
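To check the round trip without touching the file system, here is a minimal sketch that assumes XML::LibXML is installed. It parses the question's sample from a string, serializes it, parses the result again and compares the Content attribute. libxml2 typically writes the characters as decimal references (&#13;&#10;) rather than the original hex form; the two are equivalent.

```perl
use strict;
use warnings;
use XML::LibXML;

# The question's sample document, held in memory instead of a file.
my $xml = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<Test>
<Notification Content="test1 testx &#xD;&#xA;test2&#xD;&#xA;test3&#xD;&#xA;" Type="Test1234">
</Notification>
</Test>
XML

# First parse: the character references become real CR/LF characters.
my $doc    = XML::LibXML->load_xml(string => $xml);
my ($note) = $doc->findnodes('//Notification');
my $before = $note->getAttribute('Content');

# Serialize, then parse the serialized text a second time.
my $serialized = $doc->toString();
my $reparsed   = XML::LibXML->load_xml(string => $serialized);
my ($note2)    = $reparsed->findnodes('//Notification');
my $after      = $note2->getAttribute('Content');

print $serialized;
print $before eq $after ? "line breaks preserved\n" : "line breaks lost\n";
```

If the serializer had written the line breaks literally, as XML::DOM does in the question, the second parse would turn them into spaces and the comparison would fail.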

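It is not strictly necessary to create a parser object first, since load_xml can also be called directly as a class method (XML::LibXML->load_xml(...)). I do it because it lets you configure the parser through its methods (for encodings, for example) before parsing, but outside of the constructor.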
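Both styles side by side, as a minimal sketch assuming XML::LibXML is installed. The class-method form passes options straight to load_xml; the object form sets them on the parser with set_option before parsing, and that parser can then be reused for several files. no_blanks is used here only as an example option.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = '<Test><Notification Type="Test1234"/></Test>';

# 1) Class-method form: no parser object, options passed to load_xml.
my $doc1 = XML::LibXML->load_xml(string => $xml, no_blanks => 1);

# 2) Object form: create the parser, configure it through its methods,
#    then parse. The configured parser can be reused.
my $parser = XML::LibXML->new();
$parser->set_option(no_blanks => 1);
my $doc2 = $parser->load_xml(string => $xml);

print $doc1->documentElement->nodeName, "\n";
print $doc2->documentElement->nodeName, "\n";
```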

This module is also far more convenient than XML::DOM for processing XML.

XML::Twig should also behave as expected, and it too is far better suited to processing.
