ssr1012 - 1 month ago
Perl Question

Change attribute value if condition matches in second or third level section elements

I have an XML file in which I need to modify an attribute on section elements, specifically the second- or third-level ones. For example:

Input:

<?xml version="1.0"?>
<article>
<front></front>
<body>
<sec id="sec1">
<title>1. Introduction</title><p>The cerebrospinal venous system has been the focus of many studies in the last few years, because of the hypothesized involvement of insufficient extracranial venous drainage in central nervous system disorders such as multiple sclerosis, normal-pressure hydrocephalus, and transient monocular blindness [<xref ref-type="bibr" rid="B1">1</xref>&ndash;<xref ref-type="bibr" rid="B4">4</xref>]. An insufficiency in venous blood drainage can be due to the presence of single or multiple stenosis on the main routes of cerebrospinal venous system [<xref ref-type="bibr" rid="B5">5</xref>].</p>
<sec id="sec1.1">
<title>Section level 2</title>
<p><def-list><def-item><term>term I:</term><def><p>defintion I</p></def></def-item><def-item><term>term 2:</term><def><p>defintion 2</p></def></def-item></def-list>In the past years, great efforts have been made to develop excellent algorithms and tools for the processing and analyzing of traditional BS-Seq data [<xref ref-type="bibr" rid="B7">7</xref>&#x2013;<xref ref-type="bibr" rid="B10">10</xref>] but none for hairpin-BS-Seq data. In this study, we designed and implemented HBS-tools and compared them against other state-of-the-art mapping tools. Our result indicated that HBS-tools have a reduced mapping time and improved mapping efficiency.</p>
</sec>
</sec></body>
</article>


If a second- or third-level section element contains a def-list, I have to insert the attribute att1="deflist" on that section element.

Expected Output:

<?xml version="1.0"?>
<article>
<front></front>
<body>
<sec id="sec1">
<title>1. Introduction</title><p>The cerebrospinal venous system has been the focus of many studies in the last few years, because of the hypothesized involvement of insufficient extracranial venous drainage in central nervous system disorders such as multiple sclerosis, normal-pressure hydrocephalus, and transient monocular blindness [<xref ref-type="bibr" rid="B1">1</xref>&ndash;<xref ref-type="bibr" rid="B4">4</xref>]. An insufficiency in venous blood drainage can be due to the presence of single or multiple stenosis on the main routes of cerebrospinal venous system [<xref ref-type="bibr" rid="B5">5</xref>].</p>
<sec id="sec1.1" att1="deflist">
<title>Section level 2</title>
<p><def-list><def-item><term>term I:</term><def><p>defintion I</p></def></def-item><def-item><term>term 2:</term><def><p>defintion 2</p></def></def-item></def-list>In the past years, great efforts have been made to develop excellent algorithms and tools for the processing and analyzing of traditional BS-Seq data [<xref ref-type="bibr" rid="B7">7</xref>&#x2013;<xref ref-type="bibr" rid="B10">10</xref>] but none for hairpin-BS-Seq data. In this study, we designed and implemented HBS-tools and compared them against other state-of-the-art mapping tools. Our result indicated that HBS-tools have a reduced mapping time and improved mapping efficiency.</p>
</sec>
</sec></body>
</article>


My code:

use strict;
use warnings;
use XML::Twig;

my $t= XML::Twig->new( twig_handlers =>
{ 'sec/section/def-list' => \&Check_deflist }
)
->parsefile('input.xml');

sub Check_deflist
{ }


Apologies for the dirty code. Any help with this would be appreciated.

Answer

First you need to fix your XPath expression. Your elements are called <sec>, not <section>. Then you need the right expression to target the <def-list> elements. They are not direct children of the second-level <sec> (they sit inside a <p>), so you need the double slash // to match them at any depth below it.

sec/sec//def-list

Now for the handler: take the element and walk up its tree to collect the enclosing <sec> elements. ancestors('sec') returns them as a list ordered from the innermost to the outermost, so the first entry is the nearest enclosing <sec>. Set the attribute on that element and you are done.

use strict;
use warnings;
use XML::Twig;

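# Run Check_deflist for every <def-list> found at any depth
# below a second-level <sec>, then print the modified document.
XML::Twig->new(
    twig_handlers => { 'sec/sec//def-list' => \&Check_deflist },
    pretty_print  => 'indented'
)->parse( \*DATA )->print;

# Inside a handler, $_ is the element that triggered it.
sub Check_deflist {
    # ancestors('sec') lists the innermost <sec> first, so [0] is the nearest one.
    ( $_->ancestors('sec') )[0]->set_att( att1 => 'deflist' );
}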

__DATA__
<sec id="sec1">
<title>Section level 1</title>
<p>.......</p>
    <sec id="sec1.1">
    <title>Section level 2</title>
    <p><def-list><p>...</p>...</def-list></p>
    </sec>
</sec>

Output with pretty-print:

<sec id="sec1">
  <title>Section level 1</title>
  <p>.......</p>
  <sec att1="deflist" id="sec1.1">
    <title>Section level 2</title>
    <p>
      <def-list><p>...</p>...</def-list>
    </p>
  </sec>
</sec>

It will also work if there is more than one <def-list> in a second-level <sec>: the handler fires once per <def-list> and simply sets the same attribute again.
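
To make that last point concrete, here is a minimal sketch that runs the same handler over a sample with two <def-list> elements in one second-level <sec>, one in a third-level <sec>, and a second-level <sec> without any. The helper name mark_deflist_sections and the sample ids are made up for illustration; the XML::Twig calls are the same ones used above, plus sprint to get the result back as a string instead of printing it directly.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not part of XML::Twig): parses an XML string and
# returns it with att1="deflist" set on the nearest <sec> around each
# <def-list> that sits at the second section level or deeper.
sub mark_deflist_sections {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            'sec/sec//def-list' => sub {
                my ( $t, $elt ) = @_;
                # Innermost <sec> ancestor comes first in the list.
                my ($sec) = $elt->ancestors('sec');
                $sec->set_att( att1 => 'deflist' ) if $sec;
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

# Made-up sample: ids are for illustration only.
my $sample = <<'XML';
<sec id="sec1">
<title>Level 1</title>
<sec id="sec1.1">
<title>Level 2</title>
<p><def-list><def-item><term>a</term></def-item></def-list></p>
<p><def-list><def-item><term>b</term></def-item></def-list></p>
<sec id="sec1.1.1">
<title>Level 3</title>
<p><def-list><def-item><term>c</term></def-item></def-list></p>
</sec>
</sec>
<sec id="sec1.2">
<title>Level 2, no list</title>
<p>text</p>
</sec>
</sec>
XML

print mark_deflist_sections($sample);
```

With this input I would expect sec1.1 and sec1.1.1 to receive the attribute while sec1 and sec1.2 stay untouched, because the handler always picks the nearest enclosing <sec>. If a <def-list> in a third-level section should mark the second-level section instead, the handler would need to pick a different entry from the ancestors list.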