An Illusion - 3 months ago
Scala Question

Scala: Traversing an XML tree using DFS produces unexpected results

I am traversing an XML tree visiting each node using DFS. The output that I get is not what I expected.

import scala.xml.Node

object Main extends App {

  lazy val testXml =
    <vehicles>
      <vehicle>
        gg
      </vehicle>
      <variable>
      </variable>
    </vehicles>

  traverse.dfs(testXml.head)
}

object traverse {
  // Prints each node's label, children, and child count, then recurses depth-first.
  def dfs(node: Node): Unit = {
    println("==============")
    println(node.label + ">>>" + node.child + ">>>" + node.child.size)
    node.child.foreach(dfs)
  }
}


Output:

==============
vehicles>>>ArrayBuffer(
, <vehicle>
gg
</vehicle>,
, <variable>
</variable>,
)>>>5
==============
#PCDATA>>>List()>>>0
==============
vehicle>>>ArrayBuffer(
gg
)>>>1
==============
#PCDATA>>>List()>>>0
==============
#PCDATA>>>List()>>>0
==============
variable>>>ArrayBuffer(
)>>>1
==============
#PCDATA>>>List()>>>0
==============
#PCDATA>>>List()>>>0

Process finished with exit code 0


If you look at the output, the first element (vehicles) is reported as having 5 children. If you print the children, two of them (the first and the last) are empty.


I want the traversal to visit vehicles, then vehicle, then gg, and finally variable.



Any advice with this is appreciated. Thanks.

Answer

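Those 2 children are not empty. They are text nodes (the #PCDATA entries in your output) containing the line breaks and spaces between the elements. The third child, between vehicle and variable, is a whitespace-only text node as well, which is why the count is 5.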

If you define the XML as <vehicles><vehicle>gg</vehicle><variable></variable></vehicles> without line breaks and spaces your traversal will give the desired result.
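To make the difference concrete, here is a minimal sketch comparing the two layouts. It assumes Scala 2 XML literals with the scala-xml module on the classpath; the object and value names (CompactVsPretty, compact, pretty) are made up for illustration.

```scala
import scala.xml.Elem

object CompactVsPretty {
  // Same elements, written without any whitespace between the tags.
  val compact: Elem =
    <vehicles><vehicle>gg</vehicle><variable></variable></vehicles>

  // The layout from the question: the whitespace around each tag
  // becomes a text node of its own.
  val pretty: Elem =
    <vehicles>
      <vehicle>
        gg
      </vehicle>
      <variable>
      </variable>
    </vehicles>

  def main(args: Array[String]): Unit = {
    println(compact.child.size) // 2: vehicle and variable
    println(pretty.child.size)  // 5: three whitespace text nodes plus the two elements
  }
}
```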

But if you want the traversal to work on your original XML, you can filter the children so that whitespace-only text nodes are dropped and the remaining text nodes are trimmed, while all other nodes are kept as they are:

import scala.xml._

def filterEmptyNodes(nodes: Seq[Node]): Seq[Node] =
  nodes.collect(Function.unlift {
    case Text(text) =>
      if (text.trim.isEmpty) None
      else Some(Text(text.trim))
    case node => Some(node)
  })

Then have the traversal call this function:

object traverse {
  def dfs(node: Node): Unit = {
    val nonEmptyChildren = filterEmptyNodes(node.child)
    println("==============")
    println(node.label + ">>>" + nonEmptyChildren + ">>>" + nonEmptyChildren.size)
    nonEmptyChildren.foreach(dfs)
  }
}
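As a quick check of the resulting visit order, the sketch below collects what the traversal visits into a list instead of printing it. It is only an illustration under a few assumptions: Scala 2 XML literals with scala-xml available, and the names VisitOrder, nonBlank and visitOrder are hypothetical. nonBlank is a variant of the filter above written with pattern guards, not the exact function from this answer.

```scala
import scala.xml._

object VisitOrder {
  val testXml: Elem =
    <vehicles>
      <vehicle>
        gg
      </vehicle>
      <variable>
      </variable>
    </vehicles>

  // Hypothetical variant of the filter: drops whitespace-only text nodes,
  // trims the remaining text nodes, and keeps every other node unchanged.
  def nonBlank(nodes: Seq[Node]): Seq[Node] =
    nodes.collect {
      case Text(text) if text.trim.nonEmpty => Text(text.trim)
      case node if !node.isInstanceOf[Text] => node
    }

  // Depth-first, pre-order: the node itself first, then each child's subtree.
  def visitOrder(node: Node): List[String] = {
    val self = node match {
      case Text(text) => text.trim
      case other      => other.label
    }
    self :: nonBlank(node.child).toList.flatMap(visitOrder)
  }

  def main(args: Array[String]): Unit =
    println(visitOrder(testXml)) // List(vehicles, vehicle, gg, variable)
}
```

Returning a list rather than printing makes the order easy to compare against the one asked for in the question.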

On a side note, you may also use node \ "_" to get all child elements, but it won't contain text nodes.

Alternatively, node.descendant or node.descendant_or_self gives you a List of all the descendants in DFS order, so you do not have to write the traversal yourself. You still have to filter the "empty" nodes out of the result: filterEmptyNodes(node.descendant) or filterEmptyNodes(node.descendant_or_self).
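Both side notes can be sketched together. This again assumes Scala 2 XML literals with scala-xml on the classpath; BuiltInTraversal, childElements and visited are illustrative names, and the inline collect is a simplified stand-in for filterEmptyNodes that only keeps elements and non-blank text.

```scala
import scala.xml._

object BuiltInTraversal {
  val testXml: Elem =
    <vehicles>
      <vehicle>
        gg
      </vehicle>
      <variable>
      </variable>
    </vehicles>

  // node \ "_" selects the child elements only; text nodes are not included.
  val childElements: List[String] = (testXml \ "_").toList.map(_.label)

  // descendant_or_self lists the node and all its descendants depth-first.
  // Whitespace-only text is skipped and the remaining text is trimmed.
  val visited: List[String] = testXml.descendant_or_self.collect {
    case Text(text) if text.trim.nonEmpty => text.trim
    case elem: Elem                       => elem.label
  }

  def main(args: Array[String]): Unit = {
    println(childElements) // List(vehicle, variable)
    println(visited)       // List(vehicles, vehicle, gg, variable)
  }
}
```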