cpiock - 1 month ago
Groovy Question

Regex find all \n in xml tags

I need to find every \n inside the XML tags of my XML structure. There are many different XML tags, and any of them can hold a string that contains a \n.
How can I find all the \n matches?

EDIT 1

Example: http://regex101.com/r/8hWhAX/2.
I need the regex in a Groovy script.

EDIT 2

I need only the \n itself, not the whole string that contains it.

EDIT 3

I only want to look inside the tags, not in the whole string.

Answer

Considerations:

Using any imperative language that can handle an XML string and regular expressions, how do we find '\n'?

The function receives the full XML content and returns a vector of indexes of the characters found.

Solution:

XML requires a parser of at least LL type (possibly LL(1), to be checked). Regular expressions are based on finite-state machines, which cannot parse an LL grammar: a finite-state machine has no way to keep track of arbitrarily deep tag nesting.

You therefore have to parse the XML somehow (e.g. with a DOM library) and then apply a suitable regular expression to the text of each tag the parser gives you.
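
As a rough illustration of that approach, here is a minimal Groovy sketch. It is not from the original answer and rests on several assumptions: the helper name findNewlinesInTags is made up, \n is taken to mean a real line feed character rather than a literal backslash followed by n, only leaf elements (elements with no child elements) are checked so that indentation between tags is not counted, and the offsets are relative to each tag's text, not to the whole XML string. It uses only the DOM parser that ships with the JDK.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node
import org.xml.sax.InputSource

// Hypothetical helper, not part of the original answer.
// Returns one [tag: ..., offset: ...] entry per line feed found in the text
// of a leaf element (an element that has no child elements).
def findNewlinesInTags(String xml) {
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def doc = builder.parse(new InputSource(new StringReader(xml)))
    def elements = doc.getElementsByTagName('*')   // every element, in document order
    def hits = []
    for (int i = 0; i < elements.getLength(); i++) {
        def element = elements.item(i)
        def children = element.getChildNodes()

        // Assumption: skip elements that contain other elements, so the
        // whitespace used to pretty-print the document is not reported.
        boolean isLeaf = true
        for (int j = 0; j < children.getLength(); j++) {
            if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
                isLeaf = false
                break
            }
        }
        if (!isLeaf) {
            continue
        }

        // The regex only has to match the single character we care about.
        def matcher = (element.getTextContent() =~ /\n/)
        while (matcher.find()) {
            hits << [tag: element.getNodeName(), offset: matcher.start()]
        }
    }
    return hits
}
```

Calling findNewlinesInTags(xmlString) gives a list with one map per match, so only the \n positions are reported and not the surrounding strings. If the data actually holds a literal backslash followed by n, the pattern would presumably need to be /\\n/ instead.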

References: https://en.wikipedia.org/wiki/Chomsky_hierarchy