Luigi Cortese - 10 months ago
Java Question

Anonymizing xml via regex: how to remove data while leaving the tags in Java?

Given an XML document held in a String, I'm looking for a way to replace the data with four asterisks while leaving the tags in place. That is, starting from this

<one> <two> abc </two> <two> def </two> </one>

I want it to become

<one> <two> **** </two> <two> **** </two> </one>

I've tried

requestBody.replaceAll(">[^<]+?<","> **** <")

but this also matches the whitespace between two adjacent tags, so I end up with

<one> **** <two> **** </two> **** <two> **** </two> **** </one>

How could I achieve my goal? Any suggestions?

Here for some tests.
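
For the whitespace problem described above, one regex-only workaround is to require at least one non-whitespace character between the two tags, so that whitespace-only gaps are never matched. The following is a minimal sketch, not a general XML solution: it assumes the input has no comments, CDATA sections, or `>` characters inside text or attribute values. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class RegexAnonymizer {
    // '>' followed by optional whitespace, then at least one character that is
    // neither whitespace nor '<', then the rest of the text up to the next '<'.
    private static final Pattern TEXT_BETWEEN_TAGS =
            Pattern.compile(">\\s*[^<\\s][^<]*<");

    static String anonymize(String xml) {
        // Whitespace-only gaps between tags do not match and are left untouched.
        return TEXT_BETWEEN_TAGS.matcher(xml).replaceAll("> **** <");
    }

    public static void main(String[] args) {
        System.out.println(anonymize("<one> <two> abc </two> <two> def </two> </one>"));
    }
}
```

With the sample input from the question this leaves the gaps between adjacent tags alone and replaces only `abc` and `def`. A real XML parser or an XSLT transformation is more robust for anything beyond simple input like this.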


Following Michael Kay's suggestions, I've found this solution:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

/**
 * Anonymizes an xml structure replacing all data between tags with 4 asterisks.
 * Tags won't be replaced.
 * @param xmlInput the string representing the xml to be anonymized
 * @return the anonymized xml structure, or null if the transformation fails.
 */
private String anonymizeXml(String xmlInput){
    String anonimizedXml=null;
    try {
        TransformerFactory factory = TransformerFactory.newInstance();
        // The stylesheet copies every element and replaces each non-blank text node with asterisks.
        Source xslt = new StreamSource(new StringReader("<xsl:transform version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"><xsl:template match=\"*\"> <xsl:copy> <xsl:apply-templates/> </xsl:copy></xsl:template><xsl:template match=\"text()[normalize-space()]\"> **** </xsl:template></xsl:transform>"));
        Transformer transformer;
        transformer = factory.newTransformer(xslt);
        Source text = new StreamSource(new StringReader(xmlInput));

        StringWriter writer = new StringWriter();
        transformer.transform(text, new StreamResult(writer));
        anonimizedXml = writer.toString();

    } catch (TransformerConfigurationException e) {
        // the stylesheet could not be compiled; null is returned
    } catch (TransformerException e) {
        // the input could not be transformed; null is returned
    }
    return anonimizedXml;
}

Answer Source

This is a job for a very simple XSLT transformation:

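<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="*">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="text()[normalize-space()]">****</xsl:template>

</xsl:transform>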
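
To try the stylesheet approach from Java, a self-contained sketch using the JDK's built-in JAXP transformer might look like the following. The class and method names are hypothetical. The stylesheet string uses the standard XSLT namespace URI and, as in the question, pads the asterisks with spaces. Setting `OutputKeys.OMIT_XML_DECLARATION` is an addition here, made on the assumption that the XML declaration is not wanted in the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltAnonymizer {
    // Copies every element; replaces each text node that is not pure whitespace.
    private static final String STYLESHEET =
        "<xsl:transform version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"*\"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>"
      + "<xsl:template match=\"text()[normalize-space()]\"> **** </xsl:template>"
      + "</xsl:transform>";

    static String anonymize(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Without this, the serializer normally prepends an <?xml ...?> declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Surface failures instead of silently returning null.
            throw new IllegalStateException("Could not anonymize XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(anonymize("<one> <two> abc </two> <two> def </two> </one>"));
    }
}
```

Note that the `match="*"` template copies elements but not their attributes, so attributes in the input are dropped from the output. If they must be kept, the stylesheet needs an extra rule for them.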