BigMikeW - 11 days ago
Java Question

How do I drop an inbound XML element with a Transform in CXF?

I'm using CXF (v2.7.10) in a client that is consuming the MS Exchange Web Service (EWS).

I'm finding that one of the elements returned by EWS (UniqueHash) contains characters that are invalid in XML v1.0. As I have no control over this I'm trying to use an inbound interceptor to drop the UniqueHash elements (I don't need them) like this:

Map<String, String> inTransformMap = Collections.singletonMap(
"{http://schemas.microsoft.com/exchange/services/2006/types}UniqueHash", "");
TransformInInterceptor transformInInterceptor = new TransformInInterceptor();
transformInInterceptor.setInTransformElements(inTransformMap);
client.getInInterceptors().add(transformInInterceptor);


I can see that the transform (TransformInInterceptor) is running nice and early (post-stream):

FINE: Chain org.apache.cxf.phase.PhaseInterceptorChain@be78549 was created. Current flow:
receive [PolicyInInterceptor, LoggingInInterceptor, AttachmentInInterceptor]
post-stream [TransformInInterceptor, StaxInInterceptor]
read [WSDLGetInterceptor, ReadHeadersInterceptor, SoapActionInInterceptor, StartBodyInterceptor]
pre-protocol [MustUnderstandInterceptor]
post-protocol [CheckFaultInterceptor, JAXBAttachmentSchemaValidationHack]
unmarshal [DocLiteralInInterceptor, SoapHeaderInterceptor]
post-logical [WrapperClassInInterceptor]
pre-invoke [SwAInInterceptor, HolderInInterceptor]


But even though it appears to be working as intended when I step through the code, DocLiteralInInterceptor throws this unmarshalling error when it fires later on (0x4 in this case is within the UniqueHash element I thought I'd dropped):

org.apache.cxf.interceptor.Fault: Unmarshalling Error: Illegal character entity: expansion character (code 0x4
at [row,col {unknown-source}]: [1,2230]
at org.apache.cxf.jaxb.JAXBEncoderDecoder.unmarshall(JAXBEncoderDecoder.java:881)
at org.apache.cxf.jaxb.JAXBEncoderDecoder.unmarshall(JAXBEncoderDecoder.java:702)
at org.apache.cxf.jaxb.io.DataReaderImpl.read(DataReaderImpl.java:160)
at org.apache.cxf.interceptor.DocLiteralInInterceptor.handleMessage(DocLiteralInInterceptor.java:192)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
at org.apache.cxf.endpoint.ClientImpl.onMessage(ClientImpl.java:835)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponseInternal(HTTPConduit.java:1614)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.handleResponse(HTTPConduit.java:1504)
at org.apache.cxf.transport.http.HTTPConduit$WrappedOutputStream.close(HTTPConduit.java:1310)
at org.apache.cxf.transport.http.asyncclient.AsyncHTTPConduit$AsyncWrappedOutputStream.close(AsyncHTTPConduit.java:381)
at org.apache.cxf.io.CacheAndWriteOutputStream.postClose(CacheAndWriteOutputStream.java:50)
at org.apache.cxf.io.CachedOutputStream.close(CachedOutputStream.java:223)
at org.apache.cxf.transport.AbstractConduit.close(AbstractConduit.java:56)
at org.apache.cxf.transport.http.HTTPConduit.close(HTTPConduit.java:628)
at org.apache.cxf.interceptor.MessageSenderInterceptor$MessageSenderEndingInterceptor.handleMessage(MessageSenderInterceptor.java:62)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
at org.apache.cxf.endpoint.ClientImpl.doInvoke(ClientImpl.java:565)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:474)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:377)
at org.apache.cxf.endpoint.ClientImpl.invoke(ClientImpl.java:330)
at org.apache.cxf.frontend.ClientProxy.invokeSync(ClientProxy.java:96)
at org.apache.cxf.jaxws.JaxWsClientProxy.invoke(JaxWsClientProxy.java:135)
at com.sun.proxy.$Proxy67.searchMailboxes(Unknown Source)
Caused by: javax.xml.bind.UnmarshalException


Here's the XML response I'm working with:

<?xml version="1.0" encoding="utf-8"?>
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header>
<h:ServerVersionInfo MajorVersion="15" MinorVersion="0" MajorBuildNumber="847" MinorBuildNumber="31" Version="V2_8" xmlns:h="http://schemas.microsoft.com/exchange/services/2006/types" xmlns="http://schemas.microsoft.com/exchange/services/2006/types" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
</s:Header>
<s:Body xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<m:SearchMailboxesResponse xmlns:m="http://schemas.microsoft.com/exchange/services/2006/messages" xmlns:t="http://schemas.microsoft.com/exchange/services/2006/types">
<m:ResponseMessages>
<m:SearchMailboxesResponseMessage ResponseClass="Success">
<m:ResponseCode>NoError</m:ResponseCode>
<m:SearchMailboxesResult>
<t:SearchQueries>
<t:MailboxQuery>
<t:Query>"general quarters"</t:Query>
<t:MailboxSearchScopes>
<t:MailboxSearchScope>
<t:Mailbox>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:Mailbox>
<t:SearchScope>All</t:SearchScope>
</t:MailboxSearchScope>
</t:MailboxSearchScopes>
</t:MailboxQuery>
</t:SearchQueries>
<t:ResultType>PreviewOnly</t:ResultType>
<t:ItemCount>1</t:ItemCount>
<t:Size>3169</t:Size>
<t:PageItemCount>1</t:PageItemCount>
<t:PageItemSize>3169</t:PageItemSize>
<t:Items>
<t:SearchPreviewItem>
<t:Id Id="AAMkADY4MDY1MWViLTMzMWItNDEyYi1iMjUzLTQ2ZjMwNWVkYmIzYQBGAAAAAABkY13xq9IqS5OySCQXk7W3BwC9AjA7QbibQa9DQZUO2Dm3AAAAAAAMAAC9AjA7QbibQa9DQZUO2Dm3AAAE/bU4AAA=" ChangeKey="CQAAABYAAAC9AjA7QbibQa9DQZUO2Dm3AAAE/ceM"/>
<t:Mailbox>
<t:MailboxId>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:MailboxId>
<t:PrimarySmtpAddress>john.smith@internal.local</t:PrimarySmtpAddress>
</t:Mailbox>
<t:ParentId Id="AQMkADY4MDY1MWViLTMzADFiLTQxMmItYjI1My00NmYzMDVlZGJiADNhAC4AAANkY13xq9IqS5OySCQXk7W3AQC9AjA7QbibQa9DQZUO2Dm3AAADDAAAAA==" ChangeKey="AQAAAA=="/>
<t:ItemClass>IPM.Note</t:ItemClass>
<t:UniqueHash>00036&lt;0788d814ffea4e499c2fdb479c8617a2@ex01.internal.local&gt;0010General Quarters000C&#x4;&#x0;&#x0;&#x0;?&#x19;"&#x10;&#x12;{&#xB;&#x17;</t:UniqueHash>
<t:SortValue>001B2014-03-11T19:42:42.00000000000006F00001182</t:SortValue>
<t:OwaLink>https://ex01.internal.local/owa/integrated/?viewmodel=ItemReadingPaneViewModelPopOutFactory&amp;IsDiscoveryView=1&amp;exsvurl=1&amp;ItemID=AAMkADY4MDY1MWViLTMzMWItNDEyYi1iMjUzLTQ2ZjMwNWVkYmIzYQBGAAAAAABkY13xq9IqS5OySCQXk7W3BwC9AjA7QbibQa9DQZUO2Dm3AAAAAAAMAAC9AjA7QbibQa9DQZUO2Dm3AAAE%2FbU4AAA%3D</t:OwaLink>
<t:Sender>John Smith</t:Sender>
<t:ToRecipients>
<t:SmtpAddress>Our Shared Mailbox</t:SmtpAddress>
</t:ToRecipients>
<t:CreatedTime>2014-03-11T19:42:42Z</t:CreatedTime>
<t:ReceivedTime>2014-03-11T19:42:42Z</t:ReceivedTime>
<t:SentTime>2014-03-11T19:42:42Z</t:SentTime>
<t:Subject>General Quarters</t:Subject>
<t:Size>3169</t:Size>
<t:Preview/>
<t:Importance>Normal</t:Importance>
<t:Read>true</t:Read>
<t:HasAttachment>false</t:HasAttachment>
</t:SearchPreviewItem>
</t:Items>
<t:MailboxStats>
<t:MailboxStat>
<t:MailboxId>/o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=6f8abfc1a1694cf299c7b3ae5522d8c4-John</t:MailboxId>
<t:DisplayName>John Smith</t:DisplayName>
<t:ItemCount>1</t:ItemCount>
<t:Size>3169</t:Size>
</t:MailboxStat>
</t:MailboxStats>
</m:SearchMailboxesResult>
</m:SearchMailboxesResponseMessage>
</m:ResponseMessages>
</m:SearchMailboxesResponse>
</s:Body>
</s:Envelope>


Does anyone know what I'm doing wrong here? Any pointers on how I get rid of this element and its troublesome content?

Answer

Figured it out (and thanks to Daniel Kulp for confirming it via the CXF users mailing list).

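The issue was that the InTransformReader extends DepthXMLStreamReader, which wraps the underlying XMLStreamReader rather than working on the raw bytes. This means that even though I was trying to drop or replace the invalid characters, the content of the element still had to be read by the parser first, and the TransformInInterceptor would attempt to unmarshall it anyway.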
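To illustrate why a reader-level transform comes too late, here is a minimal sketch that uses the StAX parser bundled with the JDK as a stand-in (CXF normally uses Woodstox, so the exact exception message will differ). The class and method names are hypothetical. The idea is that the parser has to tokenise the content of an element before any wrapper can decide to skip it, so an illegal character reference fails the parse even if the element is unwanted.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of CXF.
final class StaxCharRefDemo {

  // Pulls every event from the document and reports whether the parser got to the end.
  static boolean parsesCleanly(String xml) {
    try {
      XMLStreamReader reader =
          XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
      while (reader.hasNext()) {
        reader.next(); // the parser reads the element content here, wanted or not
      }
      return true;
    } catch (XMLStreamException e) {
      // Raised by the parser itself, before any filtering wrapper could drop the element.
      return false;
    }
  }
}
```

Calling parsesCleanly on a document whose unwanted element contains &#x4; should report a failure, whereas the same document without the reference should parse to the end. That is why the fix has to happen on the stream, before the StAX reader is created.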

The solution was to create a new Interceptor that extended AbstractPhaseInterceptor and filter out the invalid text using a regex during the PRE_STREAM phase, before the StaxInInterceptor was invoked.

Easy once you know how!

Example:

The following will remove invalid XML chars from an outbound SOAP message:

import org.apache.cxf.interceptor.Fault;
import org.apache.cxf.message.Message;
import org.apache.cxf.phase.AbstractPhaseInterceptor;
import org.apache.cxf.phase.Phase;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.cxf.io.CachedOutputStream;

public class InvalidCharInterceptor extends AbstractPhaseInterceptor<Message> {

  public InvalidCharInterceptor() {
    super(Phase.PRE_STREAM);
  }

  /**
   * From xml spec valid chars:<br>
   * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]<br>
   * any Unicode character, excluding the surrogate blocks, FFFE, and FFFF.<br>
   * 
   * @param text
   *          The String to clean
   * @param replacement
   *          The string to be substituted for each match
   * @return The resulting String
   */
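  public static String cleanInvalidXmlChars(String text, String replacement) {
    // The regex \x escape takes exactly two hex digits, so the wider ranges use the
    // four-digit u-escape and the braced \x{...} form (Java 7+).
    String re = "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]";
    return text.replaceAll(re, replacement);
  }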

  @Override
  public void handleMessage(Message message) throws Fault {
    // This example only rewrites outbound messages (requests and outbound faults).
    boolean isOutbound = message == message.getExchange().getOutMessage()
        || message == message.getExchange().getOutFaultMessage();

    if (isOutbound) {
      // Keep the real stream and let the rest of the chain write into a cache instead.
      OutputStream os = message.getContent(OutputStream.class);

      CachedOutputStream cs = new CachedOutputStream();
      message.setContent(OutputStream.class, cs);

      // Run the remaining interceptors so the envelope is fully serialised into the cache.
      message.getInterceptorChain().doIntercept(message);

      try {
        cs.flush();
        IOUtils.closeQuietly(cs);
        CachedOutputStream csnew = (CachedOutputStream) message.getContent(OutputStream.class);

        String currentEnvelopeMessage = IOUtils.toString(csnew.getInputStream(), "UTF-8");
        csnew.flush();
        IOUtils.closeQuietly(csnew);

        // replaceAll never returns null, so no fallback is needed.
        String res = cleanInvalidXmlChars(currentEnvelopeMessage, "");

        // Copy the cleaned envelope to the real stream and restore it on the message.
        InputStream replaceInStream = IOUtils.toInputStream(res, "UTF-8");
        IOUtils.copy(replaceInStream, os);
        IOUtils.closeQuietly(replaceInStream);

        os.flush();
        message.setContent(OutputStream.class, os);
        IOUtils.closeQuietly(os);

      } catch (IOException ioe) {
        // Report I/O problems through CXF's own fault type.
        throw new Fault(ioe);
      }
    }
  }

}

Then you add it to your client:

client.getOutInterceptors().add(new InvalidCharInterceptor());
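
One caveat, as far as I can tell: in the response shown in the question the offending characters arrive as numeric character references such as &#x4;, which are themselves made up of perfectly legal characters, so a filter that only looks for raw control characters would leave them in place. The following is a minimal sketch, using only the JDK, of stripping references to code points outside the XML 1.0 Char range. The class and method names are hypothetical, and wiring it into an inbound interceptor (reading message.getContent(InputStream.class), cleaning the text, and putting the cleaned stream back with message.setContent) is assumed rather than shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of CXF.
final class CharRefFilter {

  // Matches hexadecimal (&#x4;) and decimal (&#4;) numeric character references.
  private static final Pattern NUMERIC_REF =
      Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

  // The Char production from the XML 1.0 specification.
  static boolean isValidXml10(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  // Removes references to invalid code points and keeps everything else untouched.
  static String stripInvalidCharRefs(String xml) {
    Matcher m = NUMERIC_REF.matcher(xml);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      boolean valid;
      try {
        int cp = m.group(1) != null
            ? Integer.parseInt(m.group(1), 16)
            : Integer.parseInt(m.group(2), 10);
        valid = isValidXml10(cp);
      } catch (NumberFormatException e) {
        valid = false; // too many digits to be a code point at all
      }
      m.appendReplacement(out, valid ? Matcher.quoteReplacement(m.group()) : "");
    }
    m.appendTail(out);
    return out.toString();
  }
}
```

Dropping the references outright changes the text of the element, which is fine here because UniqueHash is not needed; if the value mattered, substituting a placeholder would be the safer choice.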