JAX-WS SOAPHandler for filtering out invalid XML characters

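I have a web service where the client passes a string that contains some invalid XML characters. When the request is parsed on the server side, JAX-WS throws an exception because it cannot parse the invalid XML characters.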

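To work around this, I tried creating the SOAPHandler below. In it I iterate over the child elements of the body, which means the XML is already being parsed at that point, so I get the same exception inside the handler itself.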

How can I remove the invalid XML characters from the message?

import java.util.Set;

import javax.xml.namespace.QName;
import javax.xml.soap.SOAPBodyElement;
import javax.xml.soap.SOAPException;
import javax.xml.soap.SOAPMessage;
import javax.xml.ws.handler.MessageContext;
import javax.xml.ws.handler.soap.SOAPHandler;
import javax.xml.ws.handler.soap.SOAPMessageContext;

import org.apache.commons.lang.StringUtils;
import org.apache.log4j.Logger;

public class InvalidXmlCharacterHandler implements SOAPHandler<SOAPMessageContext>{

     private static final Logger LOGGER = Logger.getLogger(InvalidXmlCharacterHandler.class);


    @Override
    public boolean handleMessage(SOAPMessageContext context) {

        System.out.println("Server : handleMessage()......");

        Boolean isRequest = (Boolean) context.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY);

        //for request message only
        if(!isRequest){

            SOAPMessage soapMsg = context.getMessage();

            if (soapMsg != null) {
               try {

                   java.util.Iterator iterator = soapMsg.getSOAPBody().getChildElements();
                   while (iterator.hasNext()) {
                     SOAPBodyElement bodyElement = (SOAPBodyElement) iterator.next();
                     String val = bodyElement.getTextContent();
                     bodyElement.setTextContent(stripNonValidXMLCharacters(val));    
                     System.out.println("The Value is:" + val);
                   }


               } catch (SOAPException ex) {
                  LOGGER.error("Failed to get and set source", ex);
               }

            }

        }

        //continue other handler chain
        return true;
    }




     public static String stripNonValidXMLCharacters(String in) {
          if (in == null || in.isEmpty()) return ""; // Nothing to filter.
          StringBuilder out = new StringBuilder(in.length()); // Holds the filtered output.

          // Iterate by code point rather than by char: a single char can never
          // fall in the 0x10000-0x10FFFF range, so a char-based loop would drop
          // every supplementary character (surrogate pair).
          for (int i = 0; i < in.length(); ) {
              int current = in.codePointAt(i);
              i += Character.charCount(current);
              if ((current == 0x9) ||
                  (current == 0xA) ||
                  (current == 0xD) ||
                  ((current >= 0x20) && (current <= 0xD7FF)) ||
                  ((current >= 0xE000) && (current <= 0xFFFD)) ||
                  ((current >= 0x10000) && (current <= 0x10FFFF)))
                  out.appendCodePoint(current);
          }
          return out.toString();
      }   

    @Override
    public boolean handleFault(SOAPMessageContext context) {

        System.out.println("Server : handleFault()......");

        return true;
    }

    @Override
    public void close(MessageContext context) {
        System.out.println("Server : close()......");
    }

    @Override
    public Set<QName> getHeaders() {
        System.out.println("Server : getHeaders()......");
        return null;
    }

}

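As you have already worked out, what you need to do is change the XML without parsing it. That is not necessarily a simple problem, but at least it is a more general one.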

First, get your message as a bare string:

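// Needs java.io.ByteArrayOutputStream and java.nio.charset.StandardCharsets.
// Assumes the message is UTF-8 encoded; adjust if your service uses another charset.
ByteArrayOutputStream out = new ByteArrayOutputStream();
soapMsg.writeTo(out); // may throw SOAPException or IOException
String messageAsString = new String(out.toByteArray(), StandardCharsets.UTF_8);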

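Then simply run stripNonValidXMLCharacters on the string and use the result as the input to SOAPMessageContext::setMessage, which in your case appears to be context.setMessage(...). Note that setMessage takes a SOAPMessage, so the cleaned string has to be turned back into one first, for example with MessageFactory.createMessage(MimeHeaders, InputStream).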
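The decode, strip and re-encode steps can be sketched with nothing but the JDK. This is only a minimal illustration, not a tested handler: the class name RawMessageSanitizer and its methods are hypothetical, the filter is a code-point based variant of the function from the question, and UTF-8 is an assumption about how the message is encoded. The resulting bytes are what you would wrap in a ByteArrayInputStream when rebuilding the SOAPMessage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: cleans the raw bytes of a message without parsing them as XML.
public class RawMessageSanitizer {

    // Keeps only the code points allowed by the XML 1.0 Char production.
    public static String stripInvalidXmlChars(String in) {
        if (in == null || in.isEmpty()) return "";
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // Decodes, filters and re-encodes with one explicit charset, so the
    // platform default encoding never influences the result.
    public static byte[] sanitize(byte[] raw, Charset charset) {
        String text = new String(raw, charset);
        return stripInvalidXmlChars(text).getBytes(charset);
    }

    public static void main(String[] args) {
        // U+0001 is not a legal XML 1.0 character and should be dropped.
        byte[] raw = "<a>bad\u0001char</a>".getBytes(StandardCharsets.UTF_8);
        byte[] clean = sanitize(raw, StandardCharsets.UTF_8);
        System.out.println(new String(clean, StandardCharsets.UTF_8)); // prints <a>badchar</a>
    }
}
```

Using the same charset for decoding and encoding matters here: if the bytes are decoded with the wrong charset, valid multi-byte characters can be corrupted before the filter ever sees them.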

Also take a look at these nicer ways of fixing up invalid XML: "removing invalid XML characters from a string in java" and "Parsing malformed/incomplete/invalid XML files".
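One commonly used alternative to a hand-written loop is a regular expression that matches everything outside the XML 1.0 character ranges. The sketch below is an illustration under assumptions, not code taken from the linked posts: the class name XmlCharRegex and the method strip are hypothetical, and the pattern simply mirrors the ranges used in stripNonValidXMLCharacters.

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based removal of characters that XML 1.0 does not allow.
public class XmlCharRegex {

    // Negated character class listing the allowed ranges:
    // tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.
    // java.util.regex matches by code point, so a valid surrogate pair is
    // tested as one supplementary character.
    private static final Pattern INVALID_XML10 = Pattern.compile(
            "[^\\t\\n\\r\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    public static String strip(String in) {
        return in == null ? "" : INVALID_XML10.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // U+0000 and U+0008 are both outside the allowed ranges.
        System.out.println(strip("ok\u0000\u0008text")); // prints oktext
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every request, which is worth doing if the handler runs for each incoming message.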