Escape XML characters within nodes of string XML in Java

Question

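I have a string of XML data. I need to escape the values inside the nodes, but not the nodes themselves.

For example:
<node1>R&R</node1>
should be escaped to:
<node1>R&amp;R</node1>
and should not be escaped to:
&lt;node1&gt;R&amp;R&lt;/node1&gt;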

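I have been working on this for the past few days without much success. I'm not a Java expert, but here is what I have tried that did not work:

Parsing the XML string into a Document. This fails because the data inside the nodes is not valid XML.
Escaping all characters. This fails because the program receiving the data does not accept that format.
Escaping all characters and then parsing into a Document. This throws all sorts of errors.

Any help would be greatly appreciated.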

Answer 1

The problem is that <node1>R&R</node1> is not XML: a bare & is not allowed in element content.
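A quick way to confirm this is to hand both forms to the JDK's built-in parser. The sketch below is only an illustration; the class and method names (WellFormedCheck, isWellFormed) are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns true if the JDK's default parser accepts the string as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which would print fatal errors to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for input that is not well-formed, such as a bare '&'
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<node1>R&R</node1>"));     // false
        System.out.println(isWellFormed("<node1>R&amp;R</node1>")); // true
    }
}
```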

Using an XML parser will not help. The very purpose of an XML parser is to filter out this kind of data.
You could try a different parser, one built to parse "dirty" HTML.

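But I think the best solution is to get correct XML in the first place:

Fix the source of the XML by creating the data with an XML library. (Never create XML by string concatenation.)
If the data is supplied to you, create an XML Schema and insist on valid input data.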
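As a rough illustration of the first point, here is a minimal sketch that builds a node with the JDK's DOM and Transformer APIs, so the serializer does the escaping. The class and method names (XmlBuilderSketch, buildNode) are hypothetical and not part of the original answer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class XmlBuilderSketch {
    /** Builds a single element holding the given raw text and serializes it to a string. */
    static String buildNode(String name, String rawText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element node = doc.createElement(name);
            // The text is stored unescaped; escaping happens during serialization
            node.setTextContent(rawText);
            doc.appendChild(node);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> header so only the element is returned
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The '&' in the value should come out as '&amp;' inside <node1>
        System.out.println(buildNode("node1", "R&R"));
    }
}
```

Because the value is passed in as plain text and never pasted into markup by hand, the producer cannot emit a bare & in the first place.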

Answer 2

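You can use regular expression matching to find all the strings between angle-bracketed tags, then loop through and process each one. In this example I use Apache Commons Lang to do the XML escaping.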

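import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringEscapeUtils;

public String sanitiseXml(String xml)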
{
    // Match the pattern <something>text</something>
    Pattern xmlCleanerPattern = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    StringBuilder xmlStringBuilder = new StringBuilder();

    Matcher matcher = xmlCleanerPattern.matcher(xml);
    int lastEnd = 0;
    while (matcher.find())
    {
        // Include any non-matching text between this result and the previous result
        if (matcher.start() > lastEnd) {
            xmlStringBuilder.append(xml.substring(lastEnd, matcher.start()));
        }
        lastEnd = matcher.end();

        // Sanitise the characters inside the tags and append the sanitised version
        String cleanText = StringEscapeUtils.escapeXml10(matcher.group(2));
        xmlStringBuilder.append(matcher.group(1)).append(cleanText).append(matcher.group(3));
    }
    // Include any leftover text after the last result
    xmlStringBuilder.append(xml.substring(lastEnd));

    return xmlStringBuilder.toString();
}

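This looks for matches of <something>text</something>, captures the opening tag, the contained text, and the closing tag, sanitises the contained text, and then puts them back together.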
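If Apache Commons Lang is not available, the same idea can be sketched with the JDK alone. This is a simplified variant and not the code from the answer: NodeTextEscaper, escapeText, and escapeNodeText are hypothetical names, and escapeText is a stand-in that assumes only &, < and > need escaping in element text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeTextEscaper {
    // Same shape as the pattern above: opening tag, text without angle brackets, closing tag
    private static final Pattern NODE = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");

    /** Minimal stand-in for an XML text escaper (assumption: only &, < and > matter here). */
    static String escapeText(String text) {
        // '&' must be replaced first so the entities added afterwards are not escaped again
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Escapes the text of every leaf node and leaves the tags themselves untouched. */
    static String escapeNodeText(String xml) {
        Matcher matcher = NODE.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1) + escapeText(matcher.group(2)) + matcher.group(3);
            // quoteReplacement stops '$' and '\' in the data from being read as group references
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <root><node1>R&amp;R</node1></root>
        System.out.println(escapeNodeText("<root><node1>R&R</node1></root>"));
    }
}
```

Note that text which is already escaped would be escaped a second time (&amp; becomes &amp;amp;), so this only suits input where the values are known to be raw.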

Answer 3

What you show is not XML. It is XPL. XPL is structured like XML but allows XML's "special characters" in text fields. You can easily convert XPL to XML using the XPL utilities. http://hll.nu

Answer 4

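I used Nameless Voice's answer, but with this regular expression:

Pattern xmlCleanerPattern = Pattern.compile("(<[^<>]*>)(.*)(</[^<>]*>)");

I found that this does a better job of capturing all the values inside the nodes themselves.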
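To see how the two patterns differ, the following sketch prints what each one captures as the node text. The class and helper names (PatternComparison, firstText) are made up for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparison {
    // Pattern from the earlier answer: the text may not contain angle brackets
    static final Pattern STRICT = Pattern.compile("(<[^/<>]*>)([^<>]*)(</[^<>]*>)");
    // Pattern from this answer: the text is anything up to the last closing tag on the line
    static final Pattern GREEDY = Pattern.compile("(<[^<>]*>)(.*)(</[^<>]*>)");

    /** Returns the text captured as group 2 by the first match, or null if nothing matches. */
    static String firstText(Pattern pattern, String xml) {
        Matcher matcher = pattern.matcher(xml);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String withBracket = "<node1>a < b</node1>";
        System.out.println(firstText(STRICT, withBracket)); // null
        System.out.println(firstText(GREEDY, withBracket)); // a < b

        String twoNodes = "<a>1</a><b>2</b>";
        System.out.println(firstText(STRICT, twoNodes)); // 1
        System.out.println(firstText(GREEDY, twoNodes)); // 1</a><b>2
    }
}
```

The greedy pattern picks up values that contain angle brackets, which the earlier pattern skips. The trade-off is that when several nodes share a line it captures the tags between them as text, so it fits best when each node is on its own line.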

java

xml

escaping

xml-parsing