Parse XML Self-Closing Tags with Text
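Hi all, I am trying to parse this part of an XML file I have. The problem is that the text contains a lot of self-closing tags. I can't remove those tags because they give me some indexing details.
How can I get at the text without all the "Node" tags?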
<TextWithNodes>
<Node id="0"/>A TEENAGER <Node
id="11"/>yesterday<Node id="20"/> accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2<Node id="146"/>.<Node
id="147"/>
</TextWithNodes>
Use an XML parser library such as Jsoup: https://jsoup.org/
The answers to this question show how to do it:
How to parse XML with jsoup
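Odd as it looks, this XML is actually well-formed and can be parsed with ordinary XML tools. The TextWithNodes element simply has mixed content.
The string value of TextWithNodes can be obtained with a simple XPath expression,
string(/TextWithNodes)
which yields the text you want, without any other markup (self-closing or otherwise):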
A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
Here is some sample code that applies the XPath idea from the answer above in Java (thanks @kjhughes):
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TextWithNodesExample {
    public static void main(String[] args) throws Exception {
        String text = "<TextWithNodes>\n" +
                " <Node id=\"0\"/>A TEENAGER <Node\n" +
                "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n" +
                "by feeding him a daily diet of chips which sent his weight\n" +
                "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n" +
                "id=\"147\"/>\n" +
                "</TextWithNodes>";
        // Parse the XML string into a DOM document.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Evaluating as STRING returns the string value of the first matching element.
        String expression = "//TextWithNodes";
        System.out.println(xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING));
    }
}
This prints:
A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
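Since the question says the Node tags carry indexing details, it may help to keep them alongside the extracted text rather than discard them. The following is only a sketch, not part of the answers above: it walks the children of TextWithNodes with the standard DOM API, concatenates the text nodes, and records the position in the extracted text at which each Node marker appears. The names NodeOffsets, Result and extract are hypothetical, and the idea that each marker's position in the text is what you need is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NodeOffsets {
    /** Hypothetical holder: the plain text plus, per Node id, the offset in that text where the marker sat. */
    static final class Result {
        final String text;
        final Map<String, Integer> offsets;

        Result(String text, Map<String, Integer> offsets) {
            this.text = text;
            this.offsets = offsets;
        }
    }

    /** Assumes the root element directly contains the text and the empty Node markers. */
    static Result extract(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            StringBuilder text = new StringBuilder();
            Map<String, Integer> offsets = new LinkedHashMap<>();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                short type = child.getNodeType();
                if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                    // Plain text: append it to the running output.
                    text.append(child.getNodeValue());
                } else if (type == Node.ELEMENT_NODE && "Node".equals(child.getNodeName())) {
                    // Marker element: remember how much text preceded it.
                    offsets.put(((Element) child).getAttribute("id"), text.length());
                }
            }
            return new Result(text.toString(), offsets);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TextWithNodes><Node id=\"0\"/>A TEENAGER <Node id=\"11\"/>yesterday"
                + "<Node id=\"20\"/> accused</TextWithNodes>";
        Result r = extract(xml);
        System.out.println(r.text);
        System.out.println(r.offsets);
    }
}
```

In this shortened sample the recorded offsets happen to equal the id values (0, 11, 20). Whether that holds for a whole file depends on whitespace around the markers, so treat it as something to check rather than rely on.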