Parsing XHTML with Java

Question

I need some guidance on reading an XHTML page from a URL in Java.

This is my best attempt at printing a specific string:

    try {
        URL item = new URL("url");
        URLConnection connect = item.openConnection();
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(connect.getInputStream());
        doc.getDocumentElement().normalize();
        NodeList nList = doc.getElementsByTagName("tag");
        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node nNode = nList.item(temp);
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) nNode;
                System.out.println(el.getElementsByTagName("wantedElement").item(0).getTextContent());
            }
        }
    } catch (IOException | ParserConfigurationException | SAXException e) {
        e.printStackTrace();
    }

The output from Eclipse:

    [Fatal Error] :1:1: Content is not allowed in prolog.
    org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

A sample of the XHTML I'm trying to parse (from the TD Ameritrade API):

    <CandleList>
      <candles>
        <candles>
          <open>45.97</open>
          <high>46.26</high>
          <low>45.8</low>
          <close>46.0</close>
          <volume>7176781</volume>
          <datetime>1496293200000</datetime>
        </candles>
        <candles>
          <open>46.22</open>
          <high>46.86</high>
          <low>45.9</low>
          <close>46.8</close>
          <volume>9523927</volume>
          <datetime>1496379600000</datetime>
        </candles>

Thanks for any help!

Answer 1

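The question does have all the problems mentioned in the comments, but the error at line 1, column 1 is about a BOM (byte order mark) at the start of the stream.

Some services, especially .NET services, send a BOM at the start of the stream to mark the encoding (UTF-8, UTF-16LE, and so on). The parser sees those bytes before the first `<`, which is what "Content is not allowed in prolog" is complaining about.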

Byte order mark screws up file reading in Java
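
If the response really does start with a UTF-8 BOM, one way to deal with it is to peek at the first three bytes and drop them before handing the stream to the parser. The following is only a minimal sketch under that assumption: the class name `BomSkipper`, the helper names `skipUtf8Bom` and `parse`, and the shortened sample XML are made up for illustration, and it handles only the UTF-8 BOM (EF BB BF), not the UTF-16 variants.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class BomSkipper {

    /** Returns a stream positioned just past a leading UTF-8 BOM, if one is present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() may return fewer than requested.
        int n = 0;
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break; // stream ended early
            }
            n += r;
        }

        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: put back whatever was read so the parser sees the full content.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Parses XML from a stream after removing a possible UTF-8 BOM. */
    public static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(skipUtf8Bom(in));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the service response: a BOM followed by XML.
        byte[] xml = ("<CandleList><candles>"
                + "<candles><close>46.0</close></candles>"
                + "<candles><close>46.8</close></candles>"
                + "</candles></CandleList>").getBytes(StandardCharsets.UTF_8);
        byte[] body = new byte[xml.length + 3];
        body[0] = (byte) 0xEF;
        body[1] = (byte) 0xBB;
        body[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, body, 3, xml.length);

        Document doc = parse(new ByteArrayInputStream(body));
        NodeList closes = doc.getElementsByTagName("close");
        for (int i = 0; i < closes.getLength(); i++) {
            System.out.println(closes.item(i).getTextContent());
        }
    }
}
```

In the code from the question, this would mean passing `connect.getInputStream()` through `skipUtf8Bom` before calling `dBuilder.parse`. If the error persists after that, it is worth printing the first few bytes of the response to check what the service is actually sending.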

java

xml

xhtml

parsing

dom