Parsing XHTML with Java
I need some guidance on reading an XHTML page from a URL in Java.
Here is my best attempt at printing a specific string:
try {
    // Open a connection to the page ("url" is a placeholder)
    URL item = new URL("url");
    URLConnection connect = item.openConnection();

    // Parse the response body into a DOM document
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(connect.getInputStream());
    doc.getDocumentElement().normalize();

    // Print the text of the first "wantedElement" under each "tag" element
    NodeList nList = doc.getElementsByTagName("tag");
    for (int temp = 0; temp < nList.getLength(); temp++) {
        Node nNode = nList.item(temp);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) nNode;
            System.out.println(el.getElementsByTagName("wantedElement").item(0).getTextContent());
        }
    }
} catch (IOException | ParserConfigurationException | SAXException e) {
    e.printStackTrace();
}
The output from Eclipse:
[Fatal Error] :1:1: Content is not allowed in prolog.
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
A sample of the XHTML I am trying to parse (from the TD Ameritrade API):
<CandleList>
  <candles>
    <candles>
      <open>45.97</open>
      <high>46.26</high>
      <low>45.8</low>
      <close>46.0</close>
      <volume>7176781</volume>
      <datetime>1496293200000</datetime>
    </candles>
    <candles>
      <open>46.22</open>
      <high>46.86</high>
      <low>45.9</low>
      <close>46.8</close>
      <volume>9523927</volume>
      <datetime>1496379600000</datetime>
    </candles>
Any help is appreciated!
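Although the question has all the issues mentioned in the comments, the error at line 1, column 1 is about a BOM (byte order mark) at the start of the stream.
Some services, especially .NET services, send a BOM at the start of the stream to mark the encoding (UTF-8, UTF-16LE, and so on). The parser sees those bytes as content before the XML prolog, hence "Content is not allowed in prolog".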
Byte order mark screws up file reading in Java
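If a BOM really is what the parser is choking on, one way to work around it is to peek at the first three bytes of the stream and drop them when they match the UTF-8 BOM (EF BB BF). The following is only a minimal sketch under that assumption: the class name BomSkippingParser and the helper skipUtf8Bom are made up for illustration, it handles the UTF-8 BOM only (not UTF-16), and the inline XML stands in for the real service response.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class BomSkippingParser {

    // Hypothetical helper: returns a stream positioned just past a UTF-8 BOM,
    // or at the very first byte if no BOM is present.
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];

        // Read up to three bytes; a single read() call may return fewer.
        int read = 0;
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break; // stream ended early
            }
            read += n;
        }

        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;

        // No BOM: put the bytes back so the parser sees the full document.
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    // Parses the stream into a DOM document after removing a leading UTF-8 BOM.
    public static Document parse(InputStream in)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(skipUtf8Bom(in));
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the service response: a UTF-8 BOM followed by XML.
        byte[] xml = "<candles><close>46.0</close></candles>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[xml.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(xml, 0, withBom, 3, xml.length);

        Document doc = parse(new ByteArrayInputStream(withBom));
        System.out.println(doc.getElementsByTagName("close").item(0).getTextContent());
    }
}
```

In the code from the question, this would mean passing connect.getInputStream() through skipUtf8Bom before handing it to dBuilder.parse. If the error persists afterwards, it is worth dumping the first few bytes of the response to check what the service is actually sending.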