Retrieving XML from URL not writing first couple of lines

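I'm currently writing a basic weather app for university, which involves retrieving weather information from the BBC Weather RSS feed.

I have it all set up to write the RSS feed to a file (output.xml), which a parser class then uses to build the tree.

However, I get the error "The markup in the document following the root element must be well-formed." when I run it.

After inspecting the downloaded XML file, I found that the first two nodes are missing.

Here is the downloaded XML: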

<channel>
    <atom:link href="http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss" rel="self" type="application/rss+xml" />
    <title>BBC Weather - Observations for  Bangor, United Kingdom</title>
    <link>http://www.bbc.co.uk/weather/2656397</link>
    <description>Latest observations for Bangor from BBC Weather, including weather, temperature and wind information</description>
    <language>en</language>
    <copyright>Copyright: (C) British Broadcasting Corporation, see http://www.bbc.co.uk/terms/additional_rss.shtml for more details</copyright>
    <pubDate>Thu, 12 Mar 2015 05:35:08 +0000</pubDate>
    <item>
      <title>Thursday - 05:00 GMT: Thick Cloud, 10°C (50°F)</title>
      <link>http://www.bbc.co.uk/weather/2656397</link>
      <description>Temperature: 10°C (50°F), Wind Direction: South Easterly, Wind Speed: 8mph, Humidity: 90%, Pressure: 1021mb, Falling, Visibility: Very Good</description>
      <pubDate>Thu, 12 Mar 2015 05:35:08 +0000</pubDate>
      <guid isPermaLink="false">http://www.bbc.co.uk/weather/2656397-2015-03-12T05:35:08.000Z</guid>
      <georss:point>53.22647 -4.13459</georss:point>
    </item>
  </channel>
</rss>

The XML should have the following two nodes before the <channel> node:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:georss="http://www.georss.org/georss" version="2.0">

Here is the code I'm using to retrieve the XML file:

public static void main(String[] args) throws SAXException, IOException, XPathExpressionException {
    URL url = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
    URLConnection con = url.openConnection();
    StringBuilder builder;
    try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {

        builder = new StringBuilder();
        String line;

        if (!in.readLine().isEmpty()) {
            line = in.readLine();
        }

        while ((line = in.readLine()) != null) {
            builder.append(line).append("\n");
        }

        String input = builder.toString();

        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File("output.xml"))));
        out.write(input);
        out.flush();
    }
    try {
        WeatherParser parser = new WeatherParser();
        System.out.println(parser.parse("output.xml"));
    } catch (ParserConfigurationException ex) {
    }
}

Here is the code that parses the XML (WeatherParser.java):

public class WeatherParser {

    public WeatherParser() throws ParserConfigurationException {
        xpfactory = XPathFactory.newInstance();
        path = xpfactory.newXPath();
        dbfactory = DocumentBuilderFactory.newInstance();
        builder = dbfactory.newDocumentBuilder();
    }

    public String parse(String fileName) throws SAXException, IOException, XPathExpressionException {
        File f = new File(fileName);
        org.w3c.dom.Document doc = builder.parse(f);
        StringBuilder info = new StringBuilder();
        info.append(path.evaluate("/channel/item/title", doc));
        return info.toString();
    }

    private DocumentBuilderFactory dbfactory;
    private DocumentBuilder builder;
    private XPathFactory xpfactory;
    private XPath path;
}

Hopefully the information provided is enough.

The first two lines are missing because you read them but never save them.
Remove this and it will work:

    if (!in.readLine().isEmpty()) {
        line = in.readLine();
    }

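In the if condition you read the first line (<?xml....) but don't keep it.
line = in.readLine(); then reads the second line, but as soon as you enter the while loop its condition overwrites the line variable, so that content is lost too.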
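To see the effect without touching the network, here is a minimal sketch that runs the same loop over an in-memory string. The class name ReadLineDemo, both method names, and the four-line sample text are made up for illustration; only the if/while structure is taken from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadLineDemo {

    // Same control flow as the question: two readLine() calls run before the loop.
    static String copyLikeTheQuestion(String text) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            if (!in.readLine().isEmpty()) { // consumes line 1; the result is discarded
                line = in.readLine();       // consumes line 2; overwritten by the loop below
            }
            while ((line = in.readLine()) != null) {
                builder.append(line).append("\n");
            }
        }
        return builder.toString();
    }

    // With the if block removed, every line reaches the builder.
    static String copyEverything(String text) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                builder.append(line).append("\n");
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the feed: declaration, root element, one child.
        String text = "<?xml version=\"1.0\"?>\n<rss>\n<channel/>\n</rss>\n";
        System.out.print(copyLikeTheQuestion(text)); // starts at <channel/>
        System.out.println("---");
        System.out.print(copyEverything(text));      // starts at <?xml ...
    }
}
```

The first call drops the declaration and the opening <rss> tag, which matches the shape of the broken output.xml in the question.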

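First of all, you must not manipulate the data stream the server sends you. Drop the StringBuilder. If you want to save the XML to disk, write it out verbatim: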

URL url = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
URLConnection con = url.openConnection();

// Copy the response to disk byte for byte; try-with-resources closes both streams.
try (InputStream in = con.getInputStream();
     FileOutputStream out = new FileOutputStream("output.xml")) {
    byte[] b = new byte[1024];
    int count;
    while ((count = in.read(b)) >= 0) {
        out.write(b, 0, count);
    }
}
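On Java 7 or later, the same verbatim copy can be handed to java.nio.file.Files.copy instead of a hand-written buffer loop. The sketch below is one way to do that; the class name SaveStream, the save helper, and the in-memory sample bytes are assumptions for illustration, standing in for the stream you would get from con.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class SaveStream {

    // Writes the stream to target unchanged and closes the stream afterwards.
    static void save(InputStream in, Path target) throws IOException {
        try (InputStream src = in) {
            Files.copy(src, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the HTTP response body, so the sketch runs offline.
        byte[] xml = "<?xml version=\"1.0\"?><rss version=\"2.0\"/>"
                .getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("output", ".xml");
        save(new ByteArrayInputStream(xml), target);
        System.out.println("bytes written: " + Files.size(target));
    }
}
```

Because nothing is decoded into characters along the way, the XML declaration and its declared encoding arrive on disk exactly as the server sent them.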

In fact, you don't need to write it to disk at all. You can build the XML document directly from the input stream.

public static Document readXml(InputStream is) throws SAXException, ParserConfigurationException, IOException {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    dbf.setValidating(false);
    dbf.setIgnoringComments(false);
    dbf.setIgnoringElementContentWhitespace(true);
    dbf.setNamespaceAware(true);

    DocumentBuilder db = dbf.newDocumentBuilder();
    return db.parse(is);
}

This enables you to do the following:

public static void main (String[] args) throws java.lang.Exception
{
    URL observationsUrl = new URL("http://open.live.bbc.co.uk/weather/feeds/en/2656397/observations.rss");
    Document observations = readXml(observationsUrl.openConnection().getInputStream());

    XPathFactory xpf = XPathFactory.newInstance();
    XPath xpath = xpf.newXPath();

    String title = xpath.evaluate("/rss/channel/title", observations);
    System.out.println(title);

    XPathExpression rssitemsExpr = xpath.compile("/rss/channel/item");

    NodeList items = (NodeList)rssitemsExpr.evaluate(observations, XPathConstants.NODESET);
    for (int i = 0; i < items.getLength(); i++) {
        System.out.println(xpath.evaluate("./title", items.item(i)));
    }
}

My output:

BBC Weather - Observations for  Bangor, United Kingdom
Thursday - 06:00 GMT: Thick Cloud, 11°C (52°F)
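Note that the expressions above start at /rss, whereas the WeatherParser in the question evaluates /channel/item/title. Once the file is complete, the root element is <rss>, so a path starting at /channel selects nothing. The following sketch illustrates this on a small in-memory document; the class name XPathRootDemo, the evaluate helper, and the sample XML are invented for the example and are not the real feed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathRootDemo {

    // Tiny hypothetical feed with the same nesting as the BBC one: rss > channel > item.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<rss version=\"2.0\"><channel><title>Feed</title>"
          + "<item><title>Thursday</title></item></channel></rss>";

    // Parses SAMPLE and returns the string value of the given XPath expression.
    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // An absolute path must begin at the document's root element.
        System.out.println("[" + evaluate("/channel/item/title") + "]");     // prints []
        System.out.println("[" + evaluate("/rss/channel/item/title") + "]"); // prints [Thursday]
    }
}
```

So after fixing the download, the parser's expression probably needs to become /rss/channel/item/title as well, otherwise it returns an empty string.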