Saving scraped data to file

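I'm using Jsoup to scrape data from multiple web pages. How can I save the scraped data to a file without overwriting the data from pages I scraped earlier?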

I have tried searching Stack Overflow and the Jsoup documentation for a solution.

        int j = 0;
        int i = 0;
        String URL = ("https://www.ufc.com/athletes/all?gender=All&search=&page="+j);
        Document doc = Jsoup.connect(URL).userAgent("mozilla/70.0.1").get();
        Elements temp = doc.select("div.c-listing-athlete__text");



        for (Element fighterList:temp) {
            i++;
            System.out.println(i + " " + fighterList.getElementsByClass("c-listing-athlete__name").first().text());
        }



        j++;
        URL = ("https://www.ufc.com/athletes/all?gender=All&search=&page="+j);
        doc = Jsoup.connect(URL).userAgent("mozilla/70.0.1").get();
        temp = doc.select("div.c-listing-athlete__text");

        for (Element fighterList:temp) {
            i++;
            System.out.println(i + " " + fighterList.getElementsByClass("c-listing-athlete__name").first().text());
        }

If you need to save the data from your code, take a look at this; it may help:

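import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ScrapeToFile {
    public static void main(String[] args) {
        int count = 0;         // running fighter number across all pages
        int pagesNumber = 10;  // number of listing pages to fetch

        // One timestamped file per run; try-with-resources closes it even on error
        try (BufferedWriter out = new BufferedWriter(
                new FileWriter(System.currentTimeMillis() + "out.txt"))) {

            for (int page = 0; page < pagesNumber; page++) {
                String url = "https://www.ufc.com/athletes/all?gender=All&search=&page=" + page;
                Document doc = Jsoup.connect(url).userAgent("mozilla/70.0.1").get();
                Elements temp = doc.select("div.c-listing-athlete__text");

                for (Element fighter : temp) {
                    Element name = fighter.getElementsByClass("c-listing-athlete__name").first();
                    if (name == null) continue; // skip entries without a name element
                    count++;
                    out.write(count + " " + name.text());
                    out.newLine(); // one fighter per line
                }
            }

        } catch (IOException e) { // network or file error
            System.err.println("Error: " + e.getMessage());
        }
    }
}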
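
If you would rather keep adding to one file instead of creating a new file each time, opening the file in append mode should also avoid the overwrite. Here is a minimal sketch of that idea using only the standard library. The class name `AppendScrapedNames`, the helper `appendNames`, and the hard-coded names are placeholders of mine; in your code the list would be filled from the Jsoup results for each page.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendScrapedNames {

    // Hypothetical helper: writes one line per name at the end of the file,
    // creating the file first if it does not exist yet.
    static void appendNames(Path file, List<String> names) throws IOException {
        Files.write(file, names, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("fighters", ".txt");

        // Placeholder data standing in for the names scraped from two pages
        appendNames(file, List.of("Fighter A", "Fighter B"));
        appendNames(file, List.of("Fighter C"));

        // The second call did not overwrite the first
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        // prints [Fighter A, Fighter B, Fighter C]
    }
}
```

The older `new FileWriter(fileName, true)` constructor opens a file in append mode as well, so you could keep the `BufferedWriter` approach and only change how the writer is created.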

Hope this helps :)