Having trouble reading in the content of a URL using InputStream

Question

When I run the code below, it prints only "<!DOCTYPE html>". How do I get the full content of the URL, i.e. the HTML?

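import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws IOException {
        URL u = new URL("https://www.whitehouse.gov/");
        InputStream ins = u.openStream();
        InputStreamReader isr = new InputStreamReader(ins);
        BufferedReader websiteText = new BufferedReader(isr);
        System.out.println(websiteText.readLine());
    }
}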

According to the Java tutorial at https://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html: "When you run the program, you should see, scrolling by in your command window, the HTML commands and textual content from the HTML file located at "... Why am I not seeing that?

Answer 1

You only read one line of the text. Try this and you will see that you get two lines:

System.out.println(websiteText.readLine());
System.out.println(websiteText.readLine());

Read in a loop to get all of the text.
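To see this without hitting the network, here is a minimal sketch that feeds a hypothetical in-memory page (the `page` string, standing in for the downloaded HTML) through a BufferedReader. A single readLine() call returns only the first line; calling it repeatedly until it returns null collects every line. The names `ReadLineDemo` and `readEveryLine` are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReadLineDemo {
    // Calls readLine() until it returns null, collecting each line separately.
    static List<String> readEveryLine(BufferedReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            // Each call returns the next line without its terminator, or null at end of stream.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a downloaded page, so the demo runs offline.
        String page = "<!DOCTYPE html>\n<html>\n<body>Hello</body>\n</html>\n";

        BufferedReader single = new BufferedReader(new StringReader(page));
        System.out.println(single.readLine()); // only the first line, as in the question

        BufferedReader all = new BufferedReader(new StringReader(page));
        System.out.println(readEveryLine(all)); // every line of the page
    }
}
```

The first println shows `<!DOCTYPE html>`, which matches what the question reports; the second prints the list of all four lines.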

Answer 2

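Your program has no while loop, so it calls readLine() only once and prints just the first line. Keep reading until readLine() returns null: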

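URL u = new URL("https://www.whitehouse.gov/");
InputStream ins = u.openStream();
InputStreamReader isr = new InputStreamReader(ins);
BufferedReader websiteText = new BufferedReader(isr);
String inputLine;
// readLine() returns one line per call and null once the end of the stream is reached
while ((inputLine = websiteText.readLine()) != null) {
    System.out.println(inputLine);
}

// closing the BufferedReader also closes the underlying InputStream
websiteText.close();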
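The loop above can also be packaged as a reusable method. This is a sketch, assuming Java 7+ for try-with-resources and assuming the page is UTF-8 (an InputStreamReader created without a charset uses the platform default, which may not match the page). The names `ReadWholePage` and `readAll` are hypothetical. try-with-resources closes the reader even if readLine() throws, which a plain close() call at the end does not guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class ReadWholePage {
    // Reads every line from the stream and appends '\n' after each one.
    // Assumes UTF-8 content; adjust the charset if the server uses something else.
    static String readAll(InputStream in) {
        // try-with-resources closes the reader (and the underlying stream) even on failure.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL("https://www.whitehouse.gov/");
        System.out.println(readAll(u.openStream()));
    }
}
```

Because readLine() strips line terminators, the result ends every line with "\n" even if the server sent "\r\n". Taking an InputStream rather than a URL also lets the method be exercised offline, for example with a ByteArrayInputStream.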

Answer 3

Since Java 8, BufferedReader has a method called #lines(), which returns a Stream<String> of the remaining lines. To read the whole site, you can do this:

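// reduce() without an identity value returns an Optional<String>;
// orElse(null) yields null if the page has no lines
String htmlText = websiteText.lines()
  .reduce((text, nextLine) -> text + "\n" + nextLine)
  .orElse(null);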
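As an alternative to reduce(), the stream returned by lines() can be collected with Collectors.joining, which appends everything into a single builder instead of creating a new String for every line. A minimal sketch, with the hypothetical names `LinesJoinDemo` and `joinLines`, using an in-memory reader in place of the website:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.stream.Collectors;

class LinesJoinDemo {
    // Joins all remaining lines with "\n" using a collector instead of reduce().
    static String joinLines(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // In-memory stand-in for websiteText so the example runs offline.
        BufferedReader reader = new BufferedReader(new StringReader("<!DOCTYPE html>\n<html>\n</html>"));
        System.out.println(joinLines(reader));
    }
}
```

One behavioral difference: for an empty page, joining("\n") returns an empty string, while reduce(...).orElse(null) returns null. lines() reports read errors as an UncheckedIOException, so no checked exception has to be declared.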
