Having trouble reading in content of url using InputStream

When I run the code below, it prints only "!DOCTYPE html". How do I get the full content of the URL, e.g. the HTML?

public static void main(String[] args) throws IOException {
    URL u = new URL("https://www.whitehouse.gov/");
    InputStream ins = u.openStream();
    InputStreamReader isr = new InputStreamReader(ins);
    BufferedReader websiteText = new BufferedReader(isr);
    System.out.println(websiteText.readLine());
}

According to the Java documentation at https://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html: "When you run the program, you should see, scrolling by in your command window, the HTML commands and textual content from the HTML file located at "... Why am I not seeing that?

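You are reading only one line of the text: readLine() returns a single line per call. Try this and you will see that you get two lines: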

System.out.println(websiteText.readLine());
System.out.println(websiteText.readLine());

Read in a loop to get all of the text.

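Your program is missing a while loop. Call readLine() repeatedly until it returns null, which signals the end of the stream: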

URL u = new URL("https://www.whitehouse.gov/");
InputStream ins = u.openStream();
InputStreamReader isr = new InputStreamReader(ins);
BufferedReader websiteText = new BufferedReader(isr);
String inputLine;
while ((inputLine = websiteText.readLine()) != null) {
    System.out.println(inputLine);
}
websiteText.close();
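
As a variation on the loop above, here is a minimal sketch that uses try-with-resources so the reader is closed even if reading fails, and that passes an explicit charset instead of relying on the platform default. The class name ReadAllLines and the helper readAll are hypothetical names chosen for illustration, and UTF-8 is an assumption about the page's encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ReadAllLines {

    // Hypothetical helper: reads every line from the source and
    // returns the text with each line terminated by '\n'.
    static String readAll(Reader source) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        URL u = new URL("https://www.whitehouse.gov/");
        // Assumption: the page is encoded as UTF-8
        Reader source = new InputStreamReader(u.openStream(), StandardCharsets.UTF_8);
        System.out.print(readAll(source));
    }
}
```

Taking a Reader rather than a URL keeps the helper easy to try out without network access, for example with a StringReader.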

Since Java 8, BufferedReader has a method named lines(), which returns a Stream<String> with one element per line. To read the entire site, you can do this:

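// reduce() without an identity value returns Optional<String>,
// so orElse(null) applies when the stream has no lines
String htmlText = websiteText.lines()
  .reduce((text, nextLine) -> text + "\n" + nextLine)
  .orElse(null);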
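
Joining lines with repeated string concatenation copies the accumulated text on every step. One alternative is Collectors.joining, sketched below. The class name JoinLines and the helper joinLines are hypothetical names for illustration, and the sample input in main is made up so that the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.stream.Collectors;

public class JoinLines {

    // Hypothetical helper: joins all lines from the source with '\n'
    // between them; returns an empty string if there are no lines.
    static String joinLines(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            // close() can throw IOException; rethrow it unchecked
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for a downloaded page
        Reader sample = new StringReader("<!DOCTYPE html>\n<html>\n</html>");
        System.out.println(joinLines(sample));
    }
}
```

Unlike the reduce version, this returns an empty string rather than null when the input has no lines.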