How to Fetch a Webpage Through a TCP Socket Using an HTTP Request in Java

Question

My university assignment is to fetch a webpage from any web server, given its URL, using a TCP socket and an HTTP GET request.

I am not getting an HTTP/1.0 200 OK response from any server.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;

public class DCCN042 {

    public static void main(String[] args) {
        Scanner inpt = new Scanner(System.in);
        System.out.print("Enter URL: ");
        String url = inpt.next();
        TCPConnect(url);
    }

    public static void TCPConnect(String url) {
        try {
            String hostname = new URL(url).getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for  " + hostname);
            String path = new URL(url).getPath();
            System.out.println("Requested Path on the server: " + path);
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // Follow the HTTP protocol: request line, header lines, then an empty line.
            // Use the path from the URL when there is one, otherwise request "/".
            if (!path.isEmpty()) {
                //Request Line
                out.println("GET " + path + " HTTP/1.1");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            } else {
                //Request Line
                out.println("GET / HTTP/1.0");
                out.println("Host: " + hostname);
                //Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            }
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URl");
            e.printStackTrace();
        }
    }
}

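I created a TCP socket, passing it the IP address returned by InetAddress.getHostAddress() and port 80 to reach the web server. I used getPath() and getHost() to split the path and hostname out of the URL, and used that same path and hostname in the HTTP GET request.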

Server response:

Enter URL: 
    Loading contents of Server: stackoverflow.com
    151.101.65.69 is IP Adress for  stackoverflow.com
    Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
    HTTP/1.1 301 Moved Permanently
    cache-control: no-cache, no-store, must-revalidate
    location: 
    x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
    feature-policy: microphone 'none'; speaker 'none'
    content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
    Accept-Ranges: bytes
    Transfer-Encoding: chunked
    Date: Mon, 27 Dec 2021 15:00:17 GMT
    Via: 1.1 varnish
    Connection: keep-alive
    X-Served-By: cache-qpg1263-QPG
    X-Cache: MISS
    X-Cache-Hits: 0
    X-Timer: S1640617217.166650,VS0,VE338
    Vary: Fastly-SSL
    X-DNS-Prefetch-Control: off
    Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
    
    0

What I need is the HTML source of this webpage, along with an HTTP/1.0 200 OK response.

Answer 1

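This happens because you are using a plain Socket with a hard-coded port 80. That means that, regardless of whether the URL you type in uses http or https, you are making the request over the insecure protocol, HTTP.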

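In this case the server is telling you, as Samuel L. Jackson would put it: "Hey mf! You are trying to reach me over an insecure protocol, HTTP. Use the secure one, mf, HTTPS." So it responds with a 301 (which simply means "use this URL, not the original one") together with a Location header pointing to the correct URL, the https one.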

The 301 Location may look like the same URL, but it is not: your code hard-codes http, and the server's response redirects to https.
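To make the redirect explicit rather than just printing it, the response head can be inspected for a 3xx status and a Location header. The following is a minimal sketch, not part of the original code: the class name RedirectCheck and the method redirectTarget are hypothetical, and it assumes the status line and headers have already been read into a single string.

```java
// Hypothetical helper for illustration; not part of the original post.
class RedirectCheck {

    /**
     * Returns the Location header value when the response head carries a 3xx
     * status code, or null when it is not a redirect or has no Location value.
     */
    static String redirectTarget(String responseHead) {
        String[] lines = responseHead.split("\r?\n");
        // Status line looks like: HTTP/1.1 301 Moved Permanently
        String[] status = lines[0].split(" ", 3);
        if (status.length < 2 || !status[1].startsWith("3")) {
            return null;
        }
        // Header names are case-insensitive, so compare without regard to case.
        for (int i = 1; i < lines.length && !lines[i].isEmpty(); i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("Location")) {
                String value = lines[i].substring(colon + 1).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Sample response head with a made-up redirect target.
        String sample = "HTTP/1.1 301 Moved Permanently\r\n"
                + "location: https://example.com/page\r\n"
                + "\r\n";
        System.out.println(redirectTarget(sample));
    }
}
```

If the method returns a URL, the client can repeat the request against that URL instead of treating the 301 as the final answer.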

To make your code work with https, use this instead of a plain Socket:

SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket)factory.createSocket(ia, 443);

Note that I am not using ip: for https the certificate has to correspond to the domain, and if you use the IP you will get a CertificateExpiredException.
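Putting the SSLSocket snippet together with the request and the read loop might look like the sketch below. It is an illustration under a few assumptions rather than a definitive implementation: the class HttpsFetch and its methods are hypothetical names; the request is sent as HTTP/1.0 with Connection: close and without Accept-Encoding, so the server should close the connection after an uncompressed body (many servers still answer with an HTTP/1.1 status line); and endpoint identification is switched on explicitly so the certificate is checked against the hostname.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical class for illustration; not part of the original post.
class HttpsFetch {

    /** Builds a minimal GET request; HTTP lines end with CRLF. */
    static String buildRequest(String host, String path) {
        String target = (path == null || path.isEmpty()) ? "/" : path;
        return "GET " + target + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    /** Connects to host:443 over TLS, sends the request, and prints the response. */
    static void fetch(String host, String path) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        // Connect by hostname, not by IP, so the certificate can be matched to it.
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            SSLParameters params = socket.getSSLParameters();
            params.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(params);

            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String line;
            // readLine() returns null once the server closes the connection.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // Without arguments, only show the request that would be sent.
            System.out.print(buildRequest("example.com", ""));
        } else {
            fetch(args[0], args.length > 1 ? args[1] : "/");
        }
    }
}
```

Running it with a hostname and a path as arguments performs the actual fetch; with no arguments it only prints the request text, which is handy for checking the header layout.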

Now you have to decide programmatically, based on the user's input, whether to use a Socket or an SSLSocket.
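One way to make that choice is to look at the URL scheme. The sketch below is only an assumption about how this could be structured: SocketChooser, isHttps, portFor and open are hypothetical names, and java.net.URI is used for parsing instead of the URL class from the question.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical class for illustration; not part of the original post.
class SocketChooser {

    static boolean isHttps(URI uri) {
        return "https".equalsIgnoreCase(uri.getScheme());
    }

    /** Uses the explicit port when the URL has one, else 443 for https and 80 for http. */
    static int portFor(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return isHttps(uri) ? 443 : 80;
    }

    /** Opens a TLS socket for https URLs and a plain TCP socket otherwise. */
    static Socket open(URI uri) throws IOException {
        String host = uri.getHost();
        int port = portFor(uri);
        if (isHttps(uri)) {
            // The default SSL factory returns an SSLSocket, which is a Socket subclass.
            return SSLSocketFactory.getDefault().createSocket(host, port);
        }
        return new Socket(host, port);
    }

    public static void main(String[] args) {
        // Only the port selection is shown here; open() needs network access.
        System.out.println(portFor(URI.create("http://example.com/")));
        System.out.println(portFor(URI.create("https://example.com/")));
    }
}
```

Because SSLSocket extends Socket, the rest of the code (writing the request, reading lines) can stay the same whichever branch is taken.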
