How to Fetch a Webpage Through a TCP Socket Using an HTTP Request in Java
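My university assignment is to fetch a web page from any web server, given its URL, using a TCP socket and an HTTP GET request.
I am not receiving an HTTP/1.0 200 OK response from any server.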
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.Socket;
import java.net.URL;
import java.util.Scanner;
import java.net.*;

public class DCCN042 {

    public static void main(String[] args) {
        Scanner inpt = new Scanner(System.in);
        System.out.print("Enter URL: ");
        String url = inpt.next();
        TCPConnect(url);
    }

    public static void TCPConnect(String url) {
        try {
            String hostname = new URL(url).getHost();
            System.out.println("Loading contents of Server: " + hostname);
            InetAddress ia = InetAddress.getByName(hostname);
            String ip = ia.getHostAddress();
            System.out.println(ip + " is IP Adress for " + hostname);
            String path = new URL(url).getPath();
            System.out.println("Requested Path on the server: " + path);
            Socket socket = new Socket(ip, 80);
            // Create input and output streams to read from and write to the server
            PrintStream out = new PrintStream(socket.getOutputStream());
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            // Follow the HTTP protocol of GET <path> HTTP/1.0 followed by an empty line
            if (hostname != url) {
                // Request Line
                out.println("GET " + path + " HTTP/1.1");
                out.println("Host: " + hostname);
                // Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            } else {
                // Request Line
                out.println("GET / HTTP/1.0");
                out.println("Host: " + hostname);
                // Header Lines
                out.println("User-Agent: Java/13.0.2");
                out.println("Accept-Language: en-us");
                out.println("Accept: */*");
                out.println("Connection: keep-alive");
                out.println("Accept-Encoding: gzip, deflate, br");
                // Blank Line
                out.println();
            }
            // Read data from the server until we finish reading the document
            String line = in.readLine();
            while (line != null) {
                System.out.println(line);
                line = in.readLine();
            }
            // Close our streams
            in.close();
            out.close();
            socket.close();
        } catch (Exception e) {
            System.out.println("Invalid URl");
            e.printStackTrace();
        }
    }
}
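I create a TCP socket and pass it the IP address I get from InetAddress.getHostAddress() together with port 80 to reach the web server. I use getPath() and getHost() to split the path and host name out of the URL, and I use that same path and host name in the HTTP GET request.
Server response: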
Enter URL:
Loading contents of Server: whosebug.com
151.101.65.69 is IP Adress for whosebug.com
Requested Path on the server: /questions/33015868/java-simple-http-get-request-using-tcp-sockets
HTTP/1.1 301 Moved Permanently
cache-control: no-cache, no-store, must-revalidate
location:
x-request-guid: 5f2af765-40c2-49ca-b9a1-daa321373682
feature-policy: microphone 'none'; speaker 'none'
content-security-policy: upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com
Accept-Ranges: bytes
Transfer-Encoding: chunked
Date: Mon, 27 Dec 2021 15:00:17 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-qpg1263-QPG
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1640617217.166650,VS0,VE338
Vary: Fastly-SSL
X-DNS-Prefetch-Control: off
Set-Cookie: prov=149aa0ef-a3a6-8001-17c1-128d6d4b7273; domain=.whosebug.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
0
What I need is the HTML source of this web page, along with an HTTP/1.0 200 OK response.
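This happens because you are using a plain Socket with a hard-coded port 80. That means that, regardless of whether the URL in your input uses http or https, you are making the request over the insecure protocol, http.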
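In this case the server is telling you, as Samuel L. Jackson would put it, "Hey mf! You are trying to reach me over an insecure protocol, HTTP. Use the secure one, mf: HTTPS.", so it responds with a 301 (which just means "use this URL instead of the original one") together with a Location header pointing to the correct https URL.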
At first glance the 301 Location looks like the same URL, but it is not: your code hard-codes http, and the server response is redirecting to https.
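To make the redirect visible in code, here is a minimal sketch of how a client could spot a 3xx status and pull out the Location header from a response head it has already read. The class RedirectCheck and its method names are hypothetical and not part of the original code.

```java
import java.util.List;

// Hypothetical helper (an assumption for illustration): inspects the status
// line and header lines of an HTTP response that has already been read.
class RedirectCheck {

    // Extracts the numeric status code from a status line such as
    // "HTTP/1.1 301 Moved Permanently".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // True for the status codes that ask the client to retry at another URL.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Returns the value of the named header, or null if it is absent.
    // Header names are compared case-insensitively ("location" == "Location").
    static String header(List<String> headerLines, String name) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase(name)) {
                return line.substring(colon + 1).trim();
            }
        }
        return null;
    }
}
```

With the response from the question, statusCode would give 301, and header(lines, "Location") would give whatever URL the server sent in its location line, which is the one to request next.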
To make your code work with https, instead of a plain Socket, use:
SSLSocketFactory factory = (SSLSocketFactory)SSLSocketFactory.getDefault();
SSLSocket socket = (SSLSocket)factory.createSocket(ia, 443);
Note that I'm not using ip here: for https the certificate has to correspond to the domain, and if you use the IP you will get a CertificateExpiredException.
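On the point about the certificate matching the domain: as far as I know, a raw SSLSocket only checks the host name against the certificate when endpoint identification is switched on. The following is a minimal sketch of enabling it. The class HostnameCheck and its method name are assumptions made for illustration.

```java
import javax.net.ssl.SSLParameters;

// Hypothetical helper (an assumption for illustration).
class HostnameCheck {

    // Sets the "HTTPS" endpoint identification algorithm so that the TLS
    // handshake verifies the server certificate against the host name the
    // socket was opened with. Returns the same object for convenience.
    static SSLParameters withHostnameVerification(SSLParameters params) {
        params.setEndpointIdentificationAlgorithm("HTTPS");
        return params;
    }
}
```

It would be applied to the socket from the snippet above, before any data is written, with socket.setSSLParameters(HostnameCheck.withHostnameVerification(socket.getSSLParameters()));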
Now you have to handle programmatically whether to use Socket or SSLSocket, depending on the user's input.
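One way to make that choice is to look at the scheme of the URL the user typed. This is only a sketch under assumptions: the class SocketChooser and its methods are hypothetical, it parses with java.net.URI rather than the URL class used in the question, and it falls back to port 443 for https and 80 otherwise when the URL names no port.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.URI;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper (an assumption for illustration).
class SocketChooser {

    // True when the URL asks for https, so a TLS socket is needed.
    static boolean usesTls(String url) {
        return "https".equalsIgnoreCase(URI.create(url).getScheme());
    }

    // Uses the port written in the URL if there is one; otherwise falls back
    // to 443 for https and 80 for everything else.
    static int portFor(String url) {
        int explicit = URI.create(url).getPort(); // -1 when the URL has no port
        if (explicit != -1) {
            return explicit;
        }
        return usesTls(url) ? 443 : 80;
    }

    // Opens an SSLSocket for https URLs and a plain Socket otherwise.
    // The connection is made by host name, not by IP string.
    static Socket open(String url) throws IOException {
        String host = URI.create(url).getHost();
        int port = portFor(url);
        if (usesTls(url)) {
            return SSLSocketFactory.getDefault().createSocket(host, port);
        }
        return new Socket(host, port);
    }
}
```

In TCPConnect, the line that creates the socket could then become Socket socket = SocketChooser.open(url); and the rest of the request and response handling stays as it is.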