Why don't I get the main web page of the BBC server with a Winsock2 program in C?

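I wrote a program that prints the main page of the BBC server. The server's hostname is www.bbc.co.uk and its IP address is 38.160.150.31. When I send an HTTP GET request to the server, I do not get the BBC main page; I get the following instead: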

HTTP/1.1 500 Internal Server Error
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Connection: close
Content-Length: 685

<HTML><HEAD>
<TITLE>Appliance Error</TITLE>
</HEAD>
<BODY>
<FONT face="Helvetica">
<big><strong></strong></big><BR>
</FONT>
<blockquote>
<TABLE border=0 cellPadding=1 width="80%">
<TR><TD>
<FONT face="Helvetica">
<big>Appliance Error (internal_error)</big>
<BR>
<BR>
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">
An unrecoverable error was encountered: ""
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica">
This problem is unexpected. Please use the contact information below to obtain assistance.
</FONT>
</TD></TR>
<TR><TD>
<FONT face="Helvetica" SIZE=2>
<BR>
For assistance, contact your network support team.
</FONT>
</TD></TR>
</TABLE>
</blockquote>
</FONT>
</BODY></HTML>

My code:

#include <stdio.h>
#include <stdlib.h>
#include <winsock2.h>
#include <string.h>

int main()
{
    WSADATA wsaData;
    if (WSAStartup(MAKEWORD(2,2), &wsaData) != 0) {
        puts("Error: Cannot initialize winsock.");
        return 0;
    }
    SOCKET mainSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (mainSocket == INVALID_SOCKET || mainSocket == SOCKET_ERROR) {
        puts("Error: Cannot create socket.");
        return 0;
    }

    SOCKADDR_IN hostAddress;
    hostAddress.sin_family = AF_INET;
    hostAddress.sin_port = htons(80);
    hostAddress.sin_addr.S_un.S_addr = inet_addr("38.160.150.31");

    if (connect(mainSocket, (SOCKADDR*) &hostAddress, sizeof(hostAddress)) == SOCKET_ERROR) {
        printf("Cannot connect to the server. Error Code: %d\n", WSAGetLastError());
        return 0;
    }
    puts("Connected!");

    char *message = "GET HTTP/1.1\r\nHost: www.bbc.co.uk\r\n\r\n";
    int retval = send(mainSocket, message, strlen(message), 0);
    if (retval == 0) {
        puts("Error: Connection lost.");
        return 0;
    } else if (retval < 0) {
        printf("Error: Cannot send any message. Err #%d\n", WSAGetLastError());
        return 0;
    }
    char *serverReply = (char*) malloc(sizeof(char)*1000);
    if (serverReply == NULL) {
        puts("Error: Out of memory.");
        return 0;
    }
    puts("Received:");
    while (1) {
        retval = recv(mainSocket, serverReply, 999, 0);
        if (retval <= 0) break;
        serverReply[retval] = '\0';
        printf("%s", serverReply);
    }
    closesocket(mainSocket);
    puts("\nConnection closed.");
    WSACleanup();
    free(serverReply);
    return 1;
}

What is wrong with my code?

In the line

char *message = "GET HTTP/1.1\r\nHost: www.bbc.co.uk\r\n\r\n";

the request target is missing.

3.1.1. Request Line

A request-line begins with a method token, followed by a single space (SP), the request-target, another single space (SP), the protocol version, and ends with CRLF.

request-line = method SP request-target SP HTTP-version CRLF

https://www.rfc-editor.org/rfc/rfc7230#section-3.1.1

So the line should look like this:

char* message = "GET / HTTP/1.1\r\nHost: www.bbc.co.uk\r\n\r\n";

Note the / after GET.
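To make the request-line grammar quoted above harder to get wrong, the request can be assembled from its parts. The following is a minimal sketch, not the only way to do it: build_get_request is a hypothetical helper name, and the Connection: close header is an addition of mine (assumed useful here because the receive loop in the question waits for the server to close the connection).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of Winsock): writes a minimal HTTP/1.1 GET
 * request into buf. Returns the request length, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size,
                      const char *host, const char *target)
{
    /* request-line = method SP request-target SP HTTP-version CRLF */
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n" /* assumption: ask the server to close after replying */
                     "\r\n",                 /* empty line ends the header section */
                     target, host);
    if (n < 0 || (size_t)n >= size)
        return -1; /* encoding error or output truncated */
    return n;
}
```

Calling build_get_request(buf, sizeof buf, "www.bbc.co.uk", "/") fills buf with a request whose first line is GET / HTTP/1.1, and the returned length can be passed to send() in place of strlen(message).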

Side note

nslookup www.bbc.co.uk returns a different IP address for me. Presumably the address differs between users, depending on their geographic location, a load-balancing system, and so on.
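Because the address can differ, it may be safer to resolve the hostname at run time than to hard-code it. Here is a minimal sketch using getaddrinfo, which exists both in Winsock (via ws2tcpip.h) and on POSIX systems. resolve_ipv4 is a hypothetical helper name, and restricting the lookup to IPv4 is an assumption made to match the AF_INET socket in the question.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#else
#define _POSIX_C_SOURCE 200112L
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#endif
#include <string.h>

/* Hypothetical helper: resolves host to its first IPv4 address for the given
 * port (for example "80") and stores it in *out.
 * Returns 0 on success, otherwise the getaddrinfo error code.
 * On Windows, WSAStartup must have succeeded before calling this. */
int resolve_ipv4(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, matching the question's socket */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_protocol = IPPROTO_TCP;

    int rc = getaddrinfo(host, port, &hints, &result);
    if (rc != 0)
        return rc;

    /* With AF_INET requested, ai_addr points to a struct sockaddr_in. */
    memcpy(out, result->ai_addr, sizeof *out);
    freeaddrinfo(result);
    return 0;
}
```

The filled-in structure can then be passed to connect() in place of the manually initialised hostAddress, for example after resolve_ipv4("www.bbc.co.uk", "80", &hostAddress).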

When the program is executed, the server returns HTTP status code 301, which means:

The HyperText Transfer Protocol (HTTP) 301 Moved Permanently redirect status response code indicates that the resource requested has been definitively moved to the URL given by the Location headers.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/301

The Location header is:

Location: https://www.bbc.co.uk/

Note the https scheme.
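A program can detect this redirect itself by reading the status line and the Location header from the received text. The sketch below is illustrative only: parse_status_code and find_location are hypothetical helper names, the whole header section is assumed to be in one NUL-terminated buffer, and the header name is assumed to be spelled exactly "Location" (real header names are case-insensitive).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the status code from a response that starts
 * with a status line such as "HTTP/1.1 301 Moved Permanently", or -1. */
int parse_status_code(const char *response)
{
    int major, minor, code;
    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Hypothetical helper: copies the value of the Location header into out.
 * Returns 0 on success, -1 if the header is absent or does not fit. */
int find_location(const char *response, char *out, size_t size)
{
    const char *name = "\r\nLocation:";
    const char *p = strstr(response, name);
    const char *end = strstr(response, "\r\n\r\n"); /* end of header section */

    if (p == NULL || (end != NULL && p > end))
        return -1;               /* not found, or found only in the body */

    p += strlen(name);
    while (*p == ' ' || *p == '\t')
        p++;                     /* skip optional whitespace before the value */

    size_t len = strcspn(p, "\r\n");
    if (len >= size)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

If parse_status_code reports 301 and find_location yields a URL starting with https://, a plain TCP socket on port 80 cannot follow the redirect; a TLS-capable client is needed.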

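So to get the content of the BBC website, you need to make an HTTPS request. You may want to use a library for that; see, for example, this good answer: https://whosebug.com/a/16255486/2331445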