Downloading a UTF-8 file with libcurl (ANSI works fine)

Question

I am writing a simple file downloader with the help of libcurl. This is the code that downloads a file from an HTTP server:

static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    ((std::string*)userp)->append((char*)contents, size * nmemb);
    return size * nmemb;
}

std::wstring result; //result with polish letters (ą, ę etc.)
CURL *curl;
CURLcode res;
std::string readBuffer;

curl = curl_easy_init();
ERROR_HANDLE(curl, L"CURL could not be initialized.", MOD_INTERNET);
curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 0L);
curl_easy_setopt(curl, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_easy_setopt(curl, CURLOPT_USERPWD, (login + ":" + password).c_str()); //e.g.: "login:password"
curl_easy_setopt(curl, CURLOPT_POST, 1L); // CURLOPT_POST expects a long, not a bool
//curl_easy_setopt(curl, CURLOPT_ENCODING, "UTF-8"); //does not change anything
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);

result = C::toWString(readBuffer);
return res == CURLE_OK; // CURLE_OK (0) means success

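It works fine when the file I download is encoded as ANSI (according to, e.g., Notepad++). However, when I download a UTF-8 file (UTF-8 without BOM), some characters, such as Polish letters, come out wrong because of an encoding problem.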

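For example, I ran the code on two files containing the same text ("to jest teść to") and saved each result to a std::wstring. result comes from the ANSI file, and result2 (the broken one) comes from the UTF-8 version.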

Both files display the correct text when opened on the server in, e.g., Notepad++.
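The two files look identical in an editor but differ at the byte level, which is why a conversion that works for the ANSI file garbles the UTF-8 one. The sketch below shows the bytes of "teść" in both encodings. It assumes the "ANSI" code page is Windows-1250 (the ANSI code page used for Polish on Windows); hexBytes, kUtf8 and kCp1250 are illustrative names, not part of the question's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the bytes of s as space-separated uppercase hex pairs, e.g. "74 65".
std::string hexBytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// "teść" encoded as UTF-8: ś = C5 9B, ć = C4 87 (two bytes each).
const std::string kUtf8 = "te\xC5\x9B\xC4\x87";

// "teść" encoded as Windows-1250 (assumed here to be the ANSI code page):
// ś = 9C, ć = E6 (one byte each).
const std::string kCp1250 = "te\x9C\xE6";
```

hexBytes(kUtf8) gives "74 65 C5 9B C4 87" and hexBytes(kCp1250) gives "74 65 9C E6". A converter that treats every byte as one ANSI character therefore turns the UTF-8 "teść" into six wrong characters instead of four correct ones.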

So how can I use libcurl to fetch the contents of a UTF-8 file and store it in a std::wstring with the correct encoding (so that the Visual Studio debugger shows it as to jest teść to)?

Answer 1

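libcurl does not convert or translate the content for you. It delivers to your application exactly the bytes the server sent.

You can use HTTP Accept headers and the like to influence what the server responds with, but if you are not happy with what you receive, you have to check which character set the received data is in and convert it accordingly.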
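One way to "check the character set" is to read the charset parameter of the server's Content-Type header. After curl_easy_perform, curl_easy_getinfo(curl, CURLINFO_CONTENT_TYPE, &ct) with a char* ct returns that header value, or NULL if the server sent none; call it before curl_easy_cleanup, because the string belongs to the handle. The helper below, charsetFromContentType, is a hypothetical name and a simplified sketch, not a complete HTTP header parser. (The CURLOPT_ENCODING line commented out in the question has no effect on this: that option controls content compression via Accept-Encoding, such as gzip, not the character set.)

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Lower-cases ASCII letters; header tokens and charset names are ASCII.
static std::string asciiLower(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Extracts the charset parameter from a Content-Type value such as
// "text/plain; charset=UTF-8" and returns it lower-cased ("utf-8").
// Returns "" if there is no charset parameter. Handles optional whitespace
// and a quoted value, but not every corner case of the HTTP grammar.
std::string charsetFromContentType(const std::string& contentType) {
    const std::string lower = asciiLower(contentType);
    std::size_t pos = 0;
    while ((pos = lower.find(';', pos)) != std::string::npos) {
        ++pos;
        // Skip whitespace after the ';' separator.
        while (pos < lower.size() && (lower[pos] == ' ' || lower[pos] == '\t')) ++pos;
        if (lower.compare(pos, 8, "charset=") != 0) continue;  // some other parameter
        pos += 8;
        if (pos < lower.size() && lower[pos] == '"') {
            // Quoted value: charset="iso-8859-2"
            const std::size_t close = lower.find('"', pos + 1);
            if (close == std::string::npos) return "";
            return lower.substr(pos + 1, close - pos - 1);
        }
        // Unquoted value ends at ';' or whitespace.
        std::size_t end = pos;
        while (end < lower.size() && lower[end] != ';' && lower[end] != ' ' && lower[end] != '\t') ++end;
        return lower.substr(pos, end - pos);
    }
    return "";
}
```

If the result is empty, the server did not declare a charset, and you have to fall back on what you know about the files (for example, that they are always UTF-8).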

Answer 2

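This is not a libcurl problem. You store the raw data in a std::string and, once the download has finished, convert it to a std::wstring. You have to look at the charset reported in the HTTP response and decode the data into the std::wstring accordingly. C::toWString() has no notion of character sets, so you should use something else, such as iconv or ICU. Alternatively, if you know the data is always UTF-8, do the conversion by hand (UTF conversions are easy to write manually), or use C++11's built-in UTF conversion through the std::wstring_convert class.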
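For the case where the data is known to be UTF-8, a hand-written conversion could look like the minimal sketch below. utf8ToWide is a hypothetical name; in the question's code it would take the place of C::toWString(readBuffer). It writes UTF-16 surrogate pairs where wchar_t is 16 bits (as with Visual Studio) and whole code points where wchar_t is 32 bits, and it throws on malformed input instead of substituting replacement characters.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into a std::wstring.
// 16-bit wchar_t: code points above U+FFFF become UTF-16 surrogate pairs.
// 32-bit wchar_t: each code point is stored as one wchar_t.
// Throws std::runtime_error on malformed UTF-8 (bad lead or continuation
// bytes, truncated sequences, overlong forms, surrogates, > U+10FFFF).
std::wstring utf8ToWide(const std::string& in) {
    std::wstring out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80) {                // 0xxxxxxx: ASCII
            cp = b; len = 1;
        } else if ((b >> 5) == 0x6) {  // 110xxxxx: 2-byte sequence
            cp = b & 0x1F; len = 2;
        } else if ((b >> 4) == 0xE) {  // 1110xxxx: 3-byte sequence
            cp = b & 0x0F; len = 3;
        } else if ((b >> 3) == 0x1E) { // 11110xxx: 4-byte sequence
            cp = b & 0x07; len = 4;
        } else {
            throw std::runtime_error("invalid UTF-8 lead byte");
        }
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c >> 6) != 0x2) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        // Smallest code point allowed for each sequence length (rejects overlong forms).
        static const char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < kMin[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            throw std::runtime_error("invalid UTF-8 code point");
        }
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;  // split into a UTF-16 surrogate pair
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
        i += len;
    }
    return out;
}
```

std::wstring_convert, mentioned above, still compiles on major standard libraries but has been deprecated since C++17; on Windows, MultiByteToWideChar with CP_UTF8 is another option for the same conversion.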

Tags: c++, encoding, curl, utf-8, libcurl