Reading UTF-8 input

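I'm making a program that works like flash cards, but console-based. At startup I read from a file containing UTF-8 encoded Japanese characters (such as "ひらがな, カタカナ, 患者"). However, when I call std::getline(), the input comes back as "". How can I make this work? Maybe by opening STD_INPUT_HANDLE as a file? I call SetConsoleOutputCP() and SetConsoleCP() with CP_UTF8 as the argument to enable UTF-8 printing.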

As requested by @πάντα ῥεῖ, a minimal reproducible example:
#include <iostream>
#include <Windows.h>
#include <fstream>
#include <vector>
#include <string>

void populate(std::vector<std::string>& in) {
    std::ifstream file("words.txt"); // fill this with some UTF-8 characters, then check the contents of [in]

    std::string line;
    while (std::getline(file, line)) {
        in.emplace_back(line);
    }
}

int main() {
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);

    SetConsoleTitleA("Example");

    std::vector<std::string> arr;
    populate(arr);

    std::string input_utf8; // type some UTF-8 characters when asked for input
    std::cin >> input_utf8;

    for (std::string s : arr)
        if (input_utf8 == s)
            std::cout << "It works! The input wasn't null!";
}

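The program below works for me. I needed code page 932 (Shift-JIS) to display things correctly. (I don't have Japanese enabled on my Windows 10 machine, so it doesn't depend on that.) If I just use std::cin or std::wcin, I can see in the debugger that I am not getting the correct input. But if I use ReadConsoleW/WriteConsoleW, everything looks right.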

#define _CRT_SECURE_NO_WARNINGS
#include <windows.h>
#include <iostream>

using namespace std;

int main()
{
                                        //This code-page-changing stuff, plus the restoring later, is from
                                        //https://www.codeproject.com/articles/34068/unicode-output-to-the-windows-console
    UINT oldcp = GetConsoleOutputCP();  //what is the current code page? store for later
    SetConsoleOutputCP(932);            //set it up so it can do Japanese

    cout << "Enter something: "; 

    wchar_t wmsg[32];
    DWORD used;
    if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), //explicit W version, since wmsg is a wchar_t buffer
        wmsg,
        31, //max number of wchar_t to read; wmsg has 32 slots
        &used,
        nullptr))
        cerr << "ReadConsole failed, le = " << GetLastError() << endl;

    size_t len = used;
    cout << "You entered: ";
    //From https://cboard.cprogramming.com/windows-programming/112382-printing-unicode-console.html
    if (!WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), 
            wmsg, (DWORD) len,
            &used, 0))
            cerr << "WriteConsole failed, le = " << GetLastError() << endl;
    cout << '\n';

    cout << "Hit enter to end (and restore previous code page)."; cin.get();
    SetConsoleOutputCP(oldcp);          //restore the output code page (the input code page was never changed)
    return 0;
}
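
The question compares the typed input against UTF-8 std::string lines read from the file, while ReadConsoleW returns UTF-16 and leaves the trailing "\r\n" in the buffer. One way to bridge that gap is to trim the line ending and convert to UTF-8 before comparing. The following is only a sketch under that assumption: trim_line_ending and utf16_to_utf8 are hypothetical helper names, not part of the code above, and the conversion is written by hand so it does not depend on any Windows API (on Windows, WideCharToMultiByte with CP_UTF8 does the same job).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the answer's code): drop the trailing
// "\r\n" that ReadConsoleW leaves at the end of a typed line.
inline void trim_line_ending(std::u16string& s) {
    while (!s.empty() && (s.back() == u'\r' || s.back() == u'\n'))
        s.pop_back();
}

// Hypothetical helper: convert UTF-16 to UTF-8 by hand.
// Unpaired surrogates are replaced with U+FFFD.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows, where wchar_t holds UTF-16 code units, the buffer from the program above could be handed over as std::u16string(wmsg, wmsg + len), trimmed, converted, and then compared with == against the lines read by populate() in the question.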