Reading UTF-8 input
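I'm making a program similar to flash cards, but console-based. At the start of the program, I read from a file containing UTF-8 encoded Japanese characters (such as "ひらがな, カタカナ, 患者"). However, when I call std::getline(), the input comes out as "". How can I make this work? Maybe by opening STD_INPUT_HANDLE as a file? I call SetConsoleOutputCP() and SetConsoleCP() with CP_UTF8 as the argument to enable UTF-8 printing.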
A minimal reproducible example, as requested by @πάντα ῥεῖ:
#include <iostream>
#include <Windows.h>
#include <fstream>
#include <vector>
#include <string>

void populate(std::vector<std::string>& in) {
    std::ifstream file("words.txt"); // fill this with some UTF-8 characters, then check the contents of [in]
    std::string line;
    while (std::getline(file, line)) {
        in.emplace_back(line);
    }
}

int main() {
    SetConsoleOutputCP(CP_UTF8);
    SetConsoleCP(CP_UTF8);
    SetConsoleTitleA("Example");
    std::vector<std::string> arr;
    populate(arr);
    std::string input_utf8; // type some UTF-8 characters when asked for input
    std::cin >> input_utf8;
    for (std::string s : arr)
        if (input_utf8 == s)
            std::cout << "It works! The input wasn't null!";
}
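The program below works for me. I needed code page 932 (Shift-JIS) for things to display correctly. (I don't have Japanese enabled on my Windows 10 machine, so it doesn't depend on that.) If I just use std::cin or std::wcin, I can see in the debugger that I'm not getting the right input. But if I use ReadConsoleW/WriteConsoleW, everything looks right.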
#include <windows.h>
#include <iostream>
using namespace std;

int main()
{
    // This code-page-changing stuff, plus the restoring later, is from
    // https://www.codeproject.com/articles/34068/unicode-output-to-the-windows-console
    UINT oldcp = GetConsoleOutputCP(); // what is the current output code page? store for later
    SetConsoleOutputCP(932);           // set it up so it can do Japanese

    cout << "Enter something: " << flush;
    wchar_t wmsg[32];
    DWORD used = 0; // stays 0 if the read fails
    // Call ReadConsoleW explicitly: the ReadConsole macro maps to ReadConsoleA
    // when UNICODE is not defined, which would put narrow characters into wmsg.
    if (!ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE),
                      wmsg,
                      31, // read at most 31 UTF-16 code units into the 32-slot buffer
                      &used,
                      nullptr))
        cerr << "ReadConsole failed, le = " << GetLastError() << endl;
    size_t len = used; // code units actually read; normally includes the trailing "\r\n"

    cout << "You entered: " << flush;
    // From https://cboard.cprogramming.com/windows-programming/112382-printing-unicode-console.html
    if (!WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE),
                       wmsg, (DWORD) len,
                       &used, 0))
        cerr << "WriteConsole failed, le = " << GetLastError() << endl;
    cout << '\n';

    cout << "Hit enter to end (and restore previous code page).";
    cin.get();
    SetConsoleOutputCP(oldcp); // only the output code page was changed, so only it is restored
    return 0;
}
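One gap remains between this answer and the question: ReadConsoleW hands back UTF-16, while the lines read from words.txt are UTF-8 bytes in a std::string, so the two cannot be compared directly. Below is a minimal, portable sketch of the conversion step. The helper names utf16_to_utf8 and trim_line_ending are made up for illustration and are not part of any library; on Windows, WideCharToMultiByte with CP_UTF8 does the same conversion. It also assumes the file has no byte order mark at the start.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: remove the "\r\n" that ReadConsoleW normally
// leaves at the end of the text the user typed.
std::u16string trim_line_ending(std::u16string s) {
    while (!s.empty() && (s.back() == u'\r' || s.back() == u'\n'))
        s.pop_back();
    return s;
}

// Hypothetical helper: convert UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate with no preceding high surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the buffer from the program above, something like `utf16_to_utf8(trim_line_ending(std::u16string(wmsg, wmsg + used)))` should give a std::string that can be compared against the entries filled in by populate().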