C++ issue: converting wchar_t* to string

Question

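I have a problem here. This is a Unicode build. I have a string table whose values are separated by ;. I have been at this all day, and I always hit a runtime error right away.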

The string table looks like:

`blah;blah;foo;bar;car;star`

Then the code:

// More than enough size for this
const int bufferSize = 2048;

// Resource ID to a StringTable
int resid = IDS_MAP;
wchar_t readMap[bufferSize];            
resid = LoadString(NULL, resid, readMap, bufferSize);  

wchar_t* line;
line = wcstok(readMap,L";");

while (line != NULL) {

    line = wcstok(NULL,L";");
    wstring wstr(line); // Problem
    string str(wstr.begin(), wstr.end()); // Problem

    MessageBox(0,line,0,0); // No problem
}

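The problem occurs when I try to convert the wchar_t* line to a wstring and then to a string. If I comment out those two lines, it runs fine and the message box displays correctly.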

Any ideas? Asking here is my last resort. Thanks.

Answer 1

This statement:

line = wcstok(readMap,L";");

reads the first delimited line in the buffer. OK.

However, inside your loop, this statement:

line = wcstok(NULL,L";");

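is at the top of the loop, so the first iteration throws away the first line and reads the next delimited line instead. Eventually the loop reaches the end of the buffer and wcstok() returns NULL, but you do not check for that condition before using line: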

line = wcstok(readMap,L";"); // <-- reads the first line

while (line != NULL) {

    line = wcstok(NULL,L";"); // <-- 1st iteration throws away the first line
    wstring wstr(line); // <-- line will be NULL on last iteration

    //...
}

You need to move the line = wcstok(NULL,L";"); statement to the bottom of the loop:

wchar_t* line = wcstok(readMap, L";");

while (line != NULL)
{
    // use line as needed...

    line = wcstok(NULL, L";");
}

I would suggest changing the while loop into a for loop to enforce that:

for (wchar_t* line = wcstok(readMap, L";"); (line != NULL); line = wcstok(NULL, L";"))
{
    // use line as needed...
}
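As a portable illustration of the corrected loop, here is a minimal sketch. It assumes the ISO C three-argument form of wcstok (the extra wchar_t** argument holds the tokenizer state), which is what glibc and current Microsoft runtimes provide; the code above uses the older two-argument Microsoft form. The helper name splitTokens is made up for this example.

```cpp
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: splits a wide string on ';' with wcstok.
// The argument is taken by value because wcstok writes terminators
// into the buffer it scans.
std::vector<std::wstring> splitTokens(std::wstring text)
{
    std::vector<std::wstring> tokens;
    wchar_t* state = nullptr; // tokenizer state for the three-argument form

    // The NULL check runs before every use of tok, including the first.
    for (wchar_t* tok = std::wcstok(&text[0], L";", &state);
         tok != nullptr;
         tok = std::wcstok(nullptr, L";", &state))
    {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Calling splitTokens(L"blah;blah;foo;bar;car;star") should give six tokens. Note that wcstok skips runs of delimiters, so empty fields such as the one in "a;;b" are dropped.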

On the other hand, since you are using C++, you should consider using std::wistringstream and std::getline() instead of wcstok():

#include <string>
#include <sstream>

// after LoadString() exits, resid contains the
// number of characters copied into readMap...
std::wistringstream iss(std::wstring(readMap, resid));

std::wstring line;
while (std::getline(iss, line, L';'))
{
    // use line as needed...
}
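The same idea can be wrapped in a small function that does not depend on LoadString(), which makes it easy to try outside a Windows project. This is only a sketch; the name splitFields and the default delimiter parameter are assumptions for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on a delimiter with std::getline.
// The input is left untouched because the stream works on its own copy.
std::vector<std::wstring> splitFields(const std::wstring& text, wchar_t delim = L';')
{
    std::vector<std::wstring> fields;
    std::wistringstream iss(text);
    std::wstring field;

    // getline returns the stream, which tests false once nothing more can be read.
    while (std::getline(iss, field, delim))
    {
        fields.push_back(field);
    }
    return fields;
}
```

Unlike wcstok(), std::getline() keeps empty fields, so "a;;b" yields three entries with an empty one in the middle. Whether that is desirable depends on how the string table is written.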

But in any case, this statement is just plain wrong:

string str(wstr.begin(), wstr.end()); // Problem

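This statement only works correctly when the std::wstring contains nothing but ASCII characters in the #0 - #127 range. The iterator constructor narrows each wchar_t to a char one element at a time; it does not transcode anything. For non-ASCII characters, you must perform a real data conversion to avoid data loss for Unicode characters > U+00FF.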
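To make the data loss concrete, the following sketch wraps the questioned line in a function and feeds it a non-ASCII character. The helper name narrowByCopy is made up for this example, and the exact byte that results from the narrowing is what mainstream compilers produce rather than something older standards guarantee.

```cpp
#include <string>

// Hypothetical helper that reproduces the questioned line:
// each wchar_t is copied into a char with no transcoding.
std::string narrowByCopy(const std::wstring& wide)
{
    return std::string(wide.begin(), wide.end());
}
```

With ASCII input such as L"foo" the result is "foo", which is why the bug is easy to miss. With L"\u20AC" (the euro sign) the result is a single byte, typically 0xAC, whereas a correct UTF-8 encoding of that character takes three bytes.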

Since you are running on Windows, you can use the Win32 API WideCharToMultiByte() function:

std::wstring line;
while (std::getline(iss, line, L';'))
{
    std::string str;

    // optionally substitute CP_UTF8 with any ANSI codepage you want...
    int len = WideCharToMultiByte(CP_UTF8, 0, line.c_str(), line.length(), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        str.resize(len);
        WideCharToMultiByte(CP_UTF8, 0, line.c_str(), line.length(), &str[0], len, NULL, NULL);
    }

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}

Or, if you are using C++11 or later, you can use the std::wstring_convert class (only for UTF-8/16/32 conversions, though, and note that it was deprecated in C++17):

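#include <locale>
#include <codecvt> // declares std::codecvt_utf8_utf16

std::wstring line;
while (std::getline(iss, line, L';'))
{
    // wchar_t holds UTF-16 on Windows, so use the UTF-8 <-> UTF-16 facet
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> conv;
    std::string str = conv.to_bytes(line);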

    // use str as needed...
    MessageBoxW(0, line.c_str(), L"line", 0);
    MessageBoxA(0, str.c_str(), "str", 0);
}
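Because std::wstring_convert is deprecated as of C++17 and WideCharToMultiByte() is Windows-only, it can help to see what the conversion actually does. The following is a hand-written sketch, not a library API: the name toUtf8 is made up, it assumes wchar_t holds either UTF-16 code units (Windows) or UTF-32 code points (most Unix-like systems), and it performs no validation of unpaired surrogates or out-of-range values.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 by hand.
std::string toUtf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
        {
            std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }

        if (cp < 0x80)
        {
            out += static_cast<char>(cp); // 1 byte: plain ASCII
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6)); // 2 bytes
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12)); // 3 bytes
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18)); // 4 bytes
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, toUtf8(L"\u20AC") produces the three bytes E2 82 AC. In production code a tested library or the platform API is usually the safer choice; the sketch is only meant to show why a per-element copy cannot do this job.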
