Check if UTF-8 is wchar_t or char?

Question

I am calling the zlib (minizip) API zipOpen from my C++ project to create a new zip file. The function signature is extern zipFile ZEXPORT zipOpen (const char* pathname, int append).

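This call eventually calls fopen to create the file. However, fopen does not support wide characters. I want to fix this by passing the path as UTF-8, which is represented by a char* and therefore fits the function signature. Before calling fopen, I would check whether the string contains any non-ASCII characters. If it does not, call fopen as before; if it does, convert it to a wide string (wchar_t) and call _wfopen instead.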

So the question is: is there a C/C++ API that checks whether a UTF-8 string contains non-ASCII characters?

Basically I am looking for a function like isWide in the example below. Given a file name stored in a string, I want to decide whether to call fopen or the Windows _wfopen.

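    #include <codecvt>
    #include <cstdio>
    #include <locale>
    #include <string>

    bool isWide(const char* s); // the function I am looking for

    std::string toUTF8(const std::wstring& str)
    {
        // std::wstring_convert / std::codecvt_utf8 are deprecated since C++17 but still available
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.to_bytes(str);
    }

    std::wstring fromUTF8(const std::string& str)
    {
        std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
        return converter.from_bytes(str);
    }

    void example()
    {
        std::wstring s1 = L"おはよう";
        isWide(toUTF8(s1).c_str()); // this should return true.

        std::string s2 = "asdasd";
        isWide(s2.c_str());         // this should return false.

        std::wstring s3 = L"asdasd";
        isWide(toUTF8(s3).c_str()); // this should return false.

        // Open each UTF-8 name with whichever function matches its content (Windows only: _wfopen).
        for (const std::string& name : {toUTF8(s1), s2, toUTF8(s3)})
        {
            FILE* f = isWide(name.c_str())
                ? _wfopen(fromUTF8(name).c_str(), L"wb") // create file with a wide-char name
                : fopen(name.c_str(), "wb");             // create file with a regular name
            if (f) fclose(f);
        }
    }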

And the signature of isWide:

    bool isWide(const char* s);

As noted in the comments below, similar questions have been asked before, but none of them was solved with a standard API.

Thanks

Answer 1

It depends on your definition of "wide". If you just want to test whether any non-ASCII characters are present, test the high bit of each byte:

bool isWide(const char * s) {
  for (; *s; s++) {
    if (*s & 0x80)
      return true;
  }
  return false;
}
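Answer 1 takes a const char*, but the question also calls isWide with a std::string (s2). A possible C++17 variant, which is not part of the original answer, takes std::string_view so that both call styles compile. It performs the same high-bit test with std::any_of and casts each byte to unsigned char, so it does not rely on the signedness of plain char.

```cpp
#include <algorithm>
#include <string>       // only needed by callers that pass std::string
#include <string_view>

// Same high-bit test as Answer 1, written with std::any_of.
// std::string_view binds to both const char* and std::string arguments,
// so isWide(toUTF8(s1).c_str()) and isWide(s2) both compile.
bool isWide(std::string_view s) {
    return std::any_of(s.begin(), s.end(), [](char c) {
        // Any byte >= 0x80 is part of a multi-byte UTF-8 sequence.
        return (static_cast<unsigned char>(c) & 0x80) != 0;
    });
}
```

For the question's s1 ("おはよう" as UTF-8) this returns true. For "asdasd", whether passed as a literal or as a std::string, it returns false.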

Answer 2

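You can iterate over all characters and check whether the most significant bit is "1". See https://de.wikipedia.org/wiki/UTF-8: only the bytes of multi-byte characters have that bit set, while ASCII characters are single bytes with it cleared.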

bool isWide(const std::string& string) {    
    for(auto& c : string) 
    { 
        if(c & 0x80) {
            return true;
        } 
    }
    return false;
}
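Answer 2's observation follows from UTF-8's byte layout. ASCII bytes look like 0xxxxxxx. Lead bytes of multi-byte sequences look like 110xxxxx, 1110xxxx or 11110xxx. Continuation bytes look like 10xxxxxx. The sketch below makes this concrete; the names Utf8ByteKind, classify and countCodePoints are hypothetical. As a side effect, counting the bytes that are not continuation bytes gives the number of code points, assuming the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Classification of a single UTF-8 byte by its high-order bits.
enum class Utf8ByteKind { Ascii, Continuation, Lead, Invalid };

Utf8ByteKind classify(unsigned char b) {
    if ((b & 0x80) == 0x00) return Utf8ByteKind::Ascii;        // 0xxxxxxx
    if ((b & 0xC0) == 0x80) return Utf8ByteKind::Continuation; // 10xxxxxx
    // 110xxxxx, 1110xxxx, 11110xxx. Structural check only: this sketch does
    // not reject leads that valid UTF-8 never uses (0xC0, 0xC1, 0xF5..0xF7).
    if ((b & 0xE0) == 0xC0 || (b & 0xF0) == 0xE0 || (b & 0xF8) == 0xF0)
        return Utf8ByteKind::Lead;
    return Utf8ByteKind::Invalid;                               // 11111xxx
}

// Counts code points by counting every byte that is not a continuation byte.
// Assumes s is valid UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (char c : s)
        if (classify(static_cast<unsigned char>(c)) != Utf8ByteKind::Continuation)
            ++n;
    return n;
}
```

For the question's "おはよう", encoded as 12 bytes (each character is a 3-byte sequence starting with 0xE3), countCodePoints returns 4.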

Answer 3

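There is no reason to check whether the string contains any non-ASCII characters. If you know it is UTF-8 (note that ASCII is valid UTF-8), just convert it and always call _wfopen() unconditionally.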
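Answer 3's approach can be sketched as follows. The sketch assumes the input really is UTF-8 and that on Windows wchar_t is 16 bits holding UTF-16; the helper names utf8ToUtf16 and openUtf8 are hypothetical. It decodes by hand rather than with the question's std::codecvt_utf8<wchar_t>. When wchar_t is 16 bits, that facet converts to UCS-2, so characters outside the Basic Multilingual Plane are not handled. The usual library alternatives are std::codecvt_utf8_utf16<wchar_t> and the Win32 MultiByteToWideChar with CP_UTF8. Only the _wfopen part is guarded by _WIN32, so the decoder builds on any platform.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into UTF-16 code units. Throws std::invalid_argument on
// malformed input: bad lead or continuation bytes, truncated sequences,
// overlong forms, surrogate code points, or values above U+10FFFF.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("bad UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);   // append 6 payload bits
        }
        // Smallest code point each sequence length may encode (rejects overlong forms).
        static const char32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;                 // encode as a UTF-16 surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

#ifdef _WIN32
// Windows only: wchar_t is 16-bit UTF-16 there, so the code units can be
// copied directly into a std::wstring and passed to _wfopen.
std::FILE* openUtf8(const std::string& utf8Path, const wchar_t* mode) {
    std::u16string u16 = utf8ToUtf16(utf8Path);
    std::wstring wide(u16.begin(), u16.end());
    return _wfopen(wide.c_str(), mode);
}
#endif
```

With a helper like this, the zipOpen path would go through a single _wfopen call regardless of content, which is the point of Answer 3. A 4-byte character such as U+1F600 (F0 9F 98 80) comes out as the surrogate pair D83D DE00.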


Tags: c++, windows, unicode, zlib, utf-8