Byte representation of ASCII symbols in std::wstring with different locales

Question

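Windows C++ application. We have a string that contains only ASCII symbols: std::wstring(L"abcdeABCDE ... any other ASCII symbol"). Note that this is a std::wstring, which uses wchar_t.

Question: does the byte representation of this string depend on the localization settings, or on anything else? Can I assume that if I receive such a string (for example, from the Windows API) while the application is running, its bytes will be the same as on my PC?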

Answer 1

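In general, for characters (not escape sequences), wchar_t and wstring have to use the same codes as ASCII (just extended to 2 bytes). I am not sure about codes below 32, though, and codes of 128 and above can of course take on a different meaning at output time (just as with extended ASCII). To avoid problems on output, set a specific locale explicitly, for example: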

  locale("en_US.UTF-8")

For standard output:

  wcout.imbue(locale("en_US.UTF-8"));

Update:

I found another suggestion: add

  std::ios_base::sync_with_stdio(false);

before setting the locale with imbue. See "How can I use std::imbue to set the locale for std::wcout?" for details.
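Putting the two steps together, a minimal sketch could look like the following. The helper names locale_or_classic and configure_wcout are made up for illustration, and the fallback to the classic "C" locale is an assumption added here, because constructing std::locale from a name that is not installed throws std::runtime_error.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the named locale, or the classic "C" locale
// when that name is not available on the machine.
std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Hypothetical helper: applies the two steps from the answer, in order.
void configure_wcout(const char* name) {
    // Call this before any I/O has happened on the standard streams.
    std::ios_base::sync_with_stdio(false);
    std::wcout.imbue(locale_or_classic(name));
}
```

Calling configure_wcout("en_US.UTF-8") at the very start of main, before anything is written, matches the order suggested above. Locale names are platform-specific, so that exact name may not be accepted everywhere.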

Answer 2

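The byte representation of a string literal does not depend on the environment. It is hard-coded as binary data, as it came from the editor. How that binary data is interpreted, however, depends on the current code page, so you may get different results when a narrow string is converted to a wide string at run time (as opposed to defining the string with a leading L, which means the wide characters are fixed at compile time).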
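To make the compile-time point concrete, here is a minimal sketch (the helper names raw_bytes and matches_ascii are made up for illustration). It assumes an implementation where wchar_t holds Unicode code points or UTF-16 code units, which is the case for MSVC on Windows and for the common Linux toolchains; the C++ standard itself does not promise this.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies the raw storage of a wide string into bytes.
std::vector<unsigned char> raw_bytes(const std::wstring& s) {
    std::vector<unsigned char> bytes(s.size() * sizeof(wchar_t));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), s.data(), bytes.size());
    }
    return bytes;
}

// Hypothetical helper: true if every wide code unit has the same numeric
// value as the narrow character at the same position.
bool matches_ascii(const std::wstring& wide, const std::string& ascii) {
    if (wide.size() != ascii.size()) return false;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const unsigned long w = static_cast<unsigned long>(wide[i]);
        const unsigned long a = static_cast<unsigned char>(ascii[i]);
        if (w != a) return false;
    }
    return true;
}
```

On Windows, where wchar_t is 2 bytes and the platform is little-endian, raw_bytes(L"ab") should come out as 61 00 62 00. With a 4-byte wchar_t each character occupies four bytes instead, so the byte layout depends on the platform and compiler rather than on the locale.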

To be safe, use setlocale() to pin down the encoding used by the conversion. Then you do not have to worry about the environment.
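As a rough sketch of that advice, the function below pins the conversion locale and then widens a narrow string with std::mbstowcs. The name widen_in_c_locale is hypothetical, and choosing the "C" locale with the LC_CTYPE category is an assumption made for this example; an application that needs non-ASCII input would pick a different locale.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: converts a narrow string to a wide string after
// fixing the conversion locale. Returns an empty string on failure.
std::wstring widen_in_c_locale(const std::string& narrow) {
    // Note: setlocale changes process-wide state.
    std::setlocale(LC_CTYPE, "C");

    // One wide character per input byte is always enough room.
    std::wstring wide(narrow.size(), L'\0');
    const std::size_t n =
        std::mbstowcs(&wide[0], narrow.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::wstring();  // input was not valid in this locale
    }
    wide.resize(n);
    return wide;
}
```

For pure ASCII input, the result should compare equal to the same text written as an L"..." literal, regardless of how the surrounding environment is configured.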

This may help: "By definition, the ASCII character set is a subset of all multibyte-character sets. In many multibyte character sets, each character in the range 0x00 – 0x7F is identical to the character that has the same value in the ASCII character set. For example, in both ASCII and MBCS character strings, the 1-byte NULL character ('\0') has value 0x00 and indicates the terminating null character."

From: Visual Studio Character Sets 'Not set' vs 'Multi byte character set'
