Source file encoding UTF-8 without BOM and Visual Studio MFC

Question

I have this simple code snippet in a C++ file (Visual Studio 2019, MFC project):

CString teststr = _T("täst"); //second letter is a german "Umlaut"
TRACE(_T("\n%s: %d"), static_cast<LPCTSTR>(teststr), teststr.GetLength());

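The default encoding for source files in VS is "Western European (Windows) - Codepage 1252", at least on my system.

TRACE outputs the correct text and the correct length (4).

However, I want to change the source file encoding to UTF-8 so that, in the future, it no longer depends on the developer's language.

If I change the encoding to "Unicode (UTF-8 with signature) - Codepage 65001", everything still works, except that the source file now has a BOM, which I don't like.

The real problem appears when I save the source as "Unicode (UTF-8 without signature) - Codepage 65001", which is the encoding I want to use. The source file still looks fine in the editor, but TRACE now reports "tÃ¤st: 5", which is plainly wrong and would be a source of serious bugs and crashes in production code.

So the question is: how can I save source code as UTF-8 without a BOM and still have it work? Is there a setting or extension that could help here?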

Answer 1

Assuming you are doing a Unicode build, the code the compiler actually sees is

CStringW s = L"täst";

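The result depends on how the compiler converts the source string into a wide string. By default, MSVC assumes that a source file without a BOM is encoded in the current code page, which is CP1252 in your case. The two UTF-8 bytes that encode "ä" (C3 A4) are then read as two separate CP1252 characters, "Ã" and "¤", which is why you get "tÃ¤st" and a length of 5.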
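The mismatch can be reproduced by hand. The following is a minimal sketch, not what MSVC does internally: `decode_as_cp1252` and `kUtf8Bytes` are hypothetical names introduced for illustration, and the decoder only covers the byte ranges where CP1252 maps a byte to the code point of the same value.

```cpp
#include <string>

// Hypothetical helper (not part of MFC or MSVC): decodes bytes the way a
// CP1252 reader would. Only bytes 0x00-0x7F and 0xA0-0xFF are handled,
// where CP1252 maps each byte to the code point with the same value.
// Bytes 0x80-0x9F would need a lookup table; they become U+FFFD here.
std::u32string decode_as_cp1252(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F) {
            out.push_back(U'\uFFFD');
        } else {
            out.push_back(static_cast<char32_t>(b));
        }
    }
    return out;
}

// "täst" as stored in a UTF-8 file: 'ä' (U+00E4) is the two bytes C3 A4.
const std::string kUtf8Bytes = "t\xC3\xA4st";
```

Calling `decode_as_cp1252(kUtf8Bytes)` yields five code points: 't', U+00C3 ("Ã"), U+00A4 ("¤"), 's', 't'. That matches the text and length reported by TRACE in the question.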

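You can tell the compiler the actual encoding with /source-charset:utf-8. Specify the option in the Additional Options field (under C/C++, Command Line) of the project settings.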
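To catch a wrong source-charset setting at build time rather than at run time, a compile-time guard along these lines might help. This is a sketch under two assumptions: the file containing it really is saved as UTF-8, and `literal_length` is a hypothetical helper, not an MFC or standard-library function. The `\u00E4` escape is independent of the file encoding, so it serves as the reference value.

```cpp
#include <cstddef>

// Hypothetical helper: number of wchar_t units in a wide string literal,
// not counting the terminating null.
template <std::size_t N>
constexpr std::size_t literal_length(const wchar_t (&)[N]) {
    return N - 1;
}

// If the compiler decodes this UTF-8 file as CP1252, the literal grows to
// five units and its second unit becomes U+00C3, so both checks fail.
static_assert(literal_length(L"täst") == 4,
              "source file was not decoded as UTF-8");
static_assert(L"täst"[1] == L'\u00E4',
              "source file was not decoded as UTF-8");
```

Placed in one commonly included UTF-8 file, the guard turns the silent "tÃ¤st" corruption into a build error for any configuration that lacks the option.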

See the compiler command line option.

Answer 2

Take a look at the /utf-8 compiler option, in particular (emphasis mine):

You can use the /utf-8 option to specify both the source and execution character sets as encoded by using UTF-8. It's equivalent to specifying /source-charset:utf-8 /execution-charset:utf-8 on the command line. Any of these options also enables the /validate-charset option by default. For a list of supported code page identifiers and character set names, see Code Page Identifiers.

By default, Visual Studio detects a byte-order mark to determine if the source file is in an encoded Unicode format, for example, UTF-16 or UTF-8. If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you've specified a code page by using /utf-8 or the /source-charset option. Visual Studio allows you to save your C++ source code by using any of several character encodings. For information about source and execution character sets, see Character Sets in the language documentation.

Answer 3

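Thanks for your answers; they work fine (the /utf-8 compiler switch).

I now know that the problem is that different source files have different encodings, and I don't want to batch-convert them all to UTF-8. So I cannot tell the compiler globally how to treat files without a BOM.

So my solution is: don't use UTF-8 without a BOM for C++ sources; always use a BOM, or leave existing files as CP1252.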
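In a code base with mixed encodings, it can help to find out which files already carry the UTF-8 signature. A minimal sketch, assuming the file contents have already been read into a byte string; `has_utf8_bom` and `read_file_bytes` are hypothetical helper names:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true if the buffer starts with the UTF-8
// signature (BOM), i.e. the three bytes EF BB BF.
bool has_utf8_bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Hypothetical helper: reads a whole file in binary mode so that no
// newline or encoding translation alters the bytes. Returns an empty
// string if the file cannot be opened.
std::string read_file_bytes(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

For example, `has_utf8_bom(read_file_bytes(path))` tells you whether a given source file will be detected as UTF-8 by the compiler without any extra option.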

