What is the point of the UTF-8 character literals proposed for C++17?

Question

What exactly is the point of the UTF-8 character literals proposed in N4267?

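Their only function seems to be to prevent extended ASCII characters or partial UTF-8 code points from being specified. They are still stored in a fixed-width 8-bit char (which, as I understand it, is the correct and best way to handle UTF-8 for almost all use cases), so they do not support non-ASCII characters at all. What is going on?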

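(Actually, I'm not entirely sure I understand the need for UTF-8 string literals either. I guess the concern is compilers doing weird or ambiguous things with Unicode strings, combined with validation of the Unicode?)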

Answer 1

The rationale is covered in Evolution Working Group issue 119: N4197 Adding u8 character literals, [tiny] Why no u8 character literals?, which tracked the proposal and says:

We have five encoding-prefixes for string-literals (none, L, u8, u, U) but only four for character literals -- the missing one is u8 for character literals.

This matters for implementations where the narrow execution character set is not ASCII. In such a case, u8 character literals would provide an ideal way to write character literals with guaranteed ASCII encoding (the single-code-unit u8 encodings are exactly ASCII), but... we don't provide them. Instead, the best one can do is something like this:
char x_ascii = { u'x' };
... where we'll get a narrowing error if the codepoint doesn't fit in a 'char'. (Note that this is not quite the same as u8'x', which would give us an error if the codepoint was not representable as a single code unit in UTF-8.)
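To make the difference concrete, here is a minimal sketch that assumes a compiler in C++17 mode or later. The names `x_via_u`, `x_via_u8`, and `is_ascii_x` are made up for illustration and do not come from the proposal.

```cpp
// Workaround from the quoted issue: list-initialization rejects narrowing,
// so this compiles only because the char16_t value 0x78 fits in a char.
constexpr char x_via_u = { u'x' };

// C++17 u8 character literal: always the single UTF-8 (hence ASCII) code unit,
// whatever the narrow execution character set is.
// Its type is char in C++17 and char8_t from C++20 on, so convert explicitly.
constexpr char x_via_u8 = static_cast<char>(u8'x');

static_assert(x_via_u8 == 0x78, "u8'x' is the ASCII code unit 0x78");
static_assert(x_via_u == x_via_u8, "both spellings give the same value here");

// Hypothetical helper: compares a raw byte (e.g. read from an ASCII-based
// wire protocol) against ASCII 'x' without relying on what plain 'x' means
// on the current platform.
constexpr bool is_ascii_x(unsigned char byte) {
    return byte == static_cast<unsigned char>(u8'x');
}

// These would NOT compile, or would not mean what you want:
//   auto bad = u8'é';      // ill-formed: U+00E9 needs two UTF-8 code units
//   char c   = { u'é' };   // narrowing error where char is signed; where char
//                          // is unsigned it is accepted and stores 0xE9,
//                          // which is not a valid UTF-8 encoding of é
```

On a platform with a non-ASCII narrow execution character set (EBCDIC, for example), plain `'x'` would not be 0x78, while `u8'x'` is. That is the whole use case: a guaranteed-ASCII character constant, not a way to store arbitrary Unicode characters in a single char.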

