What is the point of the UTF-8 character literals proposed for C++17?

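What exactly is the point of these, as proposed by N4267?

Their only function seems to be to prevent extended ASCII characters or partial UTF-8 code points from being specified. They are still stored in a fixed-width 8-bit char (which, as I understand it, is the correct and best way to handle UTF-8 for almost all use cases anyway), so they do not support non-ASCII characters at all. What is going on?

(Actually, I'm not entirely sure I understand the need for UTF-8 string literals either. I guess the concern is compilers doing weird or ambiguous things with Unicode strings, combined with validation of the Unicode?)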

The rationale is given in Evolution Working Group issue 119: N4197 Adding u8 character literals, [tiny] Why no u8 character literals?, which tracked the proposal and says:

We have five encoding-prefixes for string-literals (none, L, u8, u, U) but only four for character literals -- the missing one is u8 for character literals.

This matters for implementations where the narrow execution character set is not ASCII. In such a case, u8 character literals would provide an ideal way to write character literals with guaranteed ASCII encoding (the single-code-unit u8 encodings are exactly ASCII), but... we don't provide them. Instead, the best one can do is something like this:

char x_ascii = { u'x' };

... where we'll get a narrowing error if the codepoint doesn't fit in a 'char'. (Note that this is not quite the same as u8'x', which would give us an error if the codepoint was not representable as a single code unit in UTF-8.)
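To make the quoted rationale concrete, here is a minimal sketch assuming a C++17 or later compiler. The helper `code_unit_value` is a hypothetical name introduced only for this illustration; it is not part of N4267 or the standard library. It is a template because `u8'x'` has type `char` in C++17 but `char8_t` from C++20 onward.

```cpp
#include <cstdint>

// Hypothetical helper (not from the proposal): the numeric value of one code unit.
// Templated so it accepts char (C++17) as well as char8_t (C++20 and later).
template <typename CodeUnit>
constexpr std::uint32_t code_unit_value(CodeUnit c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}

// A u8 character literal always yields the UTF-8 code unit, which for these
// characters is the ASCII value, whatever the narrow execution character set is.
static_assert(code_unit_value(u8'x') == 0x78, "u8'x' is ASCII 'x'");
static_assert(code_unit_value(u8'A') == 0x41, "u8'A' is ASCII 'A'");

// The workaround quoted above: brace-initialization from a char16_t literal is
// rejected as narrowing if the code point does not fit in a char.
constexpr char x_ascii = { u'x' };
static_assert(code_unit_value(x_ascii) == 0x78, "workaround also yields ASCII 'x'");

// Ill-formed if uncommented: U+00E9 needs two UTF-8 code units,
// so it cannot be written as a u8 character literal.
// constexpr auto bad = u8'\u00E9';
```

On an implementation whose narrow execution character set is not ASCII (EBCDIC, for example), a plain `'x'` would not have the value 0x78, whereas `u8'x'` still would. That guarantee is what the proposal is after; it is not trying to represent arbitrary Unicode characters in a single code unit.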