Escaped Characters Outside the Basic Multilingual Plane (BMP) in Prolog

Question

For reference, I am using SWI-Prolog v7.4.2 on Windows 10, 64-bit.

Entering the following code in the REPL:

write("\U0001D7F6"). % Mathematical Monospace Digit Zero

gives me this error in the output:

ERROR: Syntax error: Illegal character code
ERROR: write("
ERROR: ** here **
ERROR: \U0001D7F6") .

I know that U+1D7F6 is a valid Unicode character, so what is going on?

Answer 1

For comparison, I get:

?- write('\U0001D7F6').

What is your environment, and what do the flags say?

For example:

$ set | grep LANG
LANG=en_US.UTF-8

And:

?- current_prolog_flag(encoding, F).
F = utf8.

Answer 2

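SWI-Prolog uses the C type wchar_t internally to represent Unicode characters. On Windows, wchar_t is 16 bits wide and is intended to hold UTF-16 encoded strings. SWI-Prolog, however, uses wchar_t to get a simple array of code points, so on Windows it effectively supports only UCS-2 (code points u0000..uffff).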

On non-Windows systems, wchar_t is usually 32 bits wide, so the full Unicode range is supported.

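Fixing the handling of wchar_t is not trivial. With UTF-16 we lose the nice property that every element of the array is one code point, and using our own 32-bit type means we cannot use the C library wide-character functions and would have to reimplement them in SWI-Prolog. That is not only extra work; replacing them with plain C versions would also lose the optimizations that modern C runtime libraries usually provide.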
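To make the limitation concrete, here is a minimal Python sketch of how UTF-16 encodes a code point. It is not SWI-Prolog's actual code, and the function name `utf16_code_units` is made up for illustration. Any code point above U+FFFF needs two 16-bit units (a surrogate pair), so it cannot occupy a single element of an array of 16-bit wchar_t values.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if code_point <= 0xFFFF:
        # BMP character: fits in one 16-bit unit.
        return [code_point]
    # Supplementary character: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return [high, low]


# U+1D7F6 (Mathematical Monospace Digit Zero) lies outside the BMP.
print([hex(unit) for unit in utf16_code_units(0x1D7F6)])  # ['0xd835', '0xdff6']
print([hex(unit) for unit in utf16_code_units(0x41)])     # ['0x41']
```

The character from the question therefore takes two array slots in UTF-16, which is exactly the "one element, one code point" property that the answer says would be lost.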

Answer 3

The ISO core standard syntax for character codes looks different. The following works in SICStus Prolog, Jekejeke Prolog, SWI-Prolog and others, and is therefore more portable:

SWI-Prolog on a Mac:

Welcome to SWI-Prolog (threaded, 64 bits, version 7.5.8)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.

?- set_prolog_flag(double_quotes, codes).
true.

?- X = "\x1D7F6\".
X = [120822].

?- write('\x1D7F6\'), nl.

Jekejeke Prolog on a Mac:

Jekejeke Prolog 2, Runtime Library 1.2.2
(c) 1985-2017, XLOG Technologies GmbH, Switzerland

?- X = "\x1D7F6\".
X = [120822]

?- write('\x1D7F6\'), nl.

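The underlying syntax is given in section 6.4.2.1 of the ISO core standard, hexadecimal escape sequence. It reads as follows, and is shorter than the \U syntax: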

hex_esc_seq --> "\x" hex_digit { hex_digit } "\".
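As a rough illustration of that grammar rule, the following Python sketch builds and parses escapes of the form backslash, x, one or more hex digits, closing backslash. The helper names `iso_hex_escape` and `parse_iso_hex_escape` are hypothetical and belong to no Prolog system; the sketch covers only this one production, not the rest of the ISO token syntax.

```python
import re

# "\x", one or more hex digits, then a closing "\".
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]+)\\")


def iso_hex_escape(code_point):
    """Format a code point as an ISO-style hexadecimal escape sequence."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    return "\\x{:X}\\".format(code_point)


def parse_iso_hex_escape(text):
    """Return the character code denoted by a single hexadecimal escape."""
    match = HEX_ESCAPE.fullmatch(text)
    if match is None:
        raise ValueError("not a hexadecimal escape sequence: %r" % text)
    return int(match.group(1), 16)


print(iso_hex_escape(0x1D7F6))                 # \x1D7F6\
print(parse_iso_hex_escape("\\x1D7F6\\"))      # 120822
```

The parsed value 120822 is the same code that both sessions above report for X.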

unicode

prolog

swi-prolog

unicode-escapes

iso-prolog