Why this elaborate RTF encoding of an apostrophe?
Scrivener generates RTF files with this elaborate encoding of an apostrophe:
They didn\loch\af0\hich\af0\dbch\af0\uc1\u8217\'92t do it.
Unicode 8217 (U+2019) is "RIGHT SINGLE QUOTATION MARK". OK, but this RTF contains both that Unicode character and \'92. What's going on?
The RTF breaks down as follows:
They didn - plain text
\loch - The text consists of single-byte low-ANSI (0x00–0x7F) characters
\af0 - Associated Font Number 0
\hich - The text consists of single-byte high-ANSI (0x80–0xFF) characters
\af0 - Associated Font Number 0
\dbch - The text consists of double-byte characters
\af0 - Associated Font Number 0
\uc1 - number of bytes corresponding to a given \uN Unicode character
\u8217 - a single Unicode character that has no equivalent ANSI representation based on the current ANSI code page
\'92 - A hexadecimal value, based on the specified character set (may be used to identify 8-bit values).
t do it. - plain text
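The breakdown above can be reproduced mechanically. Below is a minimal Python sketch, assuming a flat fragment with no groups, escaped symbols, or binary data; `TOKEN_RE`, `tokenize`, and `SCRIVENER` are illustrative names, not part of any RTF library. It splits the Scrivener fragment into plain text, control words (with their optional numeric parameter), and the `\'hh` hex escape:

```python
import re

# Token kinds that occur in the Scrivener fragment. This is a simplification:
# groups ({...}), escaped symbols such as \\ or \{, and binary data are not handled.
TOKEN_RE = re.compile(
    r"\\'(?P<hex>[0-9a-fA-F]{2})"              # \'hh control symbol, e.g. \'92
    r"|\\(?P<word>[a-z]+)(?P<param>-?\d+)? ?"  # control word, optional parameter and delimiter space
    r"|(?P<text>[^\\]+)"                       # run of plain text
)


def tokenize(rtf):
    """Split a flat RTF fragment into (kind, name, param) tuples."""
    tokens = []
    for m in TOKEN_RE.finditer(rtf):
        if m.group("hex") is not None:
            # \' is a control symbol whose parameter is the byte value
            tokens.append(("hex", "'", int(m.group("hex"), 16)))
        elif m.group("word") is not None:
            param = m.group("param")
            tokens.append(("word", m.group("word"), int(param) if param is not None else None))
        else:
            tokens.append(("text", m.group("text"), None))
    return tokens


SCRIVENER = r"They didn\loch\af0\hich\af0\dbch\af0\uc1\u8217\'92t do it."

for kind, name, param in tokenize(SCRIVENER):
    print(kind, repr(name), param)
```

Each printed row corresponds to one entry in the list above, for example `word 'u' 8217` for `\u8217` and `hex "'" 146` for `\'92`.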
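Some of this is redundant in this case and can be ignored; it's just font information. What matters is that \u8217 represents the apostrophe in Unicode, \'92 represents the equivalent apostrophe in ANSI (byte 0x92 in the Windows-1252 code page), and \uc1 indicates that \'92 occupies 1 character. A Unicode-enabled RTF reader will process \u8217 and ignore \'92. A non-Unicode RTF reader will ignore \u8217 and process \'92.
This is explained in the Unicode RTF section of the RTF specification: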
\uN
This keyword represents a single Unicode character that has no equivalent ANSI representation based on the current ANSI code page. N represents the Unicode character value expressed as a decimal number.
This keyword is followed immediately by equivalent character(s) in ANSI representation. In this way, old readers will ignore the \uN keyword and pick up the ANSI representation properly. When this keyword is encountered, the reader should ignore the next N characters, where N corresponds to the last \ucN value encountered.
...
An RTF writer, when it encounters a Unicode character with no corresponding ANSI character, should output \uN followed by the best ANSI representation it can manage. Also, if the Unicode character translates into an ANSI character stream with count of bytes differing from the current Unicode Character Byte Count, it should emit the \ucN keyword prior to the \uN keyword to notify the reader of the change.
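The reader and writer rules in the quoted passage can be sketched in Python. This is a minimal illustration, not a complete RTF parser. It assumes the document's ANSI code page is Windows-1252 (`cp1252`, which a header would declare with `\ansicpg1252`; the fragment itself does not show this). It also ignores groups and their scoping of `\ucN`, and it counts only `\'hh`, escaped symbols, and plain-text characters toward the skip. `read_rtf`, `write_rtf`, and `unicode_aware` are hypothetical names chosen for this example:

```python
import re

# Simplified tokenizer: groups ({...}) and their scoping of \ucN are not modeled.
TOKEN_RE = re.compile(
    r"\\'(?P<hex>[0-9a-fA-F]{2})"              # \'hh: one byte in the ANSI code page
    r"|\\(?P<sym>[\\{}])"                      # escaped literal backslash or brace
    r"|\\(?P<word>[a-z]+)(?P<param>-?\d+)? ?"  # control word, optional parameter and delimiter space
    r"|(?P<text>[^\\]+)"                       # run of plain text
)


def read_rtf(rtf, unicode_aware=True, codepage="cp1252"):
    """Extract text; unicode_aware=False mimics an old reader that ignores \\uN and \\ucN."""
    out = []
    uc = 1    # \ucN defaults to 1
    skip = 0  # fallback characters still to ignore after the last \uN
    for m in TOKEN_RE.finditer(rtf):
        word, param = m.group("word"), m.group("param")
        if word is not None:
            if unicode_aware and word == "uc" and param is not None:
                uc = int(param)
            elif unicode_aware and word == "u" and param is not None:
                n = int(param)
                out.append(chr(n + 65536 if n < 0 else n))  # N is a signed 16-bit value
                skip = uc
            # Other control words (\loch, \af0, ...) only carry formatting; ignored here.
            continue
        if m.group("text") is not None:
            chars = m.group("text")
        elif m.group("sym") is not None:
            chars = m.group("sym")
        else:
            chars = bytes([int(m.group("hex"), 16)]).decode(codepage, errors="replace")
        # Each character here counts as one fallback character while skipping.
        dropped = min(skip, len(chars))
        skip -= dropped
        out.append(chars[dropped:])
    return "".join(out)


def write_rtf(text, codepage="cp1252"):
    """Emit \\uN plus an ANSI fallback for every non-ASCII BMP character."""
    parts = []
    uc = 1  # \ucN value the reader currently assumes (the default)
    for ch in text:
        n = ord(ch)
        if ch in "\\{}":
            parts.append("\\" + ch)
        elif n < 0x80:
            parts.append(ch)
        elif n > 0xFFFF:
            raise ValueError("this sketch handles BMP characters only")
        else:
            # Simplification: \uN is written even when the code page has the character.
            try:
                fallback = ch.encode(codepage)
            except UnicodeEncodeError:
                fallback = b"?"
            if len(fallback) != uc:  # announce a changed byte count before \uN
                uc = len(fallback)
                parts.append(f"\\uc{uc}")
            parts.append(f"\\u{n - 65536 if n > 32767 else n}")
            parts.append("".join(f"\\'{b:02x}" for b in fallback))
    return "".join(parts)


scrivener = r"They didn\loch\af0\hich\af0\dbch\af0\uc1\u8217\'92t do it."
print(read_rtf(scrivener) == "They didn\u2019t do it.")                       # True
print(read_rtf(scrivener, unicode_aware=False) == "They didn\u2019t do it.")  # True
print(write_rtf("They didn\u2019t do it."))  # They didn\u8217\'92t do it.
```

Both reader modes return the same text for the Scrivener fragment because byte 0x92 in Windows-1252 is itself the right single quotation mark. For a character with no Windows-1252 equivalent, such as U+2603, `write_rtf` falls back to `?`, and only the Unicode-aware reader recovers the original character.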