What is the difference between UTF-32 and UCS-4?
What is the difference between UTF-32 and UCS-4? Isn't UTF-32 supposed to be a fixed-width encoding?
UTF-32 started out as a subset of UCS-4. Today the two are identical, except that the UTF-32 standard carries additional Unicode semantics. See Wikipedia for the details:
The original ISO 10646 standard defines a 31-bit encoding form called
UCS-4, in which each encoded character in the Universal Character Set
(UCS) is represented by a 32-bit friendly code value in the code space
of integers between 0 and hexadecimal 7FFFFFFF.
Because only 17 planes are actually in use, all current code points
are between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only
this range. Since the Principles and Procedures document of
JTC1/SC2/WG2 states that all future assignments of characters will be
constrained to the BMP or the first 14 supplementary planes, UTF-32
will be able to represent all Unicode characters. Accordingly, UCS-4
and UTF-32 are now identical except that the UTF-32 standard has
additional Unicode semantics.
However, I am not quite sure what "additional Unicode semantics" means. Perhaps someone else can provide a better answer.
The Unicode Standard, Version 8.0, Appendix C states:
UCS-4 stands for “Universal Character Set coded in 4 octets.” It is
now treated simply as a synonym for UTF-32, and is considered the
canonical form for representation of characters in ISO 10646 (Universal Coded Character Set).
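To make the range difference and the fixed-width property concrete, here is a minimal Python sketch. The helper names (`is_ucs4_value`, `is_utf32_value`, `encode_utf32_le`) are hypothetical and written only for illustration. The surrogate exclusion in `is_utf32_value` is an assumption on my part about one thing the "additional Unicode semantics" cover; it follows Unicode's definition of a scalar value and is not taken from the quotes above.

```python
import struct

UCS4_MAX = 0x7FFFFFFF    # upper bound of the original 31-bit UCS-4 code space
UNICODE_MAX = 0x10FFFF   # upper bound of the 17 planes that Unicode uses


def is_ucs4_value(value: int) -> bool:
    """True if value fits in the original 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def is_utf32_value(value: int) -> bool:
    """True if value is in the 0..0x10FFFF range that UTF-32 is restricted to.

    Assumption: surrogate code points (0xD800..0xDFFF) are also rejected,
    because they are not Unicode scalar values.
    """
    return 0 <= value <= UNICODE_MAX and not (0xD800 <= value <= 0xDFFF)


def encode_utf32_le(text: str) -> bytes:
    """Encode each code point as one little-endian 32-bit unit, without a BOM."""
    return b"".join(struct.pack("<I", ord(ch)) for ch in text)


# One ASCII letter, one BMP character, one supplementary-plane character.
sample = "A\u20ac\U0001F600"
encoded = encode_utf32_le(sample)

# Every code point takes exactly 4 bytes, regardless of its plane.
print(len(sample), len(encoded))

# 0x110000 fits in the old UCS-4 space but lies outside the UTF-32 range.
print(is_ucs4_value(0x110000), is_utf32_value(0x110000))
```

So yes, UTF-32 is fixed-width in the sense that every code point occupies one 32-bit code unit. What a user perceives as a single character can still span several code points, for example a base letter followed by a combining mark.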