How is Unicode (UTF-16) data that is out of collation stored in a varchar column?

Question

This is a purely theoretical question, to help me wrap my head around the topic.

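Let's say I have the Unicode cyclone symbol (U+1F300). If I try to store it in a varchar column that has the default Latin1_General_CI_AS collation, the cyclone symbol cannot fit into the one byte that varchar uses per symbol...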

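The ways I can see this being done:

Like JavaScript does for symbols outside the Basic Multilingual Plane (BMP), where it stores them as 2 symbols (surrogate pairs), and then additional processing is needed to ...
Just truncate the symbol: store the first byte and drop the second... (data is toast; you should have read the manual...)
The data is destroyed and nothing of use is saved... (data is toast; you should have read the manual...)
Some other option that is beyond my mental capacity...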
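
To make the first option concrete, here is a minimal Python sketch of how U+1F300 splits into a UTF-16 surrogate pair. Python is used purely for illustration and is not what SQL Server runs; the helper name utf16_code_units is made up for this example.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    # Each UTF-16 code unit is 2 bytes, little-endian here.
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


cyclone = "\U0001F300"  # one code point, outside the BMP
units = utf16_code_units(cyclone)
print([hex(u) for u in units])  # ['0xd83c', '0xdf00']
```

A single code point therefore takes two 16-bit units, and neither unit on its own maps to a character in a single-byte code page.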

I did some research after inserting a couple of different Unicode symbols:

 INSERT INTO [Table] (Field1)
 VALUES ('')

 INSERT INTO [Table] (Field1)
 VALUES ('')

and then read them back as bytes with SELECT cast (field1 as varbinary(10)). In both cases I got 0x3F3F.

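In ASCII, 3F is ? (question mark), so this is two question marks (??), which is also what I see when running a normal query. Does this mean the data is toast and not even the first byte is being stored?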

How is Unicode data that is out of collation stored in a varchar column?

Answer 1

Yes, the data is gone.

Varchar requires less space than NVarchar, but that reduction comes at a cost. Varchar has no room to store arbitrary Unicode characters (at 1 byte per character, the internal lookup is not big enough).
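
As a rough illustration of that trade-off, the Python sketch below compares the byte length of the same text in a single-byte code page and in UTF-16. It assumes code page 1252 (the code page behind Latin1_General collations) stands in for varchar and UTF-16 LE for nvarchar; it only counts bytes and does not model SQL Server's actual storage format.

```python
text = "cyclone"

single_byte = text.encode("cp1252")  # one byte per character, varchar-style
utf16 = text.encode("utf-16-le")     # two bytes per code unit, nvarchar-style

print(len(single_byte), len(utf16))  # 7 14
```

One byte can only index 256 distinct values, which is why anything outside the code page has nowhere to go.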

From Microsoft's Developer Network:

...consider using the Unicode nchar or nvarchar data types to minimize character conversion issues.

As you have seen, unsupported characters are replaced with question marks.

Answer 2

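The stored data is exactly what you see: 2 x 0x3F bytes. The substitution happens during type conversion prior to the insert, and is effectively the same as cast('' as varbinary(2)), which is also 0x3F3F (as opposed to casting N'').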

When Unicode data must be inserted into non-Unicode columns, the columns are internally converted from Unicode by using the WideCharToMultiByte API and the code page associated with the collation. If a character cannot be represented on the given code page, the character is replaced by a question mark (?).
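
The quoted behaviour can be approximated in a few lines of Python. This is a simulation under stated assumptions, not the WideCharToMultiByte implementation: it assumes code page 1252 for Latin1_General_CI_AS, and it assumes each UTF-16 code unit is converted separately, which is inferred from the two 0x3F bytes observed in the question. The function name to_code_page is hypothetical.

```python
def to_code_page(text: str, code_page: str = "cp1252") -> bytes:
    """Simulate a lossy Unicode-to-code-page conversion (hypothetical helper)."""
    raw = text.encode("utf-16-le")
    out = bytearray()
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "little")
        if 0xD800 <= unit <= 0xDFFF:
            # Assumption: each half of a surrogate pair becomes its own '?'.
            out += b"?"
        else:
            # Characters missing from the code page are replaced with '?'.
            out += chr(unit).encode(code_page, errors="replace")
    return bytes(out)


stored = to_code_page("\U0001F300")  # the cyclone symbol
print(stored.hex())  # 3f3f
```

Under these assumptions any character outside the BMP collapses to the same two bytes, so two different symbols both ending up as 0x3F3F is what one would expect, and the original cannot be recovered from what was stored.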

sql-server

unicode

encoding

varchar

collation