Choosing encoding for icu::UnicodeString

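I found myself in need of a way to change a string to lower case that is safe to use with both ASCII and UTF-16LE (as found in some Windows registry strings), and came across this question: How to convert std::string to lower case?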

The answer that seems "most correct" to me (I'm not using Boost) is the one demonstrated using the ICU library.

In that answer, he specified the encoding "ISO-8859-1" for the UnicodeString constructor. Why is this the correct value, and how do I know what to use?

ISO-8859-1 worked in the few unit tests I ran against ASCII-encoded strings containing only Latin characters, but I don't like using it without knowing why.

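If it matters, I'm mostly concerned with handling English data that is usually stored as ASCII, but the Windows registry can store content as UTF-16LE, and I don't want to prevent myself from supporting other languages by littering my code with things that aren't Unicode-safe.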

I found myself in need of a way to change a string to lower case for the purpose of case-insensitive string comparison

ICU's UnicodeString has several caseCompare() methods that compare strings "case-insensitively using full case folding". You don't need to convert the strings to lower case yourself.

In this answer, he specified the encoding "ISO-8859-1" for the UnicodeString constructor. Why is this the correct value and how do I know what to use?

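Because the author is passing an ISO-8859-1 encoded char* string literal to the constructor. UnicodeString represents a UTF-16 encoded string. If you construct it from a char* input, you must specify the charset the input data is actually encoded in, so that UnicodeString can decode it to Unicode and then re-encode it as UTF-16.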
