How to reverse strings that contain surrogate pairs in Dart?

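I was working through algorithms in Dart and, while following TDD, realized that my code has some limitations.

As part of an interview question I was trying to reverse strings, but I couldn't get surrogate pairs to reverse correctly.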

const simple = 'abc';
const emoji = '';
const surrogate = '‍♂️‍';

String rev(String s) {
    return String.fromCharCodes(s.runes.toList().reversed);
}

void main() {
    print(simple);
    print(rev(simple));
    print(emoji);
    print(rev(emoji));
    print(surrogate);
    print(rev(surrogate));
}

Output:

abc
cba


‍♂️‍
‍️♂‍

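You can see that the simple emoji is reversed correctly, because I use runes instead of simply doing s.split('').toList().reversed.join(''); but the surrogate pairs are reversed incorrectly.

How can I reverse a string that may contain surrogate pairs in Dart?

When reversing a string, you must operate on graphemes, not on characters or code units. Use grapheme_splitter.

Dart 2.7 introduced a new package that supports grapheme cluster-aware operations. The package is called characters. It treats a string as a sequence of characters in the sense of Unicode extended grapheme clusters.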

Dart’s standard String class uses the UTF-16 encoding. This is a common choice in programming languages, especially those that offer support for running both natively on devices, and on the web.

UTF-16 strings usually work well, and the encoding is transparent to the developer. However, when manipulating strings, and especially when manipulating strings entered by users, you may experience a difference between what the user perceives as a character, and what is encoded as a code unit in UTF-16.

Source: "Announcing Dart 2.7: A safer, more expressive Dart" by Michael Thomsen, section "Safe substring handling"

The package will also help you reverse strings containing emoji the way a native programmer would expect.

With plain Strings, you run into the problem:

String hi = 'Hi ';
print('String.length: ${hi.length}');
// Prints 7; would expect 4

With characters:

String hi = 'Hi ';
print(hi.characters.length);
// Prints 4
print(hi.characters.last);
// Prints 

The source code of the characters package is worth a look: it's far from simple, but it looks easier to digest and is better documented than grapheme_splitter. The characters package is also maintained by the Dart team.
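
Putting this together, a reversal that works on grapheme clusters can be sketched as follows. This is a minimal sketch, assuming the characters package is listed under dependencies in pubspec.yaml; the function name reverseGraphemes and the sample strings are illustrative only and not part of the package. The samples use Unicode escapes so that the code points involved are visible.

```dart
// Assumes `characters` is declared as a dependency in pubspec.yaml.
import 'package:characters/characters.dart';

/// Hypothetical helper: reverses [s] by extended grapheme cluster, so the
/// code points that make up one user-perceived character stay together.
String reverseGraphemes(String s) {
  // `s.characters` iterates over grapheme clusters, each as a String.
  // Reverse the order of the clusters, then concatenate them again.
  return s.characters.toList().reversed.join();
}

void main() {
  // 'e' followed by U+0301 (combining acute accent): two code points,
  // one grapheme cluster.
  const accented = 'ae\u0301b';

  // Police officer + zero-width joiner + male sign + variation selector:
  // four code points, one grapheme cluster.
  const officer = '\u{1F46E}\u{200D}\u{2642}\u{FE0F}';

  print(reverseGraphemes('abc'));

  // Rune-based reversal detaches the accent from the 'e'.
  final byRunes = String.fromCharCodes(accented.runes.toList().reversed);
  print(byRunes == 'b\u0301ea');

  // Cluster-based reversal keeps 'e' + accent and the emoji sequence intact.
  print(reverseGraphemes(accented) == 'be\u0301a');
  print(reverseGraphemes('a$officer') == '${officer}a');
}
```

Going through toList() keeps the sketch short at the cost of one intermediate list; for typical interview-sized inputs that trade-off seems acceptable.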