How to convert to "combining diacritical marks" on iOS

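In my app, I want to convert characters that are followed by "modifier diacritical marks" (e.g. "oˆ", where "ˆ" is Unicode U+02C6) into fully precomposed characters (e.g. "ô", Unicode U+00F4). I tried the NSString method precomposedStringWithCanonicalMapping, but after banging my head against the wall for a few hours trying to figure out why it didn't work, I found that it only converts "combining diacritical marks" (http://www.unicode.org/charts/PDF/U0300.pdf) into precomposed characters. OK, so all I need to do is convert every "modifier diacritical mark" into the corresponding "combining diacritical mark", run precomposedStringWithCanonicalMapping on the resulting string, and I'm done. That does work, but I'm wondering whether there is a less tedious and less error-prone way to do it. Here is my NSString category method, which seems to fix most of the characters: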

- (NSString *)combineDiacritics
{
    // Spacing diacritic (unichar) -> corresponding combining diacritic (unichar).
    // Cross-references from http://www.unicode.org/charts/PDF/U0300.pdf
    static NSDictionary<NSNumber *, NSNumber *> *sDiacriticalSubstDict;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        sDiacriticalSubstDict = @{ @(0x02cb) : @(0x0300), @(0x00b4) : @(0x0301), @(0x02c6) : @(0x0302), @(0x02dc) : @(0x0303), @(0x02c9) : @(0x0304),   //Grave, Acute, Circumflex, Tilde, Macron
                                   @(0x00af) : @(0x0305), @(0x02d8) : @(0x0306), @(0x02d9) : @(0x0307), @(0x00a8) : @(0x0308), @(0x02c0) : @(0x0309),   //Overline, Breve, Dot above, Diaeresis, Hook above
                                   @(0x00b0) : @(0x030a), @(0x02da) : @(0x030a), @(0x02dd) : @(0x030b), @(0x02c7) : @(0x030c), @(0x02c8) : @(0x030d),   //Ring above (degree sign and U+02DA RING ABOVE), Double acute, Caron, Vertical line above
                                   @(0x02bb) : @(0x0312), @(0x02bc) : @(0x0313), @(0x02bd) : @(0x0314), @(0x02b2) : @(0x0321), @(0x02d4) : @(0x0323), @(0x02b1) : @(0x0324),   //Cedilla above, Comma above, Reversed comma above, Palatalized hook below, Dot below, Diaeresis below
                                   @(0x00b8) : @(0x0327), @(0x02db) : @(0x0328), @(0x02cc) : @(0x0329), @(0x02b7) : @(0x032b), @(0x02cd) : @(0x0331),   //Cedilla, Ogonek, Vert line below, Inverted double arch below, Macron below
                                   };
    });
    NSMutableString *buffer = [NSMutableString stringWithCapacity:self.length];
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        // The spacing diacritics are all in the BMP, so each one is a single UTF-16 unit.
        if (substring.length == 1)
        {
            unichar uniChar = [substring characterAtIndex:0];
            NSNumber *combining = sDiacriticalSubstDict[@(uniChar)];
            if (combining != nil)
            {
                unichar newUniChar = combining.unsignedShortValue;
                [buffer appendString:[NSString stringWithCharacters:&newUniChar length:1]];
                return;    // replaced: skip appending the original spacing character
            }
        }
        [buffer appendString:substring];
    }];

    // NFC composes "base letter + combining mark" into a precomposed character where one exists.
    return [buffer precomposedStringWithCanonicalMapping];
}
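To reproduce the behaviour described in the question in isolation, here is a minimal standalone sketch. The helper `ComposeNFC` is hypothetical (not part of Foundation): it builds a two-character string from a base letter and a mark, then applies `precomposedStringWithCanonicalMapping`. With "o" + U+0302 COMBINING CIRCUMFLEX ACCENT the result should be the single character U+00F4 "ô"; with "o" + U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT nothing composes, because U+02C6 is not a combining mark and has no canonical relationship to "ô".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds "base + mark" as two UTF-16 units and applies
// canonical composition (NFC) via -precomposedStringWithCanonicalMapping.
static NSString *ComposeNFC(unichar base, unichar mark)
{
    unichar chars[2] = { base, mark };
    NSString *decomposed = [NSString stringWithCharacters:chars length:2];
    return [decomposed precomposedStringWithCanonicalMapping];
}
```

Calling `ComposeNFC('o', 0x0302)` should yield a one-character string, while `ComposeNFC('o', 0x02C6)` should come back unchanged with length 2, which is why the category above substitutes the combining mark before normalizing.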

Does anyone know of a more built-in way to do this conversion?

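There is no built-in method for this conversion, because the characters in the Spacing Modifier Letters block (U+02B0..U+02FF) are not intended to be used as diacritical marks. From Section 7.8 of the Unicode Standard: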

They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right.

Spacing Clones of Diacritics. Some corporate standards explicitly specify spacing and nonspacing forms of combining diacritical marks, and the Unicode Standard provides matching codes for these interpretations when practical.

If you want to convert them to their combining forms, you will need to build a table from the cross-references in the Spacing Modifier Letters code chart (as you are already doing).
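The point that these characters are "not formally combining marks" can be checked from code with Foundation's `+[NSCharacterSet nonBaseCharacterSet]`, which, as far as I know, corresponds to the Unicode general category Mark (M*). A minimal sketch, assuming that correspondence; `IsCombiningMark` and `SuspiciousEntries` are hypothetical helpers, not existing APIs. The second one sanity-checks a hand-built spacing-to-combining table like the one in the question by flagging any key that is already a combining mark or any value that is not one. It cannot detect a value that is a combining mark but the wrong one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `c` is in Foundation's non-base character set,
// i.e. a combining mark that NFC can attach to a preceding base letter.
static BOOL IsCombiningMark(unichar c)
{
    return [[NSCharacterSet nonBaseCharacterSet] characterIsMember:c];
}

// Hypothetical sanity check for a spacing -> combining table: returns the keys
// whose entry looks wrong (key is already combining, or value is not combining).
static NSArray<NSNumber *> *SuspiciousEntries(NSDictionary<NSNumber *, NSNumber *> *table)
{
    NSMutableArray<NSNumber *> *bad = [NSMutableArray array];
    [table enumerateKeysAndObjectsUsingBlock:^(NSNumber *key, NSNumber *value, BOOL *stop) {
        if (IsCombiningMark(key.unsignedShortValue) || !IsCombiningMark(value.unsignedShortValue)) {
            [bad addObject:key];
        }
    }];
    return bad;
}
```

For example, U+0302 should report as a combining mark while U+02C6 (general category Lm) and U+00B4 (Sk) should not, so passing a table such as `@{ @(0x02c6) : @(0x0302) }` to `SuspiciousEntries` should return an empty array.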