How do I compare strings that contain Unicode characters for equality in Swift?

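In my app, I am trying to compare values from a repo that holds a number of JSON files. Each JSON file contains values for certain countries as a dictionary, for example: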

{cz: "Doplňky k Apple TV"
 dk: "Apple TVtilbehør" }  //string1 == "Doplňky k Apple TV"

Similarly, I have a local plist that contains a dictionary for the same countries, for example:

{cz: "Doplňky*k*Apple*TV"
 dk: "*Apple*TV*Tilbehør*" } //string2 == "Doplňky*k*Apple*TV"

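So, basically, I need to compare the values for each country and show the user only the differences.

In this case, the cz values in the JSON file (string1) and in the local plist (string2) are the same except that string2 contains asterisks. When I simply remove the asterisks and compare the strings, they still don't match, because in string1 ("Doplňky k Apple TV") there is an invisible Unicode space after Apple that looks like an ordinary space.

Below is the code where I implemented the logic: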

if string2.replaceString(["*", "\u{00a0}"], " ").trimmingCharacters(in: .whitespaces) == string1.replacingOccurrences(of: "\u{00a0}", with: " "){
  //Do something
}

The Doplňky k Apple TV string looks like it comes from Apple's website. When I checked their site, this string contained a NO-BREAK SPACE (U+00A0) between Apple and TV. It is a whitespace character, but it is not equal to a normal SPACE (U+0020).

"Doplňky k Apple\u{00a0}TV" == "Doplňky k Apple TV" // false
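One way to make such an invisible difference visible is to list the Unicode scalars of each string. The following is only a minimal sketch; the helper name `scalarDump` is hypothetical and not part of the original question or answer.

```swift
// Hypothetical helper: renders every Unicode scalar as "U+XXXX" so that
// look-alike characters such as NO-BREAK SPACE can be told apart.
func scalarDump(_ text: String) -> [String] {
    return text.unicodeScalars.map { scalar -> String in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

let fromWebsite = "Apple\u{00a0}TV" // contains NO-BREAK SPACE
let typed = "Apple TV"              // contains a normal SPACE

print(scalarDump(fromWebsite)) // the sixth entry is U+00A0
print(scalarDump(typed))       // the sixth entry is U+0020
print(fromWebsite == typed)    // false
```

Dumping both strings side by side shows which code point differs, which is usually faster than guessing at the cause of a failed comparison.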

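First thing to specify: does this matter? Should we treat them as equal or not?

Then you have the strings Apple TVtilbehør and *Apple*TV*Tilbehør*. Is that an intentional typo? Or should Apple TVtilbehør be Apple TV Tilbehør? Let's assume it is an intentional typo meant to test your comparison.

Next, what are the * characters at the beginning and end of the *Apple*TV*Tilbehør* string? Second thing to specify: should we ignore them? Do they represent spaces?

Next is Unicode equivalence. How would you like to compare these two strings? Swift helps you here (source):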

Comparing strings for equality using the equal-to operator (==) or a relational operator (like < or >=) is always performed using Unicode canonical representation. As a result, different representations of a string compare as being equal.

"Cafe\u{301}" == "Café" // true
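To illustrate what canonical equivalence means here, the sketch below compares the same two spellings at the `String` level and at the Unicode scalar level. The variable names are illustrative only.

```swift
let decomposed = "Cafe\u{301}" // "e" followed by COMBINING ACUTE ACCENT
let precomposed = "Caf\u{E9}"  // single scalar LATIN SMALL LETTER E WITH ACUTE

// String comparison follows canonical equivalence.
print(decomposed == precomposed) // true

// Both strings have four user-perceived characters ...
print(decomposed.count, precomposed.count) // 4 4

// ... but they are stored as different scalar sequences.
print(decomposed.unicodeScalars.count)  // 5
print(precomposed.unicodeScalars.count) // 4
print(decomposed.unicodeScalars.elementsEqual(precomposed.unicodeScalars)) // false
```

Canonical equivalence only covers different encodings of the same character. It does not make U+00A0 equal to U+0020, because those are distinct characters.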

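What about other languages? German, for example, where Straße is commonly treated as equivalent to Strasse? Third thing to specify: how should we handle such strings?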
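Swift's `==` does not fold ß into ss, so the two spellings compare as different. If a specification did require them to match, one possible rule (an assumption for illustration, not something the original post recommends) is to compare uppercased forms, since Unicode case mapping uppercases ß to SS. Note that this also makes the comparison case-insensitive.

```swift
let withEszett = "Straße"
let withDoubleS = "Strasse"

// Plain equality keeps the two spellings distinct.
print(withEszett == withDoubleS) // false

// Assumed rule for illustration: compare uppercased forms.
// Side effect: "strasse" and "STRASSE" would also match.
print(withEszett.uppercased() == withDoubleS.uppercased()) // true
```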

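As you can see, there is a lot to consider. Do you have a specification? Follow it. No specification? Sooner or later your algorithm will stop working.

Playground

I took the liberty of specifying all of these things myself:

  • All whitespace characters are equal
  • * at the beginning/end does not matter (ignored)
  • Straße does not equal Strasse

Sample code: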

import Foundation

let json = [
    // U+00A0 is NO-BREAK SPACE which looks like a normal space (U+0020)
    "cz": "Doplňky k Apple\u{00a0}TV",
    "dk": "Apple TV Tilbehør",
    "en": "Hello",
    "de": "Straße",
    "fr": "Expos\u{00E9}" // Exposé
]

let plist = [
    "cz": "Doplňky*k*Apple*TV",
    "dk": "*Apple*TV*Tilbehør*",
    "es": "Hola",
    "de": "Strasse",
    "fr": "Expose\u{0301}" // Exposé
]

let jsonKeys = Set(json.keys)
let plistKeys = Set(plist.keys)
let commonKeys = jsonKeys.intersection(plistKeys)
let keysMissingInJson = plistKeys.subtracting(jsonKeys)
let keysMissingInPlist = jsonKeys.subtracting(plistKeys)

print("Languages missing in JSON: \(keysMissingInJson.count)")
keysMissingInJson.forEach { key in
    print(" - \(key)")
}

print("Languages missing in PLIST: \(keysMissingInPlist.count)")
keysMissingInPlist.forEach { key in
    print(" - \(key)")
}

let differentValueKeys: [String] = commonKeys.compactMap { key in
    guard let initialJsonValue = json[key], let initialPlistValue = plist[key] else {
        fatalError("Fix commonKeys")
    }
    
    // Replace all whitespace characters with a normal space
    let jsonValue = String(
        initialJsonValue.map { $0.isWhitespace ? " " : $0 }
    )
    
    let plistValue = initialPlistValue
        // Replace all * with a normal whitespace
        .replacingOccurrences(of: "*", with: " ")
        // Trim all whitespace characters from the beginning/end
        .trimmingCharacters(in: .whitespaces)
    
    return jsonValue == plistValue ? nil : key
}

print("Different values: \(differentValueKeys.count)")
differentValueKeys.forEach { key in
    print(" - \(key): JSON(\(json[key]!)) PLIST(\(plist[key]!))")
}

Output:

Languages missing in JSON: 1
 - es
Languages missing in PLIST: 1
 - en
Different values: 1
 - de: JSON(Straße) PLIST(Strasse)
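The same three rules could also be collected into a single function and applied to both sides, so that the JSON and plist values go through identical normalization. This is a sketch under that assumption; the name `normalized` is hypothetical and does not appear in the code above.

```swift
import Foundation

// Hypothetical helper applying the rules stated above to any input string.
func normalized(_ text: String) -> String {
    // "*" stands for a space, and every whitespace character
    // (including NO-BREAK SPACE) becomes a normal space.
    let spaced = String(text.map { ch -> Character in
        (ch == "*" || ch.isWhitespace) ? " " : ch
    })
    // Leading and trailing spaces (and therefore leading/trailing "*") are ignored.
    return spaced.trimmingCharacters(in: .whitespaces)
}

print(normalized("*Apple*TV*Tilbehør*") == normalized("Apple TV Tilbehør"))              // true
print(normalized("Doplňky*k*Apple*TV") == normalized("Doplňky k Apple\u{00a0}TV"))       // true
print(normalized("Straße") == normalized("Strasse"))                                     // false
```

Normalizing both inputs the same way avoids surprises when, for example, a plist value happens to contain a NO-BREAK SPACE or a JSON value has trailing whitespace.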