How to compare strings that contain Unicode characters for equality in Swift?
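In my application, I am trying to compare values from a repo that holds a number of JSON files. Each JSON file stores the values for certain countries as a dictionary, for example: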
{cz: "Doplňky k Apple TV"
dk: "Apple TVtilbehør" } //string1 == "Doplňky k Apple TV"
Similarly, I have a local plist that also holds a dictionary for the same countries, for example:
{cz: "Doplňky*k*Apple*TV"
dk: "*Apple*TV*Tilbehør*" } //string2 == "Doplňky*k*Apple*TV"
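So, basically, I need to compare the values for each country and show only the differences to the user.
In this case, the cz values in the JSON file (string1) and the local plist (string2) are the same except that string2 contains asterisks. When I simply remove the asterisks and compare the strings, they still do not match, because in string1 (Doplňky k Apple TV) there is an invisible Unicode space after Apple that looks like an ordinary space.
Below is the code where I implemented the logic: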
if string2.replaceString(["*", "\u{00a0}"], " ").trimmingCharacters(in: .whitespaces) == string1.replacingOccurrences(of: "\u{00a0}", with: " "){
//Do something
}
The Doplňky k Apple TV string looks like it comes from the Apple website. When I checked it on their site, this string contains a NO-BREAK SPACE (U+00A0) between Apple & TV. It's a white space character, but it isn't equal to a normal SPACE (U+0020).
"Doplňky k Apple\u{00a0}TV" == "Doplňky k Apple TV" // false
First thing to specify: does this matter? Should the two be treated as equal or not?
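Before deciding, it can help to see exactly which whitespace code points a string contains. The following is a minimal sketch; `unusualWhitespace(in:)` is a hypothetical helper name introduced here for illustration, not something from the question or the standard library.

```swift
// Hypothetical helper: returns the "U+XXXX" form of every whitespace scalar
// in the string that is not a plain SPACE (U+0020).
func unusualWhitespace(in text: String) -> [String] {
    text.unicodeScalars
        .filter { $0.properties.isWhitespace && $0.value != 0x20 }
        .map { scalar -> String in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            // Left-pad to at least four hex digits, as in the usual U+ notation.
            return "U+" + String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        }
}

let fromJson = "Doplňky k Apple\u{00A0}TV"  // NO-BREAK SPACE before "TV"
let typed = "Doplňky k Apple TV"            // plain spaces only

print(fromJson == typed)                // false
print(unusualWhitespace(in: fromJson))  // ["U+00A0"]
print(unusualWhitespace(in: typed))     // []
```

Running something like this over the JSON and plist values shows which invisible characters the specification has to cover.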
Then you have the Apple TVtilbehør & *Apple*TV*Tilbehør* strings. Is that an intentional typo? Or should Apple TVtilbehør be Apple TV Tilbehør? Let's assume it's an intentional typo meant to test your comparison.
Next, what are these * in the *Apple*TV*Tilbehør* string (at the beginning/end)? Second thing to specify: should we ignore them? Do they represent spaces?
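One possible reading, shown here only as a sketch, is that asterisks at the ends are noise while asterisks inside the string stand for spaces. `resolveAsterisks(_:)` is a hypothetical helper name, and the interpretation itself is an assumption that the real specification may contradict.

```swift
import Foundation

// Hypothetical helper: drops asterisks from both ends, then treats every
// remaining asterisk as a word separator (a plain space).
func resolveAsterisks(_ text: String) -> String {
    text
        .trimmingCharacters(in: CharacterSet(charactersIn: "*"))
        .replacingOccurrences(of: "*", with: " ")
}

print(resolveAsterisks("*Apple*TV*Tilbehør*"))  // Apple TV Tilbehør
print(resolveAsterisks("Doplňky*k*Apple*TV"))   // Doplňky k Apple TV
```

Trimming the asterisks directly, rather than replacing them with spaces and trimming whitespace afterwards, leaves any real leading or trailing whitespace in the value untouched.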
Next comes Unicode equivalence. How would you like to compare these two strings? Swift helps you here (source):
Comparing strings for equality using the equal-to operator (==) or a relational operator (like < or >=) is always performed using Unicode canonical representation. As a result, different representations of a string compare as being equal.
"Cafe\u{301}" == "Café" // true
What about other countries? Take Germany, where Straße can also be written as Strasse. Third thing to specify: how should we handle these strings?
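The difference between the two cases can be sketched as follows. Canonically equivalent representations already compare equal with ==, whereas ß and ss are not canonically equivalent, so treating them as equal needs an explicit rule. `foldingEszett(_:)` is a hypothetical helper that only makes sense if your specification asks for that rule.

```swift
import Foundation

let decomposed = "Cafe\u{301}"  // "e" followed by COMBINING ACUTE ACCENT
let precomposed = "Caf\u{E9}"   // single LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)  // true
// The underlying scalars still differ:
print(decomposed.unicodeScalars.count, precomposed.unicodeScalars.count)  // 5 4

// "ß" and "ss" are different strings as far as == is concerned.
print("Straße" == "Strasse")  // false

// Hypothetical opt-in rule: rewrite "ß" as "ss" before comparing.
func foldingEszett(_ text: String) -> String {
    text.replacingOccurrences(of: "ß", with: "ss")
}

print(foldingEszett("Straße") == foldingEszett("Strasse"))  // true
```

The playground below deliberately does not apply such a rule, so Straße and Strasse are reported as different.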
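As you can see, there are a lot of things to consider. Do you have a specification? Follow it. No specification? Sooner or later your algorithm will stop working.
Playground
I took the liberty of specifying all of these things myself:
- All whitespace characters are equal
- * at the beginning/end doesn't matter (it is ignored)
- Straße does not equal Strasse
Sample code: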
import Foundation

let json = [
    // U+00A0 is NO-BREAK SPACE which looks like a normal space (U+0020)
    "cz": "Doplňky k Apple\u{00a0}TV",
    "dk": "Apple TV Tilbehør",
    "en": "Hello",
    "de": "Straße",
    "fr": "Expos\u{00E9}" // Exposé
]

let plist = [
    "cz": "Doplňky*k*Apple*TV",
    "dk": "*Apple*TV*Tilbehør*",
    "es": "Hola",
    "de": "Strasse",
    "fr": "Expose\u{0301}" // Exposé
]

let jsonKeys = Set(json.keys)
let plistKeys = Set(plist.keys)
let commonKeys = jsonKeys.intersection(plistKeys)
let keysMissingInJson = plistKeys.subtracting(jsonKeys)
let keysMissingInPlist = jsonKeys.subtracting(plistKeys)

print("Languages missing in JSON: \(keysMissingInJson.count)")
keysMissingInJson.forEach { key in
    print(" - \(key)")
}

print("Languages missing in PLIST: \(keysMissingInPlist.count)")
keysMissingInPlist.forEach { key in
    print(" - \(key)")
}

let differentValueKeys: [String] = commonKeys.compactMap { key in
    guard let initialJsonValue = json[key], let initialPlistValue = plist[key] else {
        fatalError("Fix commonKeys")
    }

    // Replace all whitespace characters with a normal space
    let jsonValue = String(
        initialJsonValue.map { $0.isWhitespace ? " " : $0 }
    )

    let plistValue = initialPlistValue
        // Replace all * with a normal whitespace
        .replacingOccurrences(of: "*", with: " ")
        // Trim all whitespace characters from the beginning/end
        .trimmingCharacters(in: .whitespaces)

    return jsonValue == plistValue ? nil : key
}

print("Different values: \(differentValueKeys.count)")
differentValueKeys.forEach { key in
    print(" - \(key): JSON(\(json[key]!)) PLIST(\(plist[key]!))")
}
Output:
Languages missing in JSON: 1
- es
Languages missing in PLIST: 1
- en
Different values: 1
- de: JSON(Straße) PLIST(Strasse)
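The playground normalizes each side differently: whitespace is unified only in the JSON value, and asterisks are replaced only in the plist value. If either source may contain either quirk, a symmetric variant could look like the sketch below. `normalized(_:)` and `differingKeys(_:_:)` are hypothetical names, and applying the same rules to both sides is an assumption rather than something stated in the question.

```swift
import Foundation

// Hypothetical helper: applies the three rules from the list above to any value.
func normalized(_ text: String) -> String {
    // "*" stands for a space (this also covers leading/trailing asterisks).
    let withoutAsterisks = text.replacingOccurrences(of: "*", with: " ")
    // Every whitespace character counts as a plain space.
    let plainSpaces = String(withoutAsterisks.map { (character: Character) -> Character in
        character.isWhitespace ? " " : character
    })
    // Whitespace at the beginning/end is ignored.
    return plainSpaces.trimmingCharacters(in: .whitespaces)
}

// Hypothetical helper: keys present in both dictionaries whose normalized
// values differ, sorted for stable output.
func differingKeys(_ lhs: [String: String], _ rhs: [String: String]) -> [String] {
    Set(lhs.keys).intersection(rhs.keys)
        .filter { key in
            guard let left = lhs[key], let right = rhs[key] else { return false }
            return normalized(left) != normalized(right)
        }
        .sorted()
}

let remote = ["cz": "Doplňky k Apple\u{00a0}TV", "de": "Straße", "fr": "Expos\u{00E9}"]
let local = ["cz": "Doplňky*k*Apple*TV", "de": "Strasse", "fr": "Expose\u{0301}"]

print(differingKeys(remote, local))  // ["de"]
```

Keeping the rules in one function means a change in the specification only has to be made in one place.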