Strange Behavior in CharacterSet.contains() With High UTF-8 Characters Mixed With ASCII
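Here's the deal: I'm creating a StringProtocol extension that adds the ability to split on a character set (any character in the set acts as a separator, with greedy comparison).
The problem is that comparisons fail against a CharacterSet that holds both a few ASCII characters and a number of high UTF-8 characters.
If I supply only high UTF-8 characters, or only ASCII, the match works fine.
I've created a playground to illustrate this.
The weird result is the second-to-last printout ("Test String 2 does not have a space or a joker."). It should say that the string does have one.
The space in the CharacterSet matches, but the joker card does not.
Any ideas? Here's the playground: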
import Foundation

public extension StringProtocol {
    /// Returns true if any Unicode scalar of any character in the receiver is in the given set.
    func containsOneOfThese(_ inCharacterset: CharacterSet) -> Bool {
        self.contains { (char) in
            char.unicodeScalars.contains { (scalar) in inCharacterset.contains(scalar) }
        }
    }
}

// The playing-card literals were lost in extraction; the escapes below are stand-ins (an assumption).
let space = " "
let joker = "\u{1F0CF}" // playing card black joker, outside the BMP
let both = space + joker
let spadesNumberCards = "\u{1F0A2}\u{1F0A3}\u{1F0A4}\u{1F0A5}\u{1F0A6}\u{1F0A7}\u{1F0A8}\u{1F0A9}\u{1F0AA}" // two through ten of spades
let spadesFaceCards = "\u{1F0AB}\u{1F0AD}\u{1F0AE}" // jack, queen, king of spades

let testString1 = spadesNumberCards + space + spadesFaceCards
let testString2 = spadesNumberCards + joker + spadesFaceCards
let testString3 = spadesNumberCards + both + spadesFaceCards

print("These Are The Strings We Are Testing:\n")
print("Test String 1: \"\(testString1)\"")
print("Test String 2: \"\(testString2)\"")
print("Test String 3: \"\(testString3)\"")

print("\nFirst, See If Any Of the Strings Contain Spaces:\n")
print("Test String 1 does \(testString1.containsOneOfThese(CharacterSet(charactersIn: space)) ? "" : "not ")have a space.")
print("Test String 2 does \(testString2.containsOneOfThese(CharacterSet(charactersIn: space)) ? "" : "not ")have a space.")
print("Test String 3 does \(testString3.containsOneOfThese(CharacterSet(charactersIn: space)) ? "" : "not ")have a space.")

print("\nNext, See If Any Of the Strings Contain Jokers:\n")
print("Test String 1 does \(testString1.containsOneOfThese(CharacterSet(charactersIn: joker)) ? "" : "not ")have a joker.")
print("Test String 2 does \(testString2.containsOneOfThese(CharacterSet(charactersIn: joker)) ? "" : "not ")have a joker.")
print("Test String 3 does \(testString3.containsOneOfThese(CharacterSet(charactersIn: joker)) ? "" : "not ")have a joker.")

print("\nOK, Now it gets weird:\n")
print("Test String 1 does \(testString1.containsOneOfThese(CharacterSet(charactersIn: both)) ? "" : "not ")have a space or a joker.")
print("Test String 2 does \(testString2.containsOneOfThese(CharacterSet(charactersIn: both)) ? "" : "not ")have a space or a joker.")
print("Test String 3 does \(testString3.containsOneOfThese(CharacterSet(charactersIn: both)) ? "" : "not ")have a space or a joker.")
This prints:
These Are The Strings We Are Testing:
Test String 1: "🂢🂣🂤🂥🂦🂧🂨🂩🂪 🂫🂭🂮"
Test String 2: "🂢🂣🂤🂥🂦🂧🂨🂩🂪🃏🂫🂭🂮"
Test String 3: "🂢🂣🂤🂥🂦🂧🂨🂩🂪 🃏🂫🂭🂮"
First, See If Any Of the Strings Contain Spaces:
Test String 1 does have a space.
Test String 2 does not have a space.
Test String 3 does have a space.
Next, See If Any Of the Strings Contain Jokers:
Test String 1 does not have a joker.
Test String 2 does have a joker.
Test String 3 does have a joker.
OK, Now it gets weird:
Test String 1 does have a space or a joker.
Test String 2 does not have a space or a joker.
Test String 3 does have a space or a joker.
CharacterSet.init(charactersIn string: String) does not seem to work correctly if the string contains characters both inside and outside the BMP (Basic Multilingual Plane):
let s = " \u{1F0CF}" // a space plus a non-BMP character (stand-in; the original literal was lost in extraction)
let cs = CharacterSet(charactersIn: s)
s.unicodeScalars.forEach {
    print(cs.contains($0))
}
// Expected output: true, true
// Actual output: true, false
A workaround is to create the character set from the sequence of Unicode scalars instead:
let cs = CharacterSet(s.unicodeScalars)
This produces the expected output.
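As a minimal sketch, here is the workaround applied to the failing case from the question (a set holding both a space and a joker, checked against a string that contains only the joker). The names `fixedSet` and `hit` are illustrative, and U+1F0CF stands in for the joker character, which is an assumption.

```swift
import Foundation

// A space (inside the BMP) plus a playing-card joker (outside the BMP).
let both = " \u{1F0CF}"

// Build the set from the Unicode scalars instead of CharacterSet(charactersIn:).
let fixedSet = CharacterSet(both.unicodeScalars)

// Two of spades, joker, jack of spades: no space, but one joker.
let testString2 = "\u{1F0A2}\u{1F0CF}\u{1F0AB}"

// Check every scalar of the string against the set.
let hit = testString2.unicodeScalars.contains { fixedSet.contains($0) }
print("Test String 2 does \(hit ? "" : "not ")have a space or a joker.")
```

With the set built this way, the joker should match just as the space does.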
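Note, however, that this cannot handle the full range of Swift Characters, which includes grapheme clusters made up of multiple Unicode scalars. You may therefore want to use a Set<Character> instead.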
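The Set<Character> approach could look roughly like the sketch below. The method names `containsOneOf(_:)` and `split(onAnyOf:)` are hypothetical and not part of the original playground, and U+1F0CF is again a stand-in for the joker.

```swift
import Foundation

extension StringProtocol {
    /// Returns true if any Character of the receiver is in `characters`.
    /// (Hypothetical name, for illustration only.)
    func containsOneOf(_ characters: Set<Character>) -> Bool {
        return self.contains(where: { characters.contains($0) })
    }

    /// Splits the receiver on any Character in `characters`.
    /// Empty pieces are dropped, so runs of separators act as one.
    func split(onAnyOf characters: Set<Character>) -> [SubSequence] {
        return self.split(whereSeparator: { characters.contains($0) })
    }
}

let space: Character = " "
let joker: Character = "\u{1F0CF}"
let separators: Set<Character> = [space, joker]

// Two and three of spades, joker, jack and queen of spades.
let cards = "\u{1F0A2}\u{1F0A3}\u{1F0CF}\u{1F0AB}\u{1F0AD}"

print(cards.containsOneOf(separators))         // true
print(cards.split(onAnyOf: separators).count)  // 2
```

Because the comparison works on whole Characters, a multi-scalar grapheme cluster is matched as one unit rather than scalar by scalar.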