在 Swift 中使用 Unicode 代码点

Working with Unicode code points in Swift

如果您对蒙古语的细节不感兴趣,只是想快速了解一下 Swift 中 Unicode 值的使用和转换,请跳至 的第一部分。


背景

我想为 traditional Mongolian to be used in iOS apps. The better and long term solution is to use an AAT smart font that would render this complex script. (Such fonts do exist 渲染 Unicode 文本,但他们的许可证不允许修改和非个人使用。)但是,由于我从未制作过字体,更不用说所有渲染逻辑了AAT 字体,我现在只是打算自己在 Swift 中进行渲染。也许以后我可以学习制作智能字体。

在外部我将使用 Unicode 文本,但在内部(为了在 UITextView 中显示)我会将 Unicode 转换为以哑字体存储的单个字形(使用 Unicode PUA values). So my rendering engine needs to convert Mongolian Unicode values (range: U+1820 to U+1842) to glyph values stored in the PUA (range: U+E360 to U+E5CF). Anyway, this is my plan since it is what I did in Java in the past 编码,但也许我需要改变我的整个思维方式。

例子

下图显示了 su 在蒙古语中用两种不同的形式写了两次 u (红色)。 (蒙古文是竖写的,字母是连在一起的,就像英文的草书一样。)

在 Unicode 中,这两个字符串将表示为

var suForm1: String = "\u{1830}\u{1826}"
var suForm2: String = "\u{1830}\u{1826}\u{180B}"

suForm2 中的自由变化选择器 (U+180B) 被 Swift String 识别(正确地)为 u[= 的一个单位75=] (U+1826) 在它之前。它被 Swift 认为是一个单独的字符,一个扩展的字形簇。但是,为了自己进行渲染,我需要将 u (U+1826) 和 FVS1 (U+180B) 区分为两个不同的 UTF-16 代码点。

出于内部显示目的,我会将上述 Unicode 字符串转换为以下呈现的字形字符串:

suForm1 = "\u{E46F}\u{E3BA}" 
suForm2 = "\u{E46F}\u{E3BB}"

问题

我一直在玩 Swift StringCharacter。它们有很多方便之处,但由于在我的特定情况下我专门处理 UTF-16 代码单元,我想知道我是否应该使用旧的 NSString 而不是 Swift 的 String。我意识到我可以使用 String.utf16 来获取 UTF-16 代码点,但是 the conversion back to String isn't very nice.

坚持使用 String and Character or should I use NSString and unichar 会更好吗?

我读了什么

为了清理页面,已隐藏对此问题的更新。查看编辑历史。

更新 Swift 3

字符串和字符

对于以后访问此问题的几乎每个人来说,String and Character 将是您的答案。

直接在代码中设置 Unicode 值:

var str: String = "I want to visit 北京, Москва, मुंबई, القاهرة, and 서울시. "
var character: Character = ""

使用十六进制设置值

var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark

请注意,Swift 字符可以由多个 Unicode 代码点组成,但看起来是一个字符。这称为扩展字素簇。

另见

转换为 Unicode 值:

str.utf8
str.utf16
str.unicodeScalars // UTF-32

String(character).utf8
String(character).utf16
String(character).unicodeScalars

从 Unicode 十六进制值转换:

let hexValue: UInt32 = 0x1F34E

// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
    // early exit if hex does not form a valid unicode value
    return
}

// convert UnicodeScalar to String
let myString = String(scalarValue) // 

或者:

let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
    let myString = String(scalarValue)
}

再举几个例子

let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E

let string0 = String(UnicodeScalar(value0)) // a
let string1 = String(UnicodeScalar(value1)) // 大
let string2 = String(UnicodeScalar(value2)) // 

// convert hex array to String
let myHexArray = [0x43, 0x61, 0x74, 0x203C, 0x1F431] // an Int array
var myString = ""
for hexValue in myHexArray {
    myString.append(UnicodeScalar(hexValue))
}
print(myString) // Cat‼

请注意,对于 UTF-8 和 UTF-16,转换并不总是那么容易。 (参见 UTF-8, UTF-16, and 个问题。)

NSString 和 unichar

也可以在Swift中使用NSStringunichar,但是你应该意识到,除非你熟悉Objective C并且善于转换Swift 的语法,很难找到好的文档。

此外,unichar 是一个 UInt16 数组,如上所述,从 UInt16 到 Unicode 标量值的转换并不总是那么容易(即,将代理对转换为表情符号之类的东西和上层代码平面中的其他字符)。

自定义字符串结构

由于问题中提到的原因,我最终没有使用上述任何一种方法。相反,我编写了自己的字符串结构,它基本上是一个 UInt32 数组来保存 Unicode 标量值。

同样,这不是大多数人的解决方案。如果您只需要稍微扩展 StringCharacter 的功能,请首先考虑使用 extensions

但如果您确实需要专门使用 Unicode 标量值,则可以编写自定义结构。

优点是:

  • 在进行字符串操作时不需要不断地在类型(StringCharacterUnicodeScalarUInt32 等之间切换。
  • Unicode 操作完成后,最终转换为 String 很容易。
  • 在需要时轻松添加更多方法
  • 简化从 Java 或其他语言
  • 转换代码

缺点是:

  • 使其他 Swift 开发人员的代码可移植性和可读性降低
  • 测试和优化不如原生 Swift 类型
  • 这是每次需要时都必须包含在项目中的另一个文件

你可以自己制作,但这里是我的供参考。最难的部分是 .

// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)


struct ScalarString: Sequence, Hashable, CustomStringConvertible {
    
    fileprivate var scalarArray: [UInt32] = []
    
    
    init() {
        // does anything need to go here?
    }
    
    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }
    
    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }
    
    init(_ string: String) {
        
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }
    
    // Generator in order to conform to SequenceType protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })
    func makeIterator() -> AnyIterator<UInt32> {
        return AnyIterator(scalarArray.makeIterator())
    }
    
    // append
    mutating func append(_ scalar: UInt32) {
        self.scalarArray.append(scalar)
    }
    
    mutating func append(_ scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }
    
    mutating func append(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }
    
    // charAt
    func charAt(_ index: Int) -> UInt32 {
        return self.scalarArray[index]
    }
    
    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }
    
    // contains
    func contains(_ character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }
    
    // description (to implement Printable protocol)
    var description: String {
        return self.toString()
    }
    
    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }
    
    // indexOf
    // returns first index of scalar string match
    func indexOf(_ string: ScalarString) -> Int? {
        
        if scalarArray.count < string.length {
            return nil
        }
        
        for i in 0...(scalarArray.count - string.length) {
            
            for j in 0..<string.length {
                
                if string.charAt(j) != scalarArray[i + j] {
                    break // substring mismatch
                }
                if j == string.length - 1 {
                    return i
                }
            }
        }
        
        return nil
    }
    
    // insert
    mutating func insert(_ scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }
    mutating func insert(_ string: ScalarString, atIndex index: Int) {
        var newIndex = index
        for scalar in string {
            self.scalarArray.insert(scalar, at: newIndex)
            newIndex += 1
        }
    }
    mutating func insert(_ string: String, atIndex index: Int) {
        var newIndex = index
        for scalar in string.unicodeScalars {
            self.scalarArray.insert(scalar.value, at: newIndex)
            newIndex += 1
        }
    }
    
    // isEmpty
    var isEmpty: Bool {
        return self.scalarArray.count == 0
    }
    
    // hashValue (to implement Hashable protocol)
    var hashValue: Int {
        
        // DJB Hash Function
        return self.scalarArray.reduce(5381) {
            ([=16=] << 5) &+ [=16=] &+ Int()
        }
    }
    
    // length
    var length: Int {
        return self.scalarArray.count
    }
    
    // remove character
    mutating func removeCharAt(_ index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString {
        
        var returnString = ScalarString()
        
        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar)
            }
        }
        
        return returnString
    }
    func removeRange(_ range: CountableRange<Int>) -> ScalarString? {
        
        if range.lowerBound < 0 || range.upperBound > scalarArray.count {
            return nil
        }
        
        var returnString = ScalarString()
        
        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            }
        }
        
        return returnString
    }
    
    
    // replace
    func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString {
        
        var returnString = ScalarString()
        
        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementChar)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replace(_ character: UInt32, withString replacementString: String) -> ScalarString {
        
        var returnString = ScalarString()
        
        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementString)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString {
        
        var returnString = ScalarString()
        
        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            } else if i == range.lowerBound {
                returnString.append(replacementString)
            }
        }
        return returnString
    }
    
    // set (an alternative to myScalarString = "some string")
    mutating func set(_ string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }
    
    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        if self.scalarArray.count == 0 {
            return partsArray
        }
        var part: ScalarString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }
    
    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }
    
    // substring
    func substring(_ startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray: ScalarString = ScalarString()
        for i in startIndex..<self.length {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray: ScalarString = ScalarString()
        for i in startIndex..<endIndex {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }
    
    // toString
    func toString() -> String {
        var string: String = ""
        
        for scalar in self.scalarArray {
            if let validScalor = UnicodeScalar(scalar) {
                string.append(Character(validScalor))
            }
        }
        return string
    }
    
    // trim
    // removes leading and trailing whitespace (space, tab, newline)
    func trim() -> ScalarString {
        
        //var returnString = ScalarString()
        let space: UInt32 = 0x00000020
        let tab: UInt32 = 0x00000009
        let newline: UInt32 = 0x0000000A
        
        var startIndex = self.scalarArray.count
        var endIndex = 0
        
        // leading whitespace
        for i in 0..<self.scalarArray.count {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {
                
                startIndex = i
                break
            }
        }
        
        // trailing whitespace
        for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {
                
                endIndex = i + 1
                break
            }
        }
        
        if endIndex <= startIndex {
            return ScalarString()
        }
        
        return self.substring(startIndex, endIndex)
    }
    
    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }
    
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar)
    }
    return returnString
}
//Swift 3.0  
// This struct is an array of UInt32 to hold Unicode scalar values
struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    private var scalarArray: [UInt32] = []

    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {

        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Generator in order to conform to SequenceType protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString` { ... })

    //func generate() -> AnyIterator<UInt32> {
    func makeIterator() -> AnyIterator<UInt32> {

        let nextIndex = 0

        return AnyIterator {
            if (nextIndex > self.scalarArray.count-1) {
                return nil
            }
            return self.scalarArray[nextIndex + 1]
        }
    }

    // append
    mutating func append(scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement Printable protocol)
    var description: String {

        var string: String = ""

        for scalar in scalarArray {
            string.append(String(describing: UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
        }
        return string
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // insert
    mutating func insert(scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }

    // isEmpty
    var isEmpty: Bool {
        get {
            return self.scalarArray.count == 0
        }
    }

    // hashValue (to implement Hashable protocol)
    var hashValue: Int {
        get {

            // DJB Hash Function
            var hash = 5381

            for i in 0 ..< scalarArray.count {
                hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
            }
            /*
             for i in 0..< self.scalarArray.count {
             hash = ((hash << 5) &+ hash) &+ Int(self.scalarArray[i])
             }
             */
            return hash
        }
    }

    // length
    var length: Int {
        get {
            return self.scalarArray.count
        }
    }

    // remove character
    mutating func removeCharAt(index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(character: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }

        return returnString
    }

    // replace
    func replace(character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalar: replacementChar) //.append(replacementChar)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // func replace(character: UInt32, withString replacementString: String) -> ScalarString {
    func replace(character: UInt32, withString replacementString: ScalarString) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(scalarString: replacementString) //.append(replacementString)
            } else {
                returnString.append(scalar: scalar) //.append(scalar)
            }
        }
        return returnString
    }

    // set (an alternative to myScalarString = "some string")
    mutating func set(string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        var part: ScalarString = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar: scalar) //.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }

    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }

    // substring
    func substring(startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< self.length {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray: ScalarString = ScalarString()
        for i in startIndex ..< endIndex {
            subArray.append(scalar: self.scalarArray[i]) //.append(self.scalarArray[i])
        }
        return subArray
    }

    // toString
    func toString() -> String {
        let string: String = ""

        for scalar in self.scalarArray {
            string.appending(String(describing:UnicodeScalar(scalar))) //.append(UnicodeScalar(scalar)!)
        }
        return string
    }

    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }

}

func ==(left: ScalarString, right: ScalarString) -> Bool {

    if left.length != right.length {
        return false
    }

    for i in 0 ..< left.length {
        if left.charAt(index: i) != right.charAt(index: i) {
            return false
        }
    }

    return true
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar: scalar) //.append(scalar)
    }
    return returnString
}