How to detect encoding in Data based on a String?

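I'm loading a text file whose encoding is unknown because it comes from other sources. The content arrives through the macOS NSDocument read method, which passes it to my model's read method. The String initializer requires an encoding when it is given Data, and if you assume the wrong one you may get nil. I've created a conditional cascade of potential encodings (it's what other people seem to be doing), but there must be a better way to do this. Suggestions?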

    override func read(from data: Data, ofType typeName: String) throws {
        model.read(from: data, ofType: typeName)
    }

In the model:

    func read(from data: Data, ofType typeName: String) {
        if let text = String(data: data, encoding: .utf8) {
            content = text
        } else if let text = String(data: data, encoding: .macOSRoman) {
            content = text
        } else if let text = String(data: data, encoding: .ascii) {
            content = text
        } else {
            content = "?????"
        }
    }

You can extend Data with a stringEncoding property that tries to detect the string encoding. Try it like this:

    import Foundation

    extension Data {
        /// Best-guess string encoding for these bytes, or nil if detection fails.
        var stringEncoding: String.Encoding? {
            // A return value of 0 means no encoding could be determined.
            let rawValue = NSString.stringEncoding(for: self, encodingOptions: nil, convertedString: nil, usedLossyConversion: nil)
            guard rawValue != 0 else { return nil }
            return String.Encoding(rawValue: rawValue)
        }
    }
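If you already know which encodings are likely, the same Foundation call accepts detection options. The following is only a sketch: `guessedEncoding(preferring:)` is a hypothetical name, not part of Foundation, and it assumes an Apple platform where `NSString.stringEncoding(for:encodingOptions:convertedString:usedLossyConversion:)` is available. It passes a list of suggested encodings and also reports whether Foundation flagged the conversion as lossy.

```swift
import Foundation

extension Data {
    /// Hypothetical variant of `stringEncoding`: hints at preferred encodings and
    /// reports whether Foundation flagged the conversion as lossy.
    func guessedEncoding(preferring suggested: [String.Encoding]) -> (encoding: String.Encoding, lossy: Bool)? {
        var lossy: ObjCBool = false
        // The suggested encodings are passed as NSNumber-wrapped raw values.
        let options: [StringEncodingDetectionOptionsKey: Any] = [
            .suggestedEncodingsKey: suggested.map { NSNumber(value: $0.rawValue) }
        ]
        let rawValue = NSString.stringEncoding(for: self,
                                               encodingOptions: options,
                                               convertedString: nil,
                                               usedLossyConversion: &lossy)
        // 0 means Foundation could not settle on any encoding.
        guard rawValue != 0 else { return nil }
        return (String.Encoding(rawValue: rawValue), lossy.boolValue)
    }
}
```

A lossy result is a hint that the guess may be wrong, so you might want to warn the user rather than silently accept the text.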

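Then unwrap data.stringEncoding and pass it to the String initializer (the initializer takes a non-optional encoding, so the optional has to be unwrapped first):

    if let encoding = data.stringEncoding,
       let string = String(data: data, encoding: encoding) {
        print(string)
    }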
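Since the NSDocument override in the question is already marked `throws`, one option is to let the model throw when detection fails instead of storing "?????". This is a sketch under assumptions: `TextModel` is a hypothetical stand-in for the asker's model, and the choice of `CocoaError(.fileReadUnknownStringEncoding)` is mine, not something the question specifies. It also uses the `convertedString` out-parameter so the data is only decoded once.

```swift
import Foundation

// Hypothetical stand-in for the model class in the question.
final class TextModel {
    private(set) var content = ""

    func read(from data: Data, ofType typeName: String) throws {
        var converted: NSString?
        // Let Foundation guess the encoding and hand back the decoded string.
        let rawValue = NSString.stringEncoding(for: data,
                                               encodingOptions: nil,
                                               convertedString: &converted,
                                               usedLossyConversion: nil)
        guard rawValue != 0, let text = converted else {
            // Assumption: surface the failure as a standard Cocoa error.
            throw CocoaError(.fileReadUnknownStringEncoding)
        }
        content = text as String
    }
}
```

The document's read method would then call `try model.read(from: data, ofType: typeName)`, so a failure propagates as an error rather than placeholder text.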