Decoding quoted-printable messages in Swift
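I have a quoted-printable string such as "The cost would be =C2=A31,000". How do I convert it to "The cost would be £1,000"?
At the moment I am just converting the text manually, which does not cover all cases. I am sure there is a single line of code that would take care of this.
Here is my code: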
func decodeUTF8(message: String) -> String
{
    var newMessage = message.stringByReplacingOccurrencesOfString("=2E", withString: ".", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A2", withString: "•", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=C2=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=A3", withString: "£", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9C", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=A6", withString: "…", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=9D", withString: "\"", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=92", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=3D", withString: "=", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=20", withString: "", options: NSStringCompareOptions.LiteralSearch, range: nil)
    newMessage = newMessage.stringByReplacingOccurrencesOfString("=E2=80=99", withString: "'", options: NSStringCompareOptions.LiteralSearch, range: nil)
    return newMessage
}
Thanks
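This encoding is called 'quoted-printable'. What you need to do is convert the string to NSData using ASCII encoding, then iterate over the data, replacing each three-character sequence such as "=A3" with the corresponding byte/char 0xA3, and finally convert the resulting data back to a string using NSUTF8StringEncoding.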
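The byte-level approach described above could be sketched roughly as follows in current Swift. This is only an illustration under some assumptions: the names `hexValue` and `decodeQuotedPrintableBytes` are made up for this example, `Data` stands in for NSData, and soft line breaks ("=" at the end of a line) are not handled here.

```swift
import Foundation

/// Hypothetical helper: numeric value of one ASCII hex digit, or nil.
func hexValue(_ byte: UInt8) -> UInt8? {
    guard let value = Character(UnicodeScalar(byte)).hexDigitValue else { return nil }
    return UInt8(value)
}

/// Hypothetical sketch: replace every "=HH" escape with the byte 0xHH,
/// then interpret the resulting bytes in the given encoding.
func decodeQuotedPrintableBytes(_ message: String,
                                encoding: String.Encoding = .utf8) -> String? {
    // Quoted-printable text is pure ASCII; the conversion fails otherwise.
    guard let input = message.data(using: .ascii) else { return nil }
    let bytes = [UInt8](input)
    var output = Data()
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "=") {
            // "=" must be followed by exactly two hex digits.
            guard i + 2 < bytes.count,
                  let high = hexValue(bytes[i + 1]),
                  let low = hexValue(bytes[i + 2]) else { return nil }
            output.append((high << 4) | low)
            i += 3
        } else {
            // Any other byte is copied unchanged.
            output.append(bytes[i])
            i += 1
        }
    }
    // Returns nil if the bytes are not valid in the chosen encoding.
    return String(data: output, encoding: encoding)
}

if let decoded = decodeQuotedPrintableBytes("The cost would be =C2=A31,000") {
    print(decoded) // The cost would be £1,000
}
```

Working on bytes rather than characters matters because one character such as "£" may be spread over several "=HH" escapes.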
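Some more information would be needed to give a directly applicable solution, so I will make a few assumptions.
In an HTML page or a mail message, for example, one or more kinds of encoding can be applied to some source data. You could, for instance, encode a binary file such as a PNG with base64 and then zip it. The order matters.
In your example, as you say, the source data is a string that has been encoded as UTF-8.
In an HTTP message, your Content-Type is therefore text/plain; charset=UTF-8. In your example an additional encoding seems to have been applied as well, a "Content-Transfer-Encoding": possibly the Content-Transfer-Encoding is quoted-printable or base64 (not sure about that, though).
To revert it, you need to apply the corresponding decodings in reverse order.
Hint:
You can see the headers (Content-Type and Content-Transfer-Encoding) of a mail message when you view its raw source.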
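To illustrate what "reverse order" means, here is a small sketch for the base64 case: the transfer encoding is undone first, which yields raw bytes, and only then is the character encoding applied. The function name `decodeBase64Text` and the sample payload are assumptions made for this example; the payload is "£1,000" as UTF-8 bytes, then base64.

```swift
import Foundation

/// Hypothetical helper: undo base64 (transfer encoding) first,
/// then UTF-8 (character encoding).
func decodeBase64Text(_ transferEncoded: String) -> String? {
    // Step 1: base64 text -> raw bytes.
    guard let rawBytes = Data(base64Encoded: transferEncoded) else { return nil }
    // Step 2: raw UTF-8 bytes -> text.
    return String(data: rawBytes, encoding: .utf8)
}

if let text = decodeBase64Text("wqMxLDAwMA==") {
    print(text) // £1,000
}
```

A quoted-printable body follows the same pattern, with the "=HH" unescaping taking the place of the base64 step.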
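A simple way is to use the (NS)String method stringByRemovingPercentEncoding for this purpose. This was observed in the thread "decoding quoted-printables", so the first solution is mainly a translation of the answers in that thread to Swift.
The idea is to replace the quoted-printable "=NN" encoding with the percent encoding "%NN" and then use the existing method to remove the percent encoding.
Continuation lines (soft line breaks) are handled separately. In addition, percent characters in the input string must be encoded first, otherwise they would be treated as the leading character of a percent encoding.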
func decodeQuotedPrintable(message : String) -> String? {
    return message
        .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
        .stringByReplacingOccurrencesOfString("=\n", withString: "")
        .stringByReplacingOccurrencesOfString("%", withString: "%25")
        .stringByReplacingOccurrencesOfString("=", withString: "%")
        .stringByRemovingPercentEncoding
}
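The function returns an optional string, which is nil for invalid input.
Invalid input can be:
- A "=" character that is not followed by two hexadecimal digits, e.g. "=XX".
- A "=NN" sequence that does not decode to a valid UTF-8 sequence, e.g. "=E2=64".
Examples: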
if let decoded = decodeQuotedPrintable("=C2=A31,000") {
    print(decoded) // £1,000
}
if let decoded = decodeQuotedPrintable("=E2=80=9CHello =E2=80=A6 world!=E2=80=9D") {
    print(decoded) // “Hello … world!”
}
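For readers on a recent toolchain, the same percent-encoding trick might look like this with the renamed Foundation methods `replacingOccurrences(of:with:)` and `removingPercentEncoding`. This is a sketch rather than a tested drop-in replacement; the function name `decodeQuotedPrintableViaPercent` is made up here to avoid clashing with the code above, and UTF-8 is assumed.

```swift
import Foundation

/// Hypothetical modern-Swift variant of the percent-encoding approach.
func decodeQuotedPrintableViaPercent(_ message: String) -> String? {
    return message
        // Remove soft line breaks (CRLF and bare LF variants).
        .replacingOccurrences(of: "=\r\n", with: "")
        .replacingOccurrences(of: "=\n", with: "")
        // Protect literal percent signs before "=" is turned into "%".
        .replacingOccurrences(of: "%", with: "%25")
        // "=NN" becomes "%NN" ...
        .replacingOccurrences(of: "=", with: "%")
        // ... which Foundation can decode as UTF-8.
        .removingPercentEncoding
}

if let decoded = decodeQuotedPrintableViaPercent("=C2=A31,000") {
    print(decoded) // £1,000
}
```

The order of the two middle replacements is the important part: if "=" were replaced first, a literal "%" in the input could no longer be told apart from an escape.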
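Update 1: The above code assumes that the message uses the UTF-8 encoding for quoting non-ASCII characters, as in most of your examples: C2 A3 is the UTF-8 encoding of "£", and E2 80 A6 is the UTF-8 encoding of "…".
If the input is "Rub=E9n", then the message uses the Windows-1252 encoding. To decode that correctly, you have to replace
.stringByRemovingPercentEncoding
with
.stringByReplacingPercentEscapesUsingEncoding(NSWindowsCP1252StringEncoding)
There are also ways to detect the encoding from the "Content-Type" header field, compare e.g. https://whosebug.com/a/32051684/1187415.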
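A quick way to see why the encoding matters is to look at the bytes that "Rub=E9n" unescapes to. The snippet below is only an illustration and the variable names are arbitrary: the same five bytes are valid Windows-1252 but not valid UTF-8.

```swift
import Foundation

// The bytes of "Rub=E9n" after replacing "=E9" with the byte 0xE9.
let rubenBytes = Data([0x52, 0x75, 0x62, 0xE9, 0x6E])

// In Windows-1252, 0xE9 is "é".
let asWindows1252 = String(data: rubenBytes, encoding: .windowsCP1252)

// In UTF-8, 0xE9 starts a three-byte sequence, but "n" (0x6E) is not a
// continuation byte, so decoding fails.
let asUTF8 = String(data: rubenBytes, encoding: .utf8)

print(asWindows1252 ?? "nil") // Rubén
print(asUTF8 ?? "nil")        // nil
```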
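Update 2: The stringByReplacingPercentEscapesUsingEncoding method is marked as deprecated, so the above code will always generate a compiler warning. Unfortunately, Apple does not seem to have provided an alternative method.
So here is a new, completely self-contained decoding method that does not cause any compiler warning. This time I have written it as an extension method of String. Explanatory comments are in the code.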
extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.
    func decodeQuotedPrintable(encoding enc : NSStringEncoding = NSUTF8StringEncoding) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .stringByReplacingOccurrencesOfString("=\r\n", withString: "")
            .stringByReplacingOccurrencesOfString("=\n", withString: "")
            .decodeQuotedPrintableSequences(enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.
    private func decodeQuotedPrintableSequences(enc : NSStringEncoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = rangeOfString("=", range: position ..< endIndex) {
            result.appendContentsOf(self[position ..< range.startIndex])
            position = range.startIndex

            // Decode one or more successive "=HH" sequences to a byte array:
            let bytes = NSMutableData()
            repeat {
                let hexCode = self[position.advancedBy(1) ..< position.advancedBy(3, limit: endIndex)]
                if hexCode.characters.count < 2 {
                    return nil // Incomplete hex code
                }
                guard var byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.appendBytes(&byte, length: 1)
                position = position.advancedBy(3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.appendContentsOf(dec)
        }

        // Copy remaining characters to the result:
        result.appendContentsOf(self[position ..< endIndex])

        return result
    }
}
Example usage:
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}
if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}
if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: NSWindowsCP1252StringEncoding) {
    print(decoded) // Rubén
}
Update for Swift 4 (and later):
extension String {

    /// Returns a new string made by removing in the `String` all "soft line
    /// breaks" and replacing all quoted-printable escape sequences with the
    /// matching characters as determined by a given encoding.
    /// - parameter encoding: A string encoding. The default is UTF-8.
    /// - returns: The decoded string, or `nil` for invalid input.
    func decodeQuotedPrintable(encoding enc : String.Encoding = .utf8) -> String? {

        // Handle soft line breaks, then replace quoted-printable escape sequences.
        return self
            .replacingOccurrences(of: "=\r\n", with: "")
            .replacingOccurrences(of: "=\n", with: "")
            .decodeQuotedPrintableSequences(encoding: enc)
    }

    /// Helper function doing the real work.
    /// Decode all "=HH" sequences with respect to the given encoding.
    private func decodeQuotedPrintableSequences(encoding enc : String.Encoding) -> String? {

        var result = ""
        var position = startIndex

        // Find the next "=" and copy characters preceding it to the result:
        while let range = range(of: "=", range: position..<endIndex) {
            result.append(contentsOf: self[position ..< range.lowerBound])
            position = range.lowerBound

            // Decode one or more successive "=HH" sequences to a byte array:
            var bytes = Data()
            repeat {
                let hexCode = self[position...].dropFirst().prefix(2)
                if hexCode.count < 2 {
                    return nil // Incomplete hex code
                }
                guard let byte = UInt8(hexCode, radix: 16) else {
                    return nil // Invalid hex code
                }
                bytes.append(byte)
                position = index(position, offsetBy: 3)
            } while position != endIndex && self[position] == "="

            // Convert the byte array to a string, and append it to the result:
            guard let dec = String(data: bytes, encoding: enc) else {
                return nil // Decoded bytes not valid in the given encoding
            }
            result.append(contentsOf: dec)
        }

        // Copy remaining characters to the result:
        result.append(contentsOf: self[position ..< endIndex])

        return result
    }
}
Example usage:
if let decoded = "=C2=A31,000".decodeQuotedPrintable() {
    print(decoded) // £1,000
}
if let decoded = "=E2=80=9CHello =E2=80=A6 world!=E2=80=9D".decodeQuotedPrintable() {
    print(decoded) // “Hello … world!”
}
if let decoded = "Rub=E9n".decodeQuotedPrintable(encoding: .windowsCP1252) {
    print(decoded) // Rubén
}
Unfortunately, I am a bit late with my answer. It might still be helpful for others, though.
var string = "The cost would be =C2=A31,000"
var finalString: String? = nil
if let regEx = try? NSRegularExpression(pattern: "={1}?([a-f0-9]{2}?)", options: NSRegularExpressionOptions.CaseInsensitive)
{
    // "$1" re-inserts the two captured hex digits after "%", so "=C2" becomes "%C2".
    // NSRange counts UTF-16 code units, hence utf16.count.
    let intermediatePercentEscapedString = regEx.stringByReplacingMatchesInString(string, options: NSMatchingOptions.WithTransparentBounds, range: NSMakeRange(0, string.utf16.count), withTemplate: "%$1")
    print(intermediatePercentEscapedString)
    finalString = intermediatePercentEscapedString.stringByRemovingPercentEncoding
    print(finalString)
}
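In current Swift the regular-expression variant could be written roughly as below. It is a sketch under assumptions: the names `input`, `percentEscaped` and `regexDecoded` are arbitrary, the simplified pattern `=([0-9a-f]{2})` is my own, UTF-8 is assumed, and a literal "%" in the input is not protected (it would need to be escaped to "%25" first, as in the earlier answer).

```swift
import Foundation

let input = "The cost would be =C2=A31,000"

// Match "=" followed by two hex digits and capture the digits.
// The pattern is a constant known to be valid, so try! is acceptable here.
let regex = try! NSRegularExpression(pattern: "=([0-9a-f]{2})",
                                     options: .caseInsensitive)

// NSRange covering the whole string, measured in UTF-16 code units.
let fullRange = NSRange(input.startIndex..., in: input)

// Turn each "=HH" into "%HH"; "$1" is the captured pair of hex digits.
let percentEscaped = regex.stringByReplacingMatches(in: input,
                                                    options: [],
                                                    range: fullRange,
                                                    withTemplate: "%$1")

// Let Foundation decode the percent escapes as UTF-8.
let regexDecoded = percentEscaped.removingPercentEncoding

print(percentEscaped)        // The cost would be %C2%A31,000
print(regexDecoded ?? "nil") // The cost would be £1,000
```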
You can also have a look at this working solution: https://github.com/dunkelstern/QuotedPrintable
let result = QuotedPrintable.decode(string: quoted)