Extract a JSON string from HTML using only iOS APIs
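I want to extract a JSON string from an HTML document "without" using a third-party framework.
I'm building an iOS framework, and I don't want it to depend on other third-party frameworks.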
Example URL:
http://www.nicovideo.jp/watch/sm33786214
Inside that HTML there is a line:
I need to extract:
JSON_String_I_want_to_extract
and convert it into a JSON object.
With the third-party framework "Kanna", it looks like this:
if let doc = Kanna.HTML(html: html, encoding: String.Encoding.utf8) {
    if let descNode = doc.css("#js-initial-watch-data[data-api-data]").first {
        let dataApiData = descNode["data-api-data"]
        if let data = dataApiData?.data(using: .utf8) {
            if let json = try? JSON(data: data, options: JSONSerialization.ReadingOptions.mutableContainers) { // JSON(data:options:) is from a third-party JSON library, not Foundation
            }
        }
    }
}
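I searched online for similar questions, but I couldn't apply the answers to my case (I have to admit I don't understand regular expressions very well):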
if let html = String(data: data, encoding: .utf8) {
    let pattern = "data-api-data=\"(.*?)\".*?>"
    let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let matches = regex.matches(in: html, options: [], range: NSMakeRange(0, html.count))
    var results: [String] = []
    matches.forEach { (match) -> () in
        results.append( (html as NSString).substring(with: match.range(at: 1)) )
    }
    if let stringJSON = results.first {
        let d = stringJSON.data(using: String.Encoding.utf8)
        if let json = try? JSONSerialization.jsonObject(with: d!, options: []) as? Any {
            // it does not get here...
        }
    }
}
Can anyone help me extract the string from the HTML and convert it to JSON?
Thanks.
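Your pattern seems fine. The problem is that HTML attribute values may contain character entities (such as &quot;).
You need to replace them with the actual characters before parsing the string as JSON.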
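To see why the decoding step matters, here is a small self-contained sketch. The attribute value in it is made up (it is not copied from the nicovideo page), and `decodeBasicEntities` is a hypothetical helper that only handles a fixed list of entities, not arbitrary numeric references such as `&#x2F;`. It shows that `JSONSerialization` rejects the raw attribute value and accepts it once the entities are replaced.

```swift
import Foundation

// Hypothetical attribute value, as it might appear inside data-api-data="..."
// (made up for illustration, not taken from the real page).
let rawAttribute = "{&quot;title&quot;:&quot;Tom &amp; Jerry&quot;,&quot;views&quot;:42}"

// Hypothetical helper: decodes a fixed set of entities.
// &amp; is replaced last so that "&amp;quot;" becomes the literal text "&quot;".
func decodeBasicEntities(_ s: String) -> String {
    return s
        .replacingOccurrences(of: "&quot;", with: "\"")
        .replacingOccurrences(of: "&#39;", with: "'")
        .replacingOccurrences(of: "&apos;", with: "'")
        .replacingOccurrences(of: "&gt;", with: ">")
        .replacingOccurrences(of: "&lt;", with: "<")
        .replacingOccurrences(of: "&amp;", with: "&")
}

// The raw value is not valid JSON, so parsing throws and try? yields nil.
let rawParses = (try? JSONSerialization.jsonObject(with: Data(rawAttribute.utf8))) != nil

// After decoding, the value is ordinary JSON and parses.
let decoded = decodeBasicEntities(rawAttribute)
let object = try? JSONSerialization.jsonObject(with: Data(decoded.utf8))
let dict = object as? [String: Any]

print(rawParses)                              // false
print((dict?["title"] as? String) ?? "nil")   // Tom & Jerry
```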
import Foundation

// `data` is the page's HTML as Data (downloaded elsewhere)
if let html = String(data: data, encoding: .utf8) {
    let pattern = "data-api-data=\"([^\"]*)\""
    let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
    let matches = regex.matches(in: html, range: NSRange(0..<html.utf16.count)) //<-USE html.utf16.count, NOT html.count
    var results: [String] = []
    matches.forEach { match in
        let propValue = html[Range(match.range(at: 1), in: html)!]
            //### You need to replace character entities with actual characters
            .replacingOccurrences(of: "&quot;", with: "\"")
            .replacingOccurrences(of: "&#39;", with: "'")
            .replacingOccurrences(of: "&apos;", with: "'")
            .replacingOccurrences(of: "&gt;", with: ">")
            .replacingOccurrences(of: "&lt;", with: "<")
            .replacingOccurrences(of: "&amp;", with: "&") // replace &amp; last
        results.append(propValue)
    }
    if let stringJSON = results.first {
        let dataJSON = stringJSON.data(using: .utf8)!
        do {
            let json = try JSONSerialization.jsonObject(with: dataJSON)
            print(json)
        } catch {
            print(error) //You should not ignore errors silently...
        }
    } else {
        print("NO result")
    }
}
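The `html.utf16.count` comment matters because `NSRange` counts UTF-16 code units, while `String.count` counts Characters. Below is a minimal sketch, assuming a made-up HTML string containing emoji (each emoji is one Character but two UTF-16 code units); `firstDataApiData(in:range:)` is a hypothetical helper. When the attribute sits near the end of the string, the range built from `count` cuts it off and the match fails.

```swift
import Foundation

// Hypothetical helper: returns the first data-api-data value found in `range`, or nil.
func firstDataApiData(in html: String, range: NSRange) -> String? {
    let regex = try! NSRegularExpression(pattern: "data-api-data=\"([^\"]*)\"")
    guard let match = regex.firstMatch(in: html, range: range),
          let valueRange = Range(match.range(at: 1), in: html) else { return nil }
    return String(html[valueRange])
}

// Made-up input: 3 emoji make utf16.count (38) larger than count (35).
let sampleHTML = "<p>😀😀😀</p><div data-api-data=\"abc\">"

// Range length in Characters: stops 3 code units early, before `c">`.
let wrong = firstDataApiData(in: sampleHTML, range: NSRange(location: 0, length: sampleHTML.count))
// Range length in UTF-16 code units: covers the whole string.
let right = firstDataApiData(in: sampleHTML, range: NSRange(location: 0, length: sampleHTML.utf16.count))

print(wrong ?? "nil")   // nil
print(right ?? "nil")   // abc
```

With `html.count` the range is short by one code unit for every such character, so text near the end of the string can be skipped without any error.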