Regex for finding HTML classes with JSOUP
For my project I need to parse HTML and get the price of a product. This is how I currently do it:
    let url = "https://www.adidas.de/adistar-trikot/CV7089.html"
    let className = "gl-price__value"
    do {
        let html: String = getHTMLfromURL(url: url)
        let doc: Document = try SwiftSoup.parse(html)
        let price: Elements = try doc.getElementsByClass(className)
        let priceText: String = try price.text()
        result.text = priceText
    } catch Exception.Error(let type, let message) {
        print(message)
    } catch {
        print("error")
    }
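Question:
How can I change className to a regex so that all 3 examples below match? I have tried several possibilities but could not get it to work. Happy for any help!
Example 1: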
<div class="price">82 EUR</div>
Example 2:
<span class="gl-price__value">€ 139,95</span>
Example 3:
<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">79,99 €</span>
Maybe getElementsByClass is not the best approach. From the SwiftSoup README, "Use selector syntax to find elements":
SwiftSoup elements support a CSS (or jQuery) like selector syntax to find matching elements, that allows very powerful and robust queries.
[attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
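Your code would then become something like:
    let doc: Document = try SwiftSoup.parse(html)
    // Select every element whose class attribute matches the regex (case-insensitive).
    let priceClasses: Elements = try doc.select("[class~=(?i)price]")
    for priceClass: Element in priceClasses.array() {
        let priceText: String = try priceClass.text()
        ...
    }
    ...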
I used price as the regex here, based on the examples you provided, but you can adjust it as needed.
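To check the selector against the three examples from the question, something like the following sketch could be used. It assumes SwiftSoup is already added to the project as a package dependency. The helper name `texts(in:classMatching:)` and the extra `title` element are made up for illustration and are not part of SwiftSoup or the original code.

```swift
import SwiftSoup

// The three sample snippets from the question in one document, plus one
// hypothetical element without "price" in its class that should be skipped.
let html = """
<div class="price">82 EUR</div>
<span class="gl-price__value">€ 139,95</span>
<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">79,99 €</span>
<span class="title">Adistar Trikot</span>
"""

/// Hypothetical helper: returns the text of every element whose class
/// attribute matches the given regular expression.
func texts(in html: String, classMatching pattern: String) throws -> [String] {
    let doc: Document = try SwiftSoup.parse(html)
    // [attr~=regex] is the attribute selector quoted from the README above.
    let elements: Elements = try doc.select("[class~=\(pattern)]")
    return try elements.array().map { try $0.text() }
}

do {
    // (?i) makes the match case-insensitive.
    let prices = try texts(in: html, classMatching: "(?i)price")
    prices.forEach { print($0) }
} catch {
    print("Parsing failed: \(error)")
}
```

Note that this pattern matches any class attribute that contains "price" anywhere, so on a real page it may also pick up unrelated elements. A narrower regex is probably worth trying if too many elements come back.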