How to scrape an attribute inside an attribute with colly
I'm trying to scrape the productId of a product, but I can't get it to work. Please help.
HTML code:
<span class="info">
<button data-product="{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}">
When I try
h.ChildAttr("span.info>button", "data-product")
the result is {"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}
When I try
h.ChildAttr("span.info>button", "productId")
I get nothing.
How can I get this data with colly?
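The attribute value is just a raw string. ChildAttr only reads HTML attributes, and productId is a key inside that string rather than an attribute of the button element, so the second call returns an empty string. In this case the string is JSON, so you need to parse it to get at the data.
For example: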
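package main

import (
	"encoding/json"
	"log"

	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	c.OnHTML(`body`, func(e *colly.HTMLElement) {
		// Raw value of the data-product attribute; here it is a JSON string.
		text := e.ChildAttr("span.info>button", "data-product")

		// Decode the JSON into a generic map.
		var result map[string]interface{}
		if err := json.Unmarshal([]byte(text), &result); err != nil {
			log.Println(err)
			return
		}

		// productId is a key of the decoded JSON object, not an HTML attribute.
		log.Println(result["productId"])
	})

	if err := c.Visit("[some url]"); err != nil {
		log.Fatal(err)
	}
}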
Output:
2021/10/21 14:23:24 which I want to scrape
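If you prefer a typed result over map[string]interface{}, you can decode the same attribute value into a struct with json tags. The sketch below uses only the standard library so it can run without a network. The Product type and the parseProduct helper are hypothetical names introduced for illustration, and the field set is an assumption based on the sample HTML in the question (encoding/json ignores any other keys).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Product mirrors the keys seen in the sample data-product attribute.
// Hypothetical type: the field set is assumed from the question's HTML.
type Product struct {
	MerchantName string `json:"merchantName"`
	Price        string `json:"price"` // quoted in the sample JSON, so kept as a string
	ProductName  string `json:"productName"`
	CategoryName string `json:"categoryName"`
	BrandName    string `json:"brandName"`
	ProductID    string `json:"productId"`
}

// parseProduct (hypothetical helper) decodes a raw attribute value,
// such as the string returned by ChildAttr, into a Product.
func parseProduct(raw string) (Product, error) {
	var p Product
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return Product{}, err
	}
	return p, nil
}

func main() {
	// Attribute value copied from the question.
	raw := `{"merchantName":"xxx","price":"11","productName":"car window","categoryName":"windows","brandName":"aa assosiations","productId":"which I want to scrape"}`

	p, err := parseProduct(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(p.ProductID) // prints: which I want to scrape
}
```

Inside a colly callback you would pass the attribute straight in, e.g. parseProduct(e.ChildAttr("span.info>button", "data-product")). If a page may contain several such buttons, registering c.OnHTML("span.info>button", ...) and reading e.Attr("data-product") runs the callback once per matching button, whereas ChildAttr returns only the first match.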