rvest html_nodes returns {xml_nodeset (0)}
I have been trying to scrape this page using rvest and SelectorGadget. I can scrape the product descriptions, but not the values shown in the picture.
When I run this code:
library(dplyr)
library(rvest)
read_html("https://www.dicasanet.com.br/material-de-construcao") %>%
html_nodes(".product-payment")
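I keep getting the result "{xml_nodeset (0)}". I noticed that, unlike other values such as the product name, this one is not a div.a but a div.div. Is there another way to get these values? How should I proceed? Thanks in advance!
I could not scrape the prices with rvest, but it can be done with RSelenium: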
library(RSelenium)
library(rvest)  # needed for read_html(), html_nodes() and %>%
# I use RSelenium in combination with docker.
remDr <- remoteDriver(
remoteServerAddr = "localhost",
port = 4445L,
browserName = "chrome"
)
remDr$open()
remDr$navigate("https://www.dicasanet.com.br/material-de-construcao")
page <- read_html(remDr$getPageSource()[[1]])
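# Class selectors need a leading dot; without it the selector looks for a <product-price> tag.
page %>% html_nodes(".product-price") %>% html_text()
# Alternatively, query the live browser session directly.
price <- remDr$findElements(using = "class name", "product-price")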
price <- sapply(price,function(x){x$getElementText()[[1]]})
price
The output is:
price
[1] "" "" "" ""
[5] "" "" "R$ 40,50" "R$ 40,50"
[9] "R$ 194,90" "R$ 194,90" "R$ 171,00\nR$ 122,90" "R$ 171,00\nR$ 122,90"
[13] "R$ 393,00" "R$ 393,00" "R$ 357,50" "R$ 357,50"
[17] "R$ 433,20" "R$ 433,20" "R$ 120,60" "R$ 120,60"
[21] "R$ 89,50" "R$ 89,50" "R$ 56,20" "R$ 56,20"
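If you need the prices as numbers, here is a minimal sketch for cleaning up strings like the ones above. `parse_brl` is a hypothetical helper, not part of any package. I am also assuming that when an element holds two values separated by a newline, the first is the list price and the second is the current price; check that against the page before relying on it.

```r
# Hypothetical helper: convert a string such as "R$ 1.234,50" to a number.
parse_brl <- function(x) {
  x <- gsub("R\\$[[:space:]]*", "", x)          # strip the currency prefix
  x <- gsub(".", "", x, fixed = TRUE)           # drop thousands separators
  as.numeric(sub(",", ".", x, fixed = TRUE))    # decimal comma to decimal point
}

# Sample values copied from the output above.
raw <- c("", "R$ 40,50", "R$ 171,00\nR$ 122,90")
raw <- raw[nzchar(raw)]                         # discard the empty strings

# Split elements that carry two prices (assumed: list price, then current price).
parts <- strsplit(raw, "\n", fixed = TRUE)
prices <- data.frame(
  list_price    = vapply(parts, function(p) parse_brl(p[1]), numeric(1)),
  current_price = vapply(parts, function(p) parse_brl(p[length(p)]), numeric(1))
)
print(prices)
```

Each price appears twice in the RSelenium output, so you will probably also want to deduplicate by product before using the values.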
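The data is loaded dynamically from a JavaScript object inside a script tag. You can pull it out of the response text with a regular expression, parse it as JSON with jsonlite, and then extract the products you want: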
library(magrittr)
library(rvest)
library(stringr)
library(jsonlite)
page <- read_html('https://www.dicasanet.com.br/loja/catalogo.php?loja=790930&categoria=1')
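data <- page %>%
  toString() %>%
  # Backslashes must be doubled inside an R string literal; '\[' is an invalid escape.
  stringr::str_match('dataLayer = (\\[.*\\])') %>%
  .[2] %>%  # second element is the captured array literal
  jsonlite::parse_json()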
print(data[[1]]$listProducts)
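To try the same technique without hitting the site, here is a small offline sketch. The `html` string is a made-up stand-in for the page source, and the field names `nameProduct` and `price` are assumptions for illustration only; inspect `data[[1]]$listProducts` on the real page to see which fields it actually has.

```r
library(stringr)
library(jsonlite)

# Hypothetical page source: a script tag that assigns an array to dataLayer.
html <- '<html><head><script>dataLayer = [{"pageCategory":"catalog","listProducts":[{"nameProduct":"Cement 50kg","price":"40.50"},{"nameProduct":"Trowel","price":"122.90"}]}];</script></head><body></body></html>'

# Capture the array literal; column 2 of the match matrix is the first group.
json_text <- str_match(html, "dataLayer = (\\[.*\\])")[, 2]

# parse_json() returns nested lists; the products sit in the first array element.
data <- parse_json(json_text)
products <- data[[1]]$listProducts

# Flatten the list of products into a data frame.
prices <- data.frame(
  name  = vapply(products, function(p) p$nameProduct, character(1)),
  price = as.numeric(vapply(products, function(p) p$price, character(1))),
  stringsAsFactors = FALSE
)
print(prices)
```

Note that `.*` is greedy and does not cross line breaks, so this pattern only works when the array sits on a single line and no later `]` follows it on that line. If the real script spans several lines, the pattern would need adjusting.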