Getting the relevant tag for scraping addresses from a website

Question

I am trying to scrape Walmart store locations in Missouri using the link below:

https://www.walmart.com/store/finder?location=Missouri&distance=50

library(rvest)
library(xml2)
library(tidyverse)

url <- read_html("https://www.walmart.com/store/finder?location=Missouri&distance=50")

I used SelectorGadget to inspect the contents of NearbyStores and picked up the selectors to extract the store addresses.

I first tried to extract the city, but got nothing:

url %>% html_elements(".city")
{xml_nodeset (0)}

I then tried to extract the address and the store type, but still got nothing:

url %>% html_elements(".result-element-address")
{xml_nodeset (0)}
  
url %>% html_elements(".result-element-store-type")
{xml_nodeset (0)}

My goal is to create a data frame containing the city names and addresses.

Answer 1

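The tags you are looking for do not exist in the document you requested. They are built dynamically by JavaScript code after the page loads, so read_html() never sees them. Fortunately, the actual data is present on the page, as a JSON string inside one of the script tags. It takes a bit of parsing, but it contains all the information you need: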

library(rvest)
library(xml2)
library(tidyverse)

url <- read_html("https://www.walmart.com/store/finder?location=Missouri&distance=50")
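# The store data is embedded as JSON inside <script id="storeFinder">
stores <- html_element(url, xpath = "//script[@id='storeFinder']") %>%
  html_text() %>%
  jsonlite::parse_json()

# Convert each store's address list to a one-row data frame and stack the rows
do.call(rbind, lapply(stores$storeFinder$storeFinderCarousel$stores,
       function(x) as.data.frame(x$address)))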
#>    postalCode                 address           city state country
#> 1       65401        500 S Bishop Ave          Rolla    MO      US
#> 2       65584   185 Saint Robert Blvd   Saint Robert    MO      US
#> 3       65453            100 Ozark Dr           Cuba    MO      US
#> 4       65560       1101 W Highway 32          Salem    MO      US
#> 5       65066         1888 Highway 28     Owensville    MO      US
#> 6       63080       350 Park Ridge Rd       Sullivan    MO      US
#> 7       65101      401 Supercenter Dr Jefferson City    MO      US
#> 8       65065         4252 Highway 54    Osage Beach    MO      US
#> 9       65483 1433 S Sam Houston Blvd        Houston    MO      US
#> 10      65109   724 Stadium West Blvd Jefferson City    MO      US
#> 11      65026      1802 S Business 54          Eldon    MO      US
#> 12      65020             94 Cecil St      Camdenton    MO      US
#> 13      65536    1800 S Jefferson Ave        Lebanon    MO      US
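
The live page may change or block automated requests, so it can help to try the same approach offline first. The following is only a minimal sketch: the HTML string is a hypothetical, heavily simplified stand-in for the real page, assuming the same script id and JSON nesting as in the code above, with two of the addresses from the output reused as sample data. It also checks that the `.city` selector finds nothing in the static HTML, and keeps only the two columns the question asks for.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the downloaded page (not the real Walmart markup):
# the data sits in a <script> tag as JSON and no element has class "city".
html <- '<html><body>
<div id="NearbyStores"></div>
<script id="storeFinder">
{"storeFinder": {"storeFinderCarousel": {"stores": [
  {"address": {"postalCode": "65401", "address": "500 S Bishop Ave",
               "city": "Rolla", "state": "MO", "country": "US"}},
  {"address": {"postalCode": "65453", "address": "100 Ozark Dr",
               "city": "Cuba", "state": "MO", "country": "US"}}
]}}}
</script>
</body></html>'

page <- read_html(html)

# The selector from the question matches nothing in the static document
n_city_nodes <- length(html_elements(page, ".city"))

# Pull the JSON text out of the script tag and parse it into nested lists
stores <- page %>%
  html_element(xpath = "//script[@id='storeFinder']") %>%
  html_text() %>%
  parse_json()

# One row per store, built from each store's address list
addresses <- do.call(
  rbind,
  lapply(stores$storeFinder$storeFinderCarousel$stores,
         function(x) as.data.frame(x$address))
)

# Keep only the columns needed for the city/address data frame
city_address <- addresses[, c("city", "address")]
print(city_address)
```

If the script id or the JSON nesting on the real page differs from what is assumed here, `html_element()` returns a missing node or the list path returns NULL, so inspecting `names(stores)` level by level is a reasonable first debugging step.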
