
Return the same number of elements from HTML using rvest

I'm trying to use rvest to scrape the city names and addresses of all the Apple Stores in the UK:

library(rvest)
library(xml2)
library(tidyverse)

my_url <- read_html("https://www.apple.com/uk/retail/storelist/")

# extract city name 
city_name <- my_url %>% html_elements("h2") %>% html_text2()
length(city_name)
# 27 cities

address <- my_url %>% html_elements("address") %>% html_text2()
length(address)
# 38 addresses

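I get more addresses than city names, because some cities have more than one store. How can I get the same number of city names and addresses so that I can put them into a data frame?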
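To reproduce the mismatch without hitting the live site, here is a minimal sketch on a small hand-written page. The HTML below is hypothetical: it only assumes the layout the answer relies on (one div with class "state" per city, holding one h2 and one or more address elements) and is not a copy of the real Apple page.

```r
library(rvest)

# Hypothetical toy page mimicking the assumed structure; NOT the real store list.
toy <- read_html('
<html><body>
  <div class="state"><h2>Bristol</h2>
    <address>11 Philadelphia Street</address>
    <address>The Mall at Cribbs Causeway</address>
  </div>
  <div class="state"><h2>Cardiff</h2>
    <address>63-66 Grand Arcade</address>
  </div>
</body></html>')

# Selecting headings and addresses independently loses the pairing between
# them, so the two vectors end up with different lengths.
city_name <- toy %>% html_elements("h2") %>% html_text2()
address   <- toy %>% html_elements("address") %>% html_text2()

length(city_name)  # 2
length(address)    # 3
```

Two independent selections cannot be zipped row by row, which is exactly why the live page gives 27 names but 38 addresses.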

You can iterate over each city block and pair its heading (h2) with every address element inside it:

library(rvest)
library(xml2)
library(tidyverse)

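read_html("https://www.apple.com/uk/retail/storelist/") %>% 
  # one <div class="state"> per city/region, holding its <h2> and <address> nodes
  html_elements(xpath = "//div[@class='state']") %>%
  lapply(function(x) {
    # the single city name is recycled to match the number of addresses in this block
    data.frame(city = html_element(x, "h2") %>% html_text(), 
               address = html_elements(x, "address") %>% html_text2())}) %>%
  # stack the per-city data frames into one
  do.call(rbind, .) %>%
  as_tibble()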
#> # A tibble: 38 x 2
#>    city            address                                                      
#>    <chr>           <chr>                                                        
#>  1 Aberdeen        "27/28 Ground Level Mall\nUnion Square\nAberdeen , AB11 ~
#>  2 Antrim          "Upper Ground Floor\n1 Victoria Square\nBelfast , BT1 4Q~
#>  3 Berkshire       "The Oracle Shopping Centre\nUpper Level\nReading , RG1 ~
#>  4 Bristol         "11 Philadelphia Street\nQuakers Friars\nBristol , BS1 3~
#>  5 Bristol         "Upper Mall\nThe Mall at Cribbs Causeway\nBristol , BS34~
#>  6 Buckinghamshire "26 Midsummer Place\nMidsummer Boulevard\nMilton Keynes ~
#>  7 Cambridgeshire  "Grand Arcade Shopping Centre\nCambridge , CB2 3AX\n0122~
#>  8 Cardiff         "63-66 Grand Arcade\nSt David’s Dewi Sant\nCardiff , CF1~
#>  9 Central London  "No. 1-7 The Piazza\nLondon , WC2E 8HB\n020 7447 1400"    
#> 10 Central London  "235 Regent Street\nLondon , W1B 2EL\n020 7153 9000"      
#> # ... with 28 more rows

Created on 2022-04-12 by the reprex package (v2.0.1)
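The key step is that html_element() returns exactly one node per city block while html_elements() returns all of its addresses, and data.frame() recycles the length-one city to match. A minimal offline sketch of the same per-block approach, run on a hypothetical toy page (not the real Apple markup) and using the equivalent CSS selector "div.state" instead of the XPath:

```r
library(rvest)

# Hypothetical toy page with the same assumed structure as the live site.
toy <- read_html('
<html><body>
  <div class="state"><h2>Bristol</h2>
    <address>11 Philadelphia Street</address>
    <address>The Mall at Cribbs Causeway</address>
  </div>
  <div class="state"><h2>Cardiff</h2>
    <address>63-66 Grand Arcade</address>
  </div>
</body></html>')

stores <- toy %>%
  # one node per city block
  html_elements("div.state") %>%
  lapply(function(x) {
    # one city name, recycled across every address found in this block
    data.frame(city    = html_element(x, "h2") %>% html_text2(),
               address = html_elements(x, "address") %>% html_text2())
  }) %>%
  # row-bind the per-city data frames
  do.call(rbind, .)

stores$city  # "Bristol" "Bristol" "Cardiff"
```

Because each row is built inside its own city block, the city column always has as many entries as the address column, so Bristol appears once per store. Note that a block containing zero address elements would make data.frame() fail on mismatched lengths, so that case would need an explicit guard.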