Receiving NAs when scraping links (href) with rvest

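I have seen a few similar questions, but none of the solutions worked for me. I am trying to get the URL of the link in each node, but the resulting list contains only NA values.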

library(rvest)

beer <- read_html("https://www.beeradvocate.com/lists/top/")

beerLink <- beer %>% 
  html_nodes(".hr_bottom_light a b") %>% 
  html_attr('href') %>% 
  as.data.frame() 

Any help would be appreciated.

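The b element is the descendant node, but it is the a element that contains the link you want. Your selector ends at b, which has no href attribute of its own, so html_attr returns NA for every node. You could search around for some descendant pattern to step back up to the a (I'm only familiar with the XPath version, and you seem to prefer CSS), but this alternative approach gets you the links you want without it: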

library(rvest)

# using a stub to facilitate accessing the URLs later with
#   an absolute address
stub = 'https://www.beeradvocate.com'
beer <- read_html(paste0(stub, '/lists/top/'))
lnx = beer %>% html_nodes('a') %>% html_attr('href') %>%
  # this pattern matches beer profile links:
  #   the first .* is a brewery ID, the second .*
  #   is a beer ID within that brewery
  grep('profile/.*/.*/', ., value = TRUE) %>%
  paste0(stub, .)
head(lnx)
# [1] "https://www.beeradvocate.com/beer/profile/23222/78820/"  
# [2] "https://www.beeradvocate.com/beer/profile/28743/136936/"
# [3] "https://www.beeradvocate.com/beer/profile/28743/146770/" 
# [4] "https://www.beeradvocate.com/beer/profile/28743/87846/" 
# [5] "https://www.beeradvocate.com/beer/profile/863/21690/"    
# [6] "https://www.beeradvocate.com/beer/profile/17981/110635/"
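
For completeness, here is a minimal sketch of the XPath route mentioned above. The markup is a made-up stand-in for one or two rows of the list page (an assumption, not copied from the site), and the names page, na_result and links are only illustrative. It shows why selecting b yields NA, and how the predicate a[b] keeps only the a elements that have a b child:

```r
library(rvest)

# Hypothetical markup, loosely modelled on rows of the list page;
# the real page structure may differ.
page <- read_html('
<table>
  <tr><td class="hr_bottom_light">
    <a href="/beer/profile/23222/78820/"><b>Beer One</b></a>
    <a href="/beer/profile/23222/">Brewery One</a>
  </td></tr>
  <tr><td class="hr_bottom_light">
    <a href="/beer/profile/28743/136936/"><b>Beer Two</b></a>
  </td></tr>
</table>')

# Selecting the b elements: they carry no href attribute, so every value is NA
na_result <- page %>%
  html_nodes(".hr_bottom_light a b") %>%
  html_attr("href")

# XPath: select the a elements that have a b child, then read their href
links <- page %>%
  html_nodes(xpath = "//td[contains(@class, 'hr_bottom_light')]//a[b]") %>%
  html_attr("href")

print(na_result)
print(links)
```

On the real page the returned paths would presumably still be relative, so they would need paste0(stub, .) as in the answer above.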

Also, Abraxas is both a great beer and a great Santana album.