Receiving NAs when scraping links (href) with rvest
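I've seen a few similar questions, but none of the solutions worked for me. I'm trying to get the URL of the link in each node, but the resulting list is just NAs.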
library(rvest)
beer <- read_html("https://www.beeradvocate.com/lists/top/")
beerLink <- beer %>%
  html_nodes(".hr_bottom_light a b") %>%
  html_attr('href') %>%
  as.data.frame()
Any help would be appreciated.
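Your selector stops at the b node, which is a descendant of a, but it is the a node that holds the link you want; b has no href attribute, so html_attr() returns NA. You could look for a descendant pattern that selects the parent instead (I'm only familiar with the xpath version, and you seem to prefer CSS), but this alternative approach gets you the links without it: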
library(rvest)

# use a stub so the relative hrefs can be turned into
# absolute URLs later
stub <- 'https://www.beeradvocate.com'
beer <- read_html(paste0(stub, '/lists/top/'))
lnx <- beer %>%
  html_nodes('a') %>%
  html_attr('href') %>%
  # this pattern matches beer profile links:
  # the first .* is a brewery ID, the second .*
  # is a beer ID within that brewery
  grep('profile/.*/.*/', ., value = TRUE) %>%
  paste0(stub, .)
head(lnx)
# [1] "https://www.beeradvocate.com/beer/profile/23222/78820/"
# [2] "https://www.beeradvocate.com/beer/profile/28743/136936/"
# [3] "https://www.beeradvocate.com/beer/profile/28743/146770/"
# [4] "https://www.beeradvocate.com/beer/profile/28743/87846/"
# [5] "https://www.beeradvocate.com/beer/profile/863/21690/"
# [6] "https://www.beeradvocate.com/beer/profile/17981/110635/"
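To see why the original ".hr_bottom_light a b" selector returns NA, here is a minimal sketch on a made-up HTML fragment. The markup is an assumption about the page's structure (an a wrapping a b inside a cell with class hr_bottom_light), not copied from beeradvocate.com. It shows that the b nodes carry no href, and that either stopping the CSS selector at a, or using an XPath that steps from b up to its parent a, returns the links. xml2::url_absolute() is used here as an alternative to paste0() for building absolute URLs.

```r
library(rvest)

# ASSUMED markup: a hypothetical fragment mimicking the list page,
# where the <a> carries the href and the <b> inside it carries the name
page <- minimal_html('
  <table>
    <tr><td class="hr_bottom_light"><a href="/beer/profile/23222/78820/"><b>Beer A</b></a></td></tr>
    <tr><td class="hr_bottom_light"><a href="/beer/profile/863/21690/"><b>Beer B</b></a></td></tr>
  </table>')

# Selecting the <b> nodes: they have no href attribute, so every value is NA
b_href <- page %>%
  html_nodes(".hr_bottom_light a b") %>%
  html_attr("href")

# CSS: end the selector at the <a> element, which owns the href
css_links <- page %>%
  html_nodes(".hr_bottom_light a") %>%
  html_attr("href")

# XPath: find the <b> descendants, then step up to their parent <a>
xpath_links <- page %>%
  html_nodes(xpath = "//td[contains(@class, 'hr_bottom_light')]//b/parent::a") %>%
  html_attr("href")

# Resolve the relative hrefs against the site root
abs_links <- xml2::url_absolute(css_links, "https://www.beeradvocate.com")
print(abs_links)
```

Against the real page you would swap minimal_html(...) for read_html("https://www.beeradvocate.com/lists/top/"); whether ".hr_bottom_light a" matches exactly the beer links there depends on the live markup, which this sketch does not verify.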
Also, Abraxas is a great beer and a great Santana album.