R software - rvest package, error in "download number"

Question

I want to download the number of reviews for Amazon books, but I'm running into a problem.

I tried the following:

library(rvest)
url <- paste0("http://www.amazon.com/s/ref=lp_4_nr_p_72_3?",
              "fst=as%3Aoff&rh=n%3A283155%2Cn%3A%211000%2C",
              "n%3A4%2Cp_72%3A1250224011&bbn=4&ie=UTF8&qid",
              "=1440446201&rnid=1250219011")
# html() is deprecated in current rvest; read_html() replaces it
html <- read_html(url)


Reviews <- try({html_nodes(html, "#s-results-list-atf .a-text-normal:nth-child(2)") %>%
    html_text()}, silent = TRUE)

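But my R console shows only 4 review counts instead of 12 (the number SelectorGadget finds). What am I doing wrong?

When I download the book titles I don't have this problem; it only happens with the review counts.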

Book <- try({html_nodes(html, ".s-access-title") %>%
    html_text()}, silent = TRUE)

Link to the page: Amazon Page

Answer 1

This may not be the canonical way to do it, but this is what I did and it works:

# Via Inspect Element in Chrome, the relevant info is
#   in an <a> tag with class 'a-size-small a-link-normal a-text-normal',
#   but this does not uniquely identify the review counts
#   (e.g., the "... Buy used & new" price link has that class too),
#   so we go one level up and find that both the rating
#   and the review count are stored in a <div> tag
#   with class 'a-row a-spacing-mini'
x <- read_html(url) %>%
  html_nodes("div.a-row.a-spacing-mini") %>%
  html_nodes("a.a-size-small.a-link-normal.a-text-normal") %>%
  html_text()
#upon inspection of x, we can see that the relevant numbers
#  always appear by themselves, thus:
> x[!is.na(suppressWarnings(as.integer(gsub(",", "", x))))]

 [1] "168"   "232"   "1,607" "2,226" "1,060" "25"    "731"   "2,374" "345"   "7,205"
[11] "1,134" "1,137"
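Amazon's live markup has very likely changed since this answer was written, so the URL above may no longer return these classes. The sketch below runs the answer's two-step selection and numeric filter offline, against a small hand-written HTML snippet that only assumes the class names quoted in the answer. The markup, the "Paperback" link and the names `snippet`, `counts` and `n_reviews` are hypothetical, chosen for illustration. It assumes rvest is installed; `read_html()` accepts a literal HTML string as well as a URL.

```r
library(rvest)  # read_html(), html_nodes(), html_text(); also re-exports %>%

# Hypothetical, simplified markup using the class names from the answer.
# Result 1: the count sits in the 'a-spacing-mini' div; a price link with
#   the same <a> class sits in a different div and is excluded by scoping.
# Result 2: a non-numeric link shares the 'a-spacing-mini' div with the count.
snippet <- '
<ul>
  <li>
    <div class="a-row a-spacing-mini"><a class="a-size-small a-link-normal a-text-normal">168</a></div>
    <div class="a-row"><a class="a-size-small a-link-normal a-text-normal">Buy used &amp; new</a></div>
  </li>
  <li>
    <div class="a-row a-spacing-mini"><a class="a-size-small a-link-normal a-text-normal">Paperback</a><a class="a-size-small a-link-normal a-text-normal">1,607</a></div>
  </li>
</ul>'

page <- read_html(snippet)

# Step 1: scope to the div that holds rating + review count,
# Step 2: then pick the <a> tags with the shared class inside it
x <- page %>%
  html_nodes("div.a-row.a-spacing-mini") %>%
  html_nodes("a.a-size-small.a-link-normal.a-text-normal") %>%
  html_text()

# Keep only the strings that parse as integers once thousands
# separators are removed; suppressWarnings() hides the
# "NAs introduced by coercion" warning for the non-numeric text
counts <- x[!is.na(suppressWarnings(as.integer(gsub(",", "", x))))]

# Convert the surviving strings to integers for further analysis
n_reviews <- as.integer(gsub(",", "", counts))
print(counts)
print(n_reviews)
```

Scoping to the parent `div` first drops links that merely share the `<a>` class, and the numeric filter then removes whatever non-count text remains inside that `div`.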


r

rvest