Use R to mimic clicking 'Download dataset' and save file in a different folder

Question

I'm hoping someone can help me figure out how to scrape a .csv file that has no link.

Clicking a download button in R

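I'd like R to download the .csv file that is generated when clicking 'Download dataset' next to the first table on this website: https://www.opentable.com/state-of-industry. The closest post I found to my problem relies on an API link in its solution, but I can't find such a link for this page.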

Possible second question: saving the downloaded file to another location

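Ideally, I'd like the file to be loaded directly in R (similar to the solution in the post mentioned above). If the only way is to download it to my device and then read it into R, I'd like the .csv file to be saved in a specific folder (e.g. C:\Documents\OpenTable) and to overwrite any existing file with the same name.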

Thanks!

Answer 1

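That's because this page doesn't call any API; all the data in the CSV file is embedded in the page's JS code. You'll find it in the <script> tag that contains covidDataCenter. To turn data created in JS into data in R, you need the V8 package. After that, the data needs some reshaping: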

library(rvest)
library(V8)
library(dplyr)
library(tidyr)
pg <- read_html("https://www.opentable.com/state-of-industry")
js <- pg %>% html_node(xpath = "//script[contains(., 'covidDataCenter')]") %>% html_text()
ct <- V8::new_context()
ct$eval("var window = {}") # the JS code creates a `window` object that we need to initialize first
ct$eval(js)
data <- ct$get("window")$`__INITIAL_STATE__`$covidDataCenter$fullbook # the data sets are stored here
dates <- data$headers
countries <- data$countries 
states <- data$states
cities <- data$cities
# It's not straightforward, but you can build the data sets you want like this:
countries_df <- countries %>%
  unnest(yoy) %>%
  group_by(name, id, size) %>%
  mutate(
    date = dates
  ) %>%
  ungroup() %>%
  spread(date, yoy) %>%
  .[c("name", "id", "size", dates)] # arrange the columns
# do the same for states and cities

Export the resulting data frames to CSV files with write.csv().
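For the second part of the question (saving into a specific folder and overwriting an existing file), here is a minimal base R sketch. The helper name `save_dataset`, the file name `countries.csv`, and the stand-in data frame are assumptions made for illustration; in practice you would pass `countries_df` and a folder such as `C:/Documents/OpenTable`. The example writes under `tempdir()` only so that it runs anywhere.

```r
# Hypothetical helper (not from the original answer): write `df` to
# `dir/file_name`, creating the folder first if it does not exist yet.
save_dataset <- function(df, dir, file_name) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, file_name)
  # write.csv() replaces an existing file with the same name
  write.csv(df, path, row.names = FALSE)
  invisible(path)
}

# Stand-in data; replace with countries_df from the code above.
example_df <- data.frame(name = c("A", "B"), value = c(1, 2))

# Assumed target folder; use e.g. "C:/Documents/OpenTable" on your machine.
out_dir <- file.path(tempdir(), "OpenTable")

path <- save_dataset(example_df, out_dir, "countries.csv")

# Read the file back to confirm what was written.
reloaded <- read.csv(path)
```

Because the data frame already exists in the R session, writing to disk is only needed if you want a copy of the file; the read-back step is there just as a check.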


r

web-scraping

httr