Web-scraping in R - How can I collect information on all products from a page instead of only the first product?

Question

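Hi - I have started learning web scraping with R. My first project is to collect a list of all the cookbooks from Indigo and run some analysis on it.

For now, though, I can only select the first book on the page. I am using the "rvest" package and the SelectorGadget extension for Google Chrome. I have watched YouTube videos and read through links, but nobody else seems to have run into this problem. I would be glad to get any ideas on how to list all the books from this page and from all the available pages.

Code: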

library(rvest)
library(tidyverse)

indigo_page <- read_html("https://www.chapters.indigo.ca/en-ca/books/top-tens/cookbooks/")

indigo_page %>% html_node(".product-list__product-title") %>% html_text()

Output:

[1] "The Comfortable Kitchen: 105 Laid-back, Healthy, And Wholesome Recipes"

Answer 1

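Donjazz, I think the first suggestion is to use html_nodes() instead of html_node(). html_node() returns only the first element that matches the selector, while html_nodes() returns every match, so this small change seems to output all the titles for you.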
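To make the difference concrete, here is a minimal sketch that runs without touching the live site. The inline HTML and the book names are made up for illustration; only the ".product-list__product-title" selector comes from the question, and the real page's markup may differ.

```r
library(rvest)

# Hypothetical stand-in for the Indigo page so the example runs offline.
# The real page's structure is an assumption; only the class name is
# taken from the question.
page <- read_html('
  <ul>
    <li><a class="product-list__product-title">Book A</a></li>
    <li><a class="product-list__product-title">Book B</a></li>
    <li><a class="product-list__product-title">Book C</a></li>
  </ul>')

# html_node() stops at the first element matching the selector
first_title <- page %>%
  html_node(".product-list__product-title") %>%
  html_text()

# html_nodes() collects every element matching the selector
all_titles <- page %>%
  html_nodes(".product-list__product-title") %>%
  html_text()

print(first_title)
print(all_titles)
```

In rvest 1.0.0 and later, html_element() and html_elements() are the newer names for the same pair of functions; html_node() and html_nodes() still work.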


web-scraping

rvest