Web scraping a JavaScript-heavy site using rvest

Question

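I'm trying to scrape the complete list of projects and their associated details from this website (the project list is on the right-hand side):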

https://www.forest-trends.org/project-list/

I can't seem to identify the right CSS selectors for getting the projects and their details. Could this have something to do with JavaScript in the HTML?

When I try the following:

library(rvest)
link <- "https://www.forest-trends.org/project-list"
# Download and parse the page, then select every element with class "project-tile"
urlData <- link %>% read_html() %>% html_nodes(".project-tile")

I expected to get a list of projects. Instead, I get:

{xml_nodeset (0)}

How can I get the complete list of projects and their details?

Answer 1
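A likely reason for the empty result: read_html() parses only the HTML the server sends and does not execute JavaScript, so elements that a script adds later in the browser are never seen by html_nodes(). Assuming the real project tiles are injected that way (the empty nodeset suggests so), the sketch below reproduces the effect offline with two small hypothetical HTML strings; the #projects container and tile text are made up, not taken from the actual site.

```r
library(rvest)

# Hypothetical page as the server might send it: an empty container that a
# script would fill with project tiles only when run in a browser
static_html <- '<html><body>
  <div id="projects"></div>
  <script>/* JavaScript would insert .project-tile elements here */</script>
</body></html>'

# Hypothetical page after a browser has run that script
rendered_html <- '<html><body>
  <div id="projects"><div class="project-tile">Project A</div></div>
</body></html>'

# read_html() never runs the script, so the static version has no tiles
static_tiles <- html_nodes(read_html(static_html), ".project-tile")
rendered_tiles <- html_nodes(read_html(rendered_html), ".project-tile")

length(static_tiles)    # 0, like the {xml_nodeset (0)} in the question
length(rendered_tiles)  # 1
```

Because rvest only ever sees the first kind of page, the usual workarounds are either to find the data source the script itself calls, or to render the page in a real browser first.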

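There is an API that returns the project data as JSON, which you can read directly with jsonlite instead of scraping the HTML: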

library(jsonlite)
# Read the JSON response; its markers element holds one row per project
df <- fromJSON('https://www.forest-trends.org/wp-content/themes/foresttrends/map_tools/project_fetch.php?ids=')
head(df$markers)
           lat          lng                       type
1 -11.78449871 -70.73347813 Forest and land-use carbon
2    17.067346    94.459977 Forest and land-use carbon
3     3.054216   -72.333984 Forest and land-use carbon
4     20.98685    -89.03344 Forest and land-use carbon
5    -0.886093      30.5798 Forest and land-use carbon
6    -1.809978    31.131299 Forest and land-use carbon
                                                                                                                  title               location pid  size
1                                                                                           Reforestadores REDD Project    Madre de Dios, Peru   1 85000
2 Reforestation and Restoration of degraded mangrove lands, sustainable livelihood and community development in Myanmar                Myanmar   2  2575
3                                                                              San Nicolas Carbon Sequestration Project San Nicholas, Colombia   3  7300
4                                                                                             Amigos de Calakmul Mexico     Selva Maya, Mexico   4 56700
5                                                                          Uganda Nile Basin Reforestation Project No 4                 Uganda   5   347
6                                                                                                    Emiti Nibwo Bulora    Nyaishozi, Tanzania   6   130
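Once loaded, df$markers is an ordinary data frame. The uneven number of decimal places in the printout above suggests the API returns coordinates and sizes as character strings; assuming so, a minimal sketch of converting and filtering them follows. To run offline it parses a small JSON string built from two rows of the output above instead of calling the API. The field names match the printed columns, but the exact JSON layout (strings versus numbers) is an assumption.

```r
library(jsonlite)

# Small stand-in for the API response, built from two rows printed above.
# Assumption: the real payload has a "markers" array with these fields,
# and lat/lng/pid/size arrive as strings.
json <- '{"markers": [
  {"lat": "-11.78449871", "lng": "-70.73347813",
   "type": "Forest and land-use carbon",
   "title": "Reforestadores REDD Project",
   "location": "Madre de Dios, Peru", "pid": "1", "size": "85000"},
  {"lat": "-1.809978", "lng": "31.131299",
   "type": "Forest and land-use carbon",
   "title": "Emiti Nibwo Bulora",
   "location": "Nyaishozi, Tanzania", "pid": "6", "size": "130"}
]}'

df <- fromJSON(json)   # an array of JSON objects is simplified to a data frame
markers <- df$markers

# Convert the numeric-looking columns; as.numeric() leaves real numbers unchanged
num_cols <- c("lat", "lng", "pid", "size")
markers[num_cols] <- lapply(markers[num_cols], as.numeric)

# Example query: projects with size above 1000 (units are not given in the output)
large <- markers[markers$size > 1000, c("title", "location", "size")]
print(large)
```

Since as.numeric() is harmless on columns that are already numeric, the same conversion step should work whichever form the live API actually returns.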
