Web scraping a JavaScript-heavy site using rvest
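I am trying to scrape the complete list of projects and their associated details from this website (the project list is on the right-hand side):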
https://www.forest-trends.org/project-list/
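I can't seem to identify the correct CSS element to get the projects and their details. I wonder whether this has something to do with the JavaScript in the HTML?
When I try the following: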
library(rvest)
link <- "https://www.forest-trends.org/project-list"
urlData <- link %>% read_html %>% html_nodes(".project-tile")
I expect to get a list of projects. Instead, I get:
{xml_nodeset (0)}
How can I return the complete list of projects and their associated details?
There is an API you can use:
library(jsonlite)
df = fromJSON('https://www.forest-trends.org/wp-content/themes/foresttrends/map_tools/project_fetch.php?ids=')
head(df$markers)
           lat          lng                       type
1 -11.78449871 -70.73347813 Forest and land-use carbon
2    17.067346    94.459977 Forest and land-use carbon
3     3.054216   -72.333984 Forest and land-use carbon
4     20.98685    -89.03344 Forest and land-use carbon
5    -0.886093      30.5798 Forest and land-use carbon
6    -1.809978    31.131299 Forest and land-use carbon
title location pid size
1 Reforestadores REDD Project Madre de Dios, Peru 1 85000
2 Reforestation and Restoration of degraded mangrove lands, sustainable livelihood and community development in Myanmar Myanmar 2 2575
3 San Nicolas Carbon Sequestration Project San Nicholas, Colombia 3 7300
4 Amigos de Calakmul Mexico Selva Maya, Mexico 4 56700
5 Uganda Nile Basin Reforestation Project No 4 Uganda 5 347
6 Emiti Nibwo Bulora Nyaishozi, Tanzania 6 130
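The uneven decimal places in the printed lat and lng values suggest the API returns those fields as strings rather than numbers, although that is only a guess from the output. If so, a small helper along these lines could convert them before any further analysis. This is a minimal sketch: parse_projects and sample_json are hypothetical names made up for illustration, and the sample payload is only assumed to mirror the shape of the real response.

```r
library(jsonlite)

# Endpoint from the answer above.
api_url <- "https://www.forest-trends.org/wp-content/themes/foresttrends/map_tools/project_fetch.php?ids="

# Hypothetical helper: read the JSON (a URL or a JSON string) and
# coerce the numeric-looking columns, assuming they arrive as strings.
parse_projects <- function(json) {
  markers <- fromJSON(json)$markers
  num_cols <- intersect(c("lat", "lng", "pid", "size"), names(markers))
  markers[num_cols] <- lapply(
    markers[num_cols],
    function(x) as.numeric(as.character(x))
  )
  markers
}

# Offline example using a made-up payload shaped like the output above.
sample_json <- '{"markers":[{"lat":"-11.78449871","lng":"-70.73347813",
  "type":"Forest and land-use carbon","title":"Reforestadores REDD Project",
  "location":"Madre de Dios, Peru","pid":"1","size":"85000"}]}'
projects <- parse_projects(sample_json)
str(projects)

# Live call (needs network access):
# projects <- parse_projects(api_url)
```

Because fromJSON accepts either a URL or a JSON string, the same helper works for the offline sample and for the live endpoint.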