How can I scrape data from a website within a frame using R?

Question

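The following link contains the results of the Paris Marathon: http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon. I want to scrape these results, but the information sits inside a frame. I know the basics of scraping with rvest and RSelenium, but I have no idea how to retrieve data from within such a frame. To give an idea, one of the things I tried was: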

library(rvest)

url = "http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon"
site = read_html(url)
# Attempt: treat the iframe node as if it were the results table
ParisResults = site %>% html_node("iframe") %>% html_table()
ParisResults = as.data.frame(ParisResults)

Any help in solving this problem is very welcome!

Answer 1

The results are loaded via AJAX from the following URL:

library(rvest)

url <- "http://www.aso.fr/massevents/resultats/ajax.php?v=1460995792&course=mar16&langue=us&version=3&action=search"
# Read the AJAX response and parse every table with class "footable"
table <- url %>%
  read_html(encoding = "UTF-8") %>%
  html_nodes(xpath = '//table[@class="footable"]') %>%
  html_table()

PS: I don't know exactly what AJAX is; I only know basic rvest.

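Edit: to answer the question in the comments: I don't have much experience in web scraping. If you only use very basic techniques with rvest or XML, you have to understand the website a bit better, and every website has its own structure. For this one, here is how I did it: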

As you can see, the page source shows no results because they are inside an iframe. When inspecting the code, you can see this after "RESULTS OF 2016 EDITION":

class="iframe-xdm iframe-resultats" data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=3"
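The attribute shown above can also be read programmatically instead of by eye. The following is only a minimal sketch: it parses an inline HTML fragment rather than the live page, and the `<div>` tag name is an assumption (only the class names and the `data-href` value come from the inspected markup).

```r
library(rvest)

# Inline stand-in for the relevant part of the page. The <div> tag is an
# assumption; the class and data-href values are copied from the inspected markup.
page <- read_html('<html><body>
  <div class="iframe-xdm iframe-resultats"
       data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&amp;course=mar16&amp;version=3"></div>
</body></html>')

# Select the element by its class and pull out the URL stored in data-href
iframe_url <- page %>%
  html_node(".iframe-resultats") %>%
  html_attr("data-href")

print(iframe_url)
```

On the real site, `read_html()` would be called on the marathon results URL instead of the inline string, provided the attribute is present in the static HTML.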
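Now you can use this URL directly: http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=2
But you still can't get the results. Next, you can use Chrome Developer Tools > Network > XHR. When you refresh the page, you can see that the data is loaded from this URL (when the Men category is selected): http://www.aso.fr/massevents/resultats/ajax.php?course=mar16&langue=us&version=2&action=search&fields%5Bsex%5D=F&limiter=&order=
Now you can get the results!
And if you want the second page and so on, you can click on the page number and then use the developer tools to see what happens!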
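The steps above can be sketched in R roughly as follows. The helper names `build_results_url` and `fetch_results` are made up for illustration; the parameter names and values are copied from the XHR URL above. Whatever parameter the site uses for paging is not shown in this answer, so it would have to be copied from the developer tools and added to the list.

```r
library(rvest)

# Hypothetical helper: build a query URL from a named list (base R only)
build_results_url <- function(base, params) {
  keys <- vapply(names(params), utils::URLencode, character(1), reserved = TRUE)
  vals <- vapply(unlist(params), utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(keys, vals, sep = "=", collapse = "&"))
}

# Hypothetical helper: download one AJAX response and parse its "footable" tables
fetch_results <- function(url) {
  url %>%
    read_html(encoding = "UTF-8") %>%
    html_nodes(xpath = '//table[@class="footable"]') %>%
    html_table()
}

# Rebuild the XHR URL observed in the Network tab
category_url <- build_results_url(
  "http://www.aso.fr/massevents/resultats/ajax.php",
  list(course = "mar16", langue = "us", version = "2", action = "search",
       "fields[sex]" = "F", limiter = "", order = "")
)
print(category_url)

# Needs network access, and the endpoint may no longer be available:
# results <- fetch_results(category_url)
```

Keeping the parameters in a list makes it easy to change one value (for example the category) and request again without editing the URL string by hand.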

r

web-scraping

rselenium

rvest