How to scrape pop-up text using rvest?

I would like to scrape information from the following website: https://www.theglobaleconomy.com/download-data.php

As you will see, each economic variable has an associated info box, like the one in this picture, which pops up when you click the "i": https://i.stack.imgur.com/E3JRy.png

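SelectorGadget and inspecting the page both suggest that I should use "#definitionBoxText" as the CSS selector, but it does not work when I run:

    nodes <- read_html("https://www.theglobaleconomy.com/download-data.php") %>% html_nodes("#definitionBoxText") %>% html_text()

I get nothing in return, just a blank. Could you please guide me on how to obtain this information? Any help is greatly appreciated!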

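It looks like the value of #definitionBoxText is generated via a PHP script only when you click the info icon. rvest reads the HTML as served and cannot click anything, so the node is still empty when you select it. This means you will not be able to scrape that text unless you use something like RSelenium and simulate a click on every icon.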
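A rough sketch of the click-simulation route, in case it is useful. It assumes the RSelenium package and a working Firefox driver are installed, that the icons can be selected with ".showDefinition" (the class that appears in the page source excerpt further down), and that a short fixed pause is enough for the box to be filled in. The function name and the pause argument are made up for illustration, and the page may need extra handling (overlays, closing the box between clicks) that is not covered here.

```r
# Hypothetical helper: click every info icon and read the pop-up text.
# Assumes RSelenium is installed and can start a Firefox session.
scrape_definitions_by_click <- function(url, pause = 0.5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- driver$client

  # Always shut the browser and the driver server down, even on error.
  on.exit({
    remDr$close()
    driver$server$stop()
  })

  remDr$navigate(url)

  # One element per "i" icon (selector taken from the page source).
  icons <- remDr$findElements(using = "css selector", value = ".showDefinition")

  vapply(icons, function(icon) {
    icon$clickElement()
    Sys.sleep(pause)  # crude wait; an assumption, not a guaranteed delay
    box <- remDr$findElement(using = "css selector", value = "#definitionBoxText")
    box$getElementText()[[1]]
  }, character(1))
}
```

It would be called as scrape_definitions_by_click("https://www.theglobaleconomy.com/download-data.php") and should return one character string per icon, provided the assumptions above hold.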

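Another approach is to press F12 to open the developer tools, go to the "Sources" tab, and save the file named download-data.php, which contains all the definitions you are looking for. You can then scrape that file separately. Here is what the scrapable part looks like: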

<div class="indicatorsName">
    Economic growth: the rate of change of real GDP
</div>

<div class="infoIcon">
    <div class="showDefinition"
        style="margin: 4px 3px 0; padding: 1px 6px 0;  border-radius: 10px; border: 1px solid #333; color: #333; float: right; font-weight: bold; font-size;10px">
        i
    </div>
</div>

<div class="clearer"></div>

<div class="definition">
    <b>Economic growth: the rate of change of real GDP</b><br /><br />
    Definition:
    Annual percentage growth rate of GDP at market prices based on constant local currency. Aggregates are based on
    constant 2010 U.S. dollars. GDP is the sum of gross value added by all resident producers in the economy plus any
    product taxes and minus any subsidies not included in the value of the products. It is calculated without making
    deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
</div>
</div>
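Given markup like the above, the saved file can be parsed with rvest using the ".indicatorsName" and ".definition" classes. The sketch below parses a shortened copy of the excerpt so that it runs on its own. For the real file you would pass the path of your saved copy to read_html() instead (the path in the comment is only a placeholder). It also assumes every indicator has exactly one definition and that they appear in the same order, which is worth checking on the full page.

```r
library(rvest)

# For the real data, point read_html() at your saved copy, e.g.
# page <- read_html("download-data.php")   # placeholder path
# Here a shortened copy of the excerpt is parsed instead.
page <- read_html('
<div class="indicatorsName">
    Economic growth: the rate of change of real GDP
</div>
<div class="definition">
    <b>Economic growth: the rate of change of real GDP</b><br /><br />
    Definition:
    Annual percentage growth rate of GDP at market prices based on constant local currency.
</div>
')

indicator <- page %>% html_nodes(".indicatorsName") %>% html_text(trim = TRUE)
definition <- page %>% html_nodes(".definition") %>% html_text(trim = TRUE)

# Collapse line breaks and indentation, then drop the repeated title
# and the "Definition:" label at the start of each entry.
definition <- gsub("\\s+", " ", definition)
definition <- sub("^.*?Definition: ?", "", definition, perl = TRUE)

# Assumption: one definition per indicator, in the same order.
stopifnot(length(indicator) == length(definition))

definitions <- data.frame(
  indicator = indicator,
  definition = definition,
  stringsAsFactors = FALSE
)
```

The result is a data frame with one row per indicator, which can then be filtered or written out with write.csv().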