Can't scrape site of Central Bank BR (in R)

Question

I have already checked the copyright notice of the Central Bank of Brazil (from now on, "BR Central Bank") (link here), and it says:

The total or partial reproduction of the content of this site is allowed, preserving the integrity of the information and citing the source. It is also authorized to insert links on other websites to the Central Bank of Brazil (BCB) website. However, the BCB reserves the right to change the provision of information on the site as necessary without notice.

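So I am trying to scrape this page: https://www.bcb.gov.br/estabilidadefinanceira/leiautedoc2061e2071/atuais, but I don't understand why it doesn't work. Below is what I am doing. The HTML file that gets saved is empty. What am I doing wrong? Can anyone help me? After this step, I will read the HTML and look for entries that were added since the previous database.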

url_bacen <- "https://www.bcb.gov.br/estabilidadefinanceira/leiautedoc2061e2071/atuais"
file_bacen_2061 <- paste0("Y:/Dir_Path/", "BACEN_2061.html")
download.file(url_bacen, file_bacen_2061, method = "auto", quiet = FALSE, mode = "wb")

Thanks for your help,

Felipe

Answer 1

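The data is pulled in dynamically through an API call. You can find that call in the browser's network tab when you press F5 to refresh the page: the landing page issues an additional XHR request for the information you are not capturing, which is why the HTML you download is empty. That request returns JSON, so if you mimic it you can parse out whatever information you want.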

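library(jsonlite)

# Call the same endpoint the page requests via XHR and parse the JSON response
data <- read_json('https://www.bcb.gov.br/api/servico/sitebcb/leiautes2061')

# The page content is in the `conteudo` element
print(data$conteudo)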
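
For the follow-up step mentioned in the question (finding what was added since the previous run), one option is to compare the parsed records against an earlier copy. This is only a minimal sketch: it assumes `data$conteudo` is a list of records and that each record has some field that identifies it uniquely. The helper `new_entries` and the field names `id` and `titulo` are made up for illustration, so check the real structure with `str(data$conteudo)` first.

```r
# Hypothetical helper: return the records in `current` whose `key` value
# does not appear in `previous`. Both arguments are lists of named lists,
# the shape read_json() produces for a JSON array of objects.
# Assumes every record actually has the `key` field.
new_entries <- function(current, previous, key) {
  current_ids  <- vapply(current,  function(rec) as.character(rec[[key]]), character(1))
  previous_ids <- vapply(previous, function(rec) as.character(rec[[key]]), character(1))
  current[!(current_ids %in% previous_ids)]
}

# Made-up records standing in for an earlier and a later copy of data$conteudo
previous <- list(list(id = 1, titulo = "A"), list(id = 2, titulo = "B"))
current  <- list(list(id = 1, titulo = "A"), list(id = 2, titulo = "B"),
                 list(id = 3, titulo = "C"))

added <- new_entries(current, previous, key = "id")
print(length(added))
```

Between runs, the earlier copy could be kept on disk with `saveRDS()` and loaded again with `readRDS()`.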

finance

r

download

web-scraping