How to get a table from a website (scraping)

Question

I want to load the table from this website into RStudio: "https://www.worldometers.info/coronavirus/#countries"

I have been learning R from scratch for a month, and this is what I have tried:

library(XML)     
library(rvest)
library(xml2)

url<-("https://www.worldometers.info/coronavirus/#countries")

covid<-readHTMLTable(url,which=1)

head(covid)

It fails with this error message:

> url<-("https://www.worldometers.info/coronavirus/#countries")
> covid<-readHTMLTable(url,which=1)
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: ''

How can I fix this?

Answer 1

We can use rvest to get the data.

library(rvest)
url <- "https://www.worldometers.info/coronavirus/#countries"

url %>% 
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  replace(., . == '', NA)


#  Country,Other TotalCases NewCases TotalDeaths NewDeaths TotalRecovered ActiveCases Serious,Critical Tot Cases/1M pop
#1         China     80,894      +13       3,237        11         69,614       8,043            2,622               56
#2         Italy     31,506     <NA>       2,503        NA          2,941      26,062            2,060              521
#3          Iran     16,169     <NA>         988        NA          5,389       9,792             <NA>              193
#4         Spain     11,826     <NA>         533        NA          1,028      10,265              563              253
#5       Germany      9,414      +47          26        NA             71       9,317                2              112
#6      S. Korea      8,413      +93          84         3          1,540       6,789               59              164
#...

You can use readr::parse_number to convert columns such as TotalCases and NewCases to numeric.
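As a rough sketch of that conversion: the small data frame below is a hypothetical stand-in for the scraped table (only three columns, with values copied from the output above), and `num_cols` is just an illustrative name. On the real table you would list whichever columns you want to convert.

```r
library(readr)

# Hypothetical sample mirroring a few columns of the scraped table
covid <- data.frame(
  Country    = c("China", "Italy", "Iran"),
  TotalCases = c("80,894", "31,506", "16,169"),
  NewCases   = c("+13", NA, NA),
  stringsAsFactors = FALSE
)

# Columns that hold numbers stored as text (assumed names)
num_cols <- c("TotalCases", "NewCases")

# parse_number() ignores the grouping commas and the leading "+",
# and keeps missing values as NA
covid[num_cols] <- lapply(covid[num_cols], parse_number)

str(covid)
```

After this step the listed columns are numeric, so they can be summed, sorted, or plotted directly.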
