How to scrape a table from a website
I want to load the table from this website into RStudio:
"https://www.worldometers.info/coronavirus/#countries"
I have been learning R from scratch for a month, and this is what I tried:
library(XML)
library(rvest)
library(xml2)
url<-("https://www.worldometers.info/coronavirus/#countries")
covid<-readHTMLTable(url,which=1)
head(covid)
This is the error output:
url<-("https://www.worldometers.info/coronavirus/#countries")
> covid<-readHTMLTable(url,which=1)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"NULL"’
In addition: Warning message:
XML content does not seem to be XML: ''
I need help.
We can use rvest to get the data.
library(rvest)
url <- "https://www.worldometers.info/coronavirus/#countries"
url %>%
  read_html() %>%
  html_table() %>%
  .[[1]] %>%
  replace(., . == '', NA)
#  Country,Other TotalCases NewCases TotalDeaths NewDeaths TotalRecovered ActiveCases Serious,Critical Tot Cases/1M pop
#1         China     80,894      +13       3,237        11         69,614       8,043            2,622               56
#2         Italy     31,506     <NA>       2,503        NA          2,941      26,062            2,060              521
#3          Iran     16,169     <NA>         988        NA          5,389       9,792             <NA>              193
#4         Spain     11,826     <NA>         533        NA          1,028      10,265              563              253
#5       Germany      9,414      +47          26        NA             71       9,317                2              112
#6      S. Korea      8,413      +93          84         3          1,540       6,789               59              164
#...
You can look at readr::parse_number to convert columns such as TotalCases and NewCases to numeric.
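As a rough sketch of that conversion: the data frame below is a small sample copied by hand from the output above (not a fresh scrape), and the name num_cols and the choice of columns are only illustrative assumptions. It applies readr::parse_number to each text column, which should drop the thousands separators and leave missing values as NA.

```r
library(readr)

# Hand-copied sample of the scraped output above; every column is still text.
covid <- data.frame(
  Country     = c("China", "Italy", "Iran"),
  TotalCases  = c("80,894", "31,506", "16,169"),
  NewCases    = c("+13", NA, NA),
  TotalDeaths = c("3,237", "2,503", "988"),
  stringsAsFactors = FALSE
)

# Columns to convert (an illustrative subset; extend as needed)
num_cols <- c("TotalCases", "NewCases", "TotalDeaths")

# parse_number() keeps the first number it finds in each string and
# ignores the "," grouping mark, so "80,894" becomes 80894
covid[num_cols] <- lapply(covid[num_cols], parse_number)

# Inspect the resulting column types
str(covid)
```

On the real scraped table, the same lapply() call should work once num_cols lists the column names that html_table() actually returned.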