No data scraped with the rvest package?
# Load Packages
pacman::p_load(tidyverse, rvest)
# Set URL
url <- "https://www.worldometers.info/coronavirus/"
website <- read_html(url)
# Scrape Cases Data
cases_html <- html_nodes(website, "td.sorting_1")
cases <- html_text(cases_html)
cases_html
cases
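I am trying to scrape web data with rvest, but when I inspect my two variables ("cases_html" and "cases") I get empty results rather than data. The output of each is, respectively: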
> {xml_nodeset (0)}
> character(0)
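I am not sure why I am not scraping any data from this website. I have also tried the RSelenium package, as recommended in another post here, but that code failed too, with an unrelated error. Still, I think a solution should be possible in rvest, and I would like to understand what exactly is going wrong here.
It is not clear what you want to scrape from the page, but you can get the main data table like this: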
library(tidyverse)
library(rvest)
read_html("https://www.worldometers.info/coronavirus/") %>%
html_nodes("#main_table_countries_today") %>%
html_table() %>%
pluck(1)
#> # A tibble: 244 x 22
#> `#` `Country,Other` TotalCases NewCases TotalDeaths NewDeaths
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 NA "North America" 98,313,200 "+23,167" 1,459,752 "+147"
#> 2 NA "Asia" 147,921,193 "+130,458" 1,423,876 "+395"
#> 3 NA "South America" 56,801,380 "+17,492" 1,294,318 "+31"
#> 4 NA "Europe" 191,122,646 "+187,856" 1,817,850 "+587"
#> 5 NA "Oceania" 7,156,060 "+46,381" 10,626 "+59"
#> 6 NA "Africa" 11,902,057 "+6,666" 253,795 "+4"
#> 7 NA "" 721 "" 15 ""
#> 8 NA "World" 513,217,257 "+412,020" 6,260,232 "+1,223"
#> 9 1 "USA" 83,055,836 "+18,777" 1,020,749 "+89"
#> 10 2 "India" 43,079,157 "+3,293" 523,803 ""
#> # ... with 234 more rows, and 16 more variables: TotalRecovered <chr>,
#> # NewRecovered <chr>, ActiveCases <chr>, `Serious,Critical` <chr>,
#> # `Tot Cases/1M pop` <chr>, `Deaths/1M pop` <chr>, TotalTests <chr>,
#> # `Tests/1M pop` <chr>, Population <chr>, Continent <chr>,
#> # `1 Caseevery X ppl` <chr>, `1 Deathevery X ppl` <chr>,
#> # `1 Testevery X ppl` <int>, `New Cases/1M pop` <chr>,
#> # `New Deaths/1M pop` <dbl>, `Active Cases/1M pop` <chr>
Created on 2022-04-30 by the reprex package (v2.0.1)
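As for why the original selector came back empty: one likely explanation (an assumption on my part, not something confirmed in this thread) is that the "sorting_1" class is not present in the HTML the server sends, and is only added in the browser by the JavaScript that makes the table sortable. read_html() does not run JavaScript, so it only sees the static markup. Here is a minimal sketch of that effect, using a made-up two-row table rather than the real page:

```r
library(rvest)

# Hypothetical static HTML, standing in for what the server sends
# before any JavaScript has run (no "sorting_1" class on the cells).
static_page <- read_html('
  <table id="main_table_countries_today">
    <tr><th>Country</th><th>TotalCases</th></tr>
    <tr><td>USA</td><td>83,055,836</td></tr>
    <tr><td>India</td><td>43,079,157</td></tr>
  </table>')

# The class-based selector from the question finds no nodes here,
# because no cell in the static markup carries that class.
no_match <- html_nodes(static_page, "td.sorting_1")
length(no_match)

# Selecting through the table id works, because the id is part of
# the static markup itself.
cells <- html_text(html_nodes(static_page, "#main_table_countries_today td"))
cells
```

If that is what is happening on the real page, it helps to check selectors against the page source (View Source) rather than the browser's element inspector, since the inspector shows the page after scripts have modified it.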
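Note that most columns in the table come back as character, with thousands separators and leading plus signs. If you need numbers, something along these lines may help. This is only a sketch: to_number() is a hypothetical helper written for this example, not part of rvest, and the sample values are copied from the output above.

```r
# Hypothetical helper: drop "+" and "," and convert to numeric.
# Empty strings come out as NA.
to_number <- function(x) {
  as.numeric(gsub("[+,]", "", x))
}

# Sample values taken from the TotalCases and NewCases columns above
total_cases <- c("98,313,200", "147,921,193")
new_cases   <- c("+23,167", "+130,458", "")

to_number(total_cases)
to_number(new_cases)
```

With dplyr loaded you could apply the same helper to several columns at once, for example inside mutate(across(...)), but check the result column by column, since some columns contain text such as continent names.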