No Data Scraped w/ rvest Package?

Question

# Load Packages
pacman::p_load(tidyverse, rvest)

# Set URL
url <- "https://www.worldometers.info/coronavirus/"
website <- read_html(url)

# Scrape Cases Data
cases_html <- html_nodes(website, "td.sorting_1")
cases <- html_text(cases_html)

cases_html
cases

I'm trying to scrape web data with rvest, but when I inspect my two variables ("cases_html" and "cases") both come back empty. The output of each is, respectively:

> {xml_nodeset (0)}

> character(0)

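I'm not sure why I'm not scraping any data from the website. I also tried the RSelenium package, as recommended in another post here, but that code failed too, with an unrelated error. Still, I think a solution should be possible in rvest, and I'd like to figure out exactly what is going wrong here.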

Answer 1

It isn't clear what you want to scrape from the page, but you can get the main data table like this:

library(tidyverse)
library(rvest)

read_html("https://www.worldometers.info/coronavirus/") %>%
  html_nodes("#main_table_countries_today") %>%
  html_table() %>%
  pluck(1)
#> # A tibble: 244 x 22
#>      `#` `Country,Other` TotalCases  NewCases   TotalDeaths NewDeaths
#>    <int> <chr>           <chr>       <chr>      <chr>       <chr>    
#>  1    NA "North America" 98,313,200  "+23,167"  1,459,752   "+147"   
#>  2    NA "Asia"          147,921,193 "+130,458" 1,423,876   "+395"   
#>  3    NA "South America" 56,801,380  "+17,492"  1,294,318   "+31"    
#>  4    NA "Europe"        191,122,646 "+187,856" 1,817,850   "+587"   
#>  5    NA "Oceania"       7,156,060   "+46,381"  10,626      "+59"    
#>  6    NA "Africa"        11,902,057  "+6,666"   253,795     "+4"     
#>  7    NA ""              721         ""         15          ""       
#>  8    NA "World"         513,217,257 "+412,020" 6,260,232   "+1,223" 
#>  9     1 "USA"           83,055,836  "+18,777"  1,020,749   "+89"    
#> 10     2 "India"         43,079,157  "+3,293"   523,803     ""       
#> # ... with 234 more rows, and 16 more variables: TotalRecovered <chr>,
#> #   NewRecovered <chr>, ActiveCases <chr>, `Serious,Critical` <chr>,
#> #   `Tot Cases/1M pop` <chr>, `Deaths/1M pop` <chr>, TotalTests <chr>,
#> #   `Tests/1M pop` <chr>, Population <chr>, Continent <chr>,
#> #   `1 Caseevery X ppl` <chr>, `1 Deathevery X ppl` <chr>,
#> #   `1 Testevery X ppl` <int>, `New Cases/1M pop` <chr>,
#> #   `New Deaths/1M pop` <dbl>, `Active Cases/1M pop` <chr>
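
Most of the count columns come back as character, because the values carry thousands separators and leading plus signs. Here is a minimal sketch of converting them to numbers. It uses a small hand-made tibble in place of the scraped table, and the helper name `to_number` is made up for this example:

```r
library(dplyr)

# Hand-made stand-in for a few rows of the scraped table
# (values copied from the output above)
covid <- tibble(
  `Country,Other` = c("World", "USA", "India"),
  TotalCases      = c("513,217,257", "83,055,836", "43,079,157"),
  NewDeaths       = c("+1,223", "+89", "")
)

# Hypothetical helper: keep only digits, dots and minus signs, then convert.
# Empty strings become NA.
to_number <- function(x) as.numeric(gsub("[^0-9.-]", "", x))

covid_clean <- covid %>%
  mutate(across(c(TotalCases, NewDeaths), to_number))
```

On the real table you would list whichever count columns you need inside `across()`.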

Created on 2022-04-30 by the reprex package (v2.0.1)
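
As for why the selector in the question matched nothing: `sorting_1` looks like a class that the page's table script adds in the browser after loading (DataTables, as far as I can tell), so it would not be in the HTML that read_html() downloads. That is an assumption about the site, not something verified here. A small offline sketch of the effect, using made-up HTML rather than the real page:

```r
library(rvest)

# Made-up stand-in for the static HTML a server might send: the table has an id,
# but no cell carries the "sorting_1" class yet.
static_page <- minimal_html('
  <table id="main_table_countries_today">
    <tr><th>Country</th><th>TotalCases</th></tr>
    <tr><td>USA</td><td>83,055,836</td></tr>
  </table>
')

# Selecting by a class that is absent from the source gives an empty node set
by_class <- html_nodes(static_page, "td.sorting_1")
length(by_class)

# Selecting by the table id works, because the id is in the source
by_id <- html_nodes(static_page, "#main_table_countries_today")
tbl <- html_table(by_id)[[1]]
```

If the class really is added client-side, checking the raw page source (rather than the browser's element inspector) is a quick way to see which selectors rvest can match.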
