rvest (R) not getting the inner table
I am trying to retrieve the 2012 Olympics medal table from Wikipedia.
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/United_States_at_the_2012_Summer_Olympics"
xpath0 <- '//*[@id="mw-content-text"]/table[1]'
xpath1 <- '//*[@id="mw-content-text"]/table[2]'
xpath2 <- '//*[@id="mw-content-text"]/table[2]/tbody/tr/td[1]'
xpath3 <- '//*[@id="mw-content-text"]/table[2]/tbody/tr/td[1]/table'
tb <- url %>%
html() %>%
html_nodes(xpath=xpath0) %>%
html_nodes("") %>%
html_table()
xpath0 and xpath1 both return an error:
Error in parse_simple_selector(stream) :
Expected selector, got <EOF at 1>
xpath2 and xpath3 return an empty list.
I also tried using Selectorgadget (https://cran.r-project.org/web/packages/rvest/vignettes/selectorgadget.html) to point at the exact element. I got
//td[(((count(preceding-sibling::*) + 1) = 1) and parent::*)] | //*[contains(concat( " ", @class, " " ), concat( " ", "headerSortDown", " " ))]
and the error
Error in parse_simple_selector(stream) :
Expected selector, got
Any help is greatly appreciated.
乔阿
The first table has a complicated header structure that seems hard to convert into a standard format. At least I did not succeed.
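As for the error itself, it most likely comes from the `html_nodes("")` step in the question: an empty string is parsed as a CSS selector, and there is nothing to parse. The sketch below reproduces this offline. The HTML fragment is a hypothetical stand-in for the Wikipedia page, not its real markup.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page, so the sketch runs offline.
page <- read_html('<div id="mw-content-text">
  <table><tr><th>Sport</th><th>Total</th></tr>
         <tr><td>Swimming</td><td>31</td></tr></table>
</div>')

xpath0 <- '//*[@id="mw-content-text"]/table[1]'

# The XPath step on its own finds the table.
node <- html_nodes(page, xpath = xpath0)

# Chaining an empty CSS selector afterwards is the step that fails.
err <- tryCatch(html_nodes(node, ""), error = function(e) e)
conditionMessage(err)

# Dropping that step lets html_table() run on the selected node.
tb <- html_table(node)[[1]]
```

Note also that `tbody` elements are usually added by the browser and are often absent from the raw HTML that rvest downloads, which would explain why xpath2 and xpath3 match nothing.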
The summary of medals by sport, along with the medal totals, can be obtained with:
library(rvest) #v.0.2.0.9000
url <- "https://en.wikipedia.org/wiki/United_States_at_the_2012_Summer_Olympics"
tb <- read_html(url) %>% html_node("table.wikitable:nth-child(2)") %>% html_table(fill=TRUE)
#> head(tb)
# Medals by sport NA NA NA NA NA NA
#1 Sport 01 ! 02 ! 03 ! Total NA NA
#2 Swimming 16 9 6 31 NA NA
#3 Track & field 9 12 7 28 NA NA
#4 Gymnastics 3 1 2 6 NA NA
#5 Shooting 3 0 1 4 NA NA
#6 Tennis 3 0 1 4 NA NA
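In the output above the table title ends up as the column names and the real header is pushed into the first data row. One possible cleanup, shown on a hypothetical miniature of that table (the column names Gold, Silver and Bronze are assumptions, the live page uses sort-key text such as "01 !"), is to read the table without a header and promote the right row yourself:

```r
library(rvest)

# Hypothetical miniature of the "Medals by sport" table: a title row
# spanning all columns, followed by the real header row.
html <- '<table class="wikitable">
  <tr><th colspan="5">Medals by sport</th></tr>
  <tr><th>Sport</th><th>Gold</th><th>Silver</th><th>Bronze</th><th>Total</th></tr>
  <tr><td>Swimming</td><td>16</td><td>9</td><td>6</td><td>31</td></tr>
  <tr><td>Shooting</td><td>3</td><td>0</td><td>1</td><td>4</td></tr>
</table>'

# Read every row as data so no row is silently used as the header.
node <- html_node(read_html(html), "table.wikitable")
raw <- as.data.frame(html_table(node, header = FALSE), stringsAsFactors = FALSE)

# Locate the real header row, use it for the names, drop it and the title row.
hdr <- which(raw[[1]] == "Sport")[1]
medals <- raw[-seq_len(hdr), ]
names(medals) <- as.character(unlist(raw[hdr, ]))

# The counts were read as text because of the header rows; convert them.
medals[-1] <- lapply(medals[-1], as.integer)
```

Searching for the row whose first cell is "Sport" is a choice made for this sketch; on the real page you would check `head(raw)` first and pick the row that holds the header.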
Then there is another table summarizing all the competitors, which you can get with:
tb2 <- read_html(url) %>% html_node("table.wikitable:nth-child(20)") %>% html_table()
#> head(tb2)
# Sport Men Women Total
#1 Archery 3 3 6
#2 Athletics (track and field) 63 62 125
#3 Badminton 2 1 3
#4 Basketball 12 12 24
#5 Boxing 9 3 12
#6 Canoeing 5 2 7
This is the table of multiple medalists:
tb3 <- read_html(url) %>% html_node("table.wikitable:nth-child(8)") %>% html_table(fill=TRUE)
#> head(tb3)
# Multiple medalists NA NA NA NA NA NA
#1 Name Sport 01 ! 02 ! 03 ! Total NA
#2 Michael Phelps Swimming 4 2 0 6 NA
#3 Missy Franklin Swimming 4 0 1 5 NA
#4 Allison Schmitt Swimming 3 1 1 5 NA
#5 Ryan Lochte Swimming 2 2 1 5 NA
#6 Allyson Felix Track & field 3 0 0 3 NA
As @Metrics pointed out, it really depends on which table you want. There are many tables on that page.
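Since the `nth-child()` positions can shift whenever the page is edited, one alternative is to pull every `table.wikitable` at once and pick the one you want by its column names. This is a minimal sketch on a hypothetical two-table fragment, not the real page:

```r
library(rvest)

# Hypothetical fragment with two wikitable tables, so the sketch runs offline.
page <- read_html('<div id="mw-content-text">
  <p>Intro text</p>
  <table class="wikitable"><tr><th>Sport</th><th>Total</th></tr>
    <tr><td>Swimming</td><td>31</td></tr></table>
  <table class="wikitable"><tr><th>Sport</th><th>Men</th><th>Women</th></tr>
    <tr><td>Archery</td><td>3</td><td>3</td></tr></table>
</div>')

# html_table() on a node set returns a list with one data frame per table.
tables <- html_table(html_nodes(page, "table.wikitable"))

# Inspect the column names to see which table is which.
lapply(tables, names)

# Pick the competitors table by content rather than by position.
has_men <- vapply(tables, function(t) "Men" %in% names(t), logical(1))
competitors <- tables[[which(has_men)[1]]]
```

On the live page you would replace the inline fragment with `read_html(url)`. Some of the real tables may still need a header cleanup before their names are usable.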