Web scraping in R with rvest and xml2: extracting a table

Question

I want to extract the table of returns from this sample URL: https://www.valueresearchonline.com/funds/fundSelector/returns.asp?cat=10&exc=susp%2Cclose&rettab=st

This is what I have tried so far with rvest:

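library(rvest)
urlString <- "https://www.valueresearchonline.com/funds/fundSelector/returns.asp?cat=10&exc=susp%2Cclose&rettab=st"

# Read the HTML code from the website
webpage <- read_html(urlString)

# Use CSS selectors to scrape the section
tables <- webpage %>% html_node("tr") %>% html_text()
tables <- webpage %>% html_node(".fundtool_cat") %>% html_text()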

I need a dataframe/table containing the scheme names along with the rank and the returns for all the periods mentioned.

Answer 1

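library(rvest)
urlString <- "https://www.valueresearchonline.com/funds/fundSelector/returns.asp?cat=10&exc=susp%2Cclose&rettab=st"
# Select the first table inside the element with id "fundCatData"
# and parse it; the result is a list of tables
urlString %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="fundCatData"]/table[1]') %>%
  html_table(fill = TRUE)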
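
Note that html_table() applied to a node set returns a list of tables, not a single data frame, so the table still has to be pulled out of the list. Below is a minimal sketch of that last step. It runs against a small inline HTML string so that it works offline; the HTML, the fund names, the figures, and the variable names sample_html and returns_df are made up for illustration and only assume that the real page has a similar structure.

```r
library(rvest)

# Hypothetical stand-in for the real page: names and numbers are invented.
sample_html <- '
<div id="fundCatData">
  <table>
    <tr><th>Scheme</th><th>Rank</th><th>Return</th></tr>
    <tr><td>Fund A</td><td>1</td><td>12.5</td></tr>
    <tr><td>Fund B</td><td>2</td><td>10.1</td></tr>
  </table>
</div>'

# Same XPath as in the answer, applied to the inline HTML
tables <- read_html(sample_html) %>%
  html_nodes(xpath = '//*[@id="fundCatData"]/table[1]') %>%
  html_table()

# html_table() on a node set returns a list; take its first element
returns_df <- tables[[1]]
print(returns_df)
```

On the real page the same tables[[1]] step should give the data frame of schemes, ranks and returns, provided the XPath still matches the page layout.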

r

rvest

xml2