Web scraping an HTML table in R taking huge time

Question

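Hi all, I am trying to scrape a link that has only a little over 1,000 records, but it takes hours to fetch them. I am wondering whether I am doing something wrong, or whether there is a better way to load this into a table.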

urlString = "https://www.valueresearchonline.com/funds/selector-data/primary-category/1/equity/?tab=snapshot&output=html-data"
urlString <- URLencode(paste0(urlString,""))

#Reading the HTML code from the website and process the text
getHTML <- xml2::read_html(urlString, options = "HUGE")

#This one keeps running endlessly and doesn't load the table
mytable <- data.frame(getHTML %>% html_table(fill = T, trim = T))

Any help would be appreciated. Thanks.

Answer 1

The link returns JSON, not an HTML page. You need to read it with jsonlite first. The HTML data is in the html_data node, which you can then parse with read_html:

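library(rvest)  # provides html_table() and the %>% pipe
# The URL returns JSON, so parse it with jsonlite first
json <- jsonlite::fromJSON("https://www.valueresearchonline.com/funds/selector-data/primary-category/1/equity/?tab=snapshot&output=html-data")
# The table markup is stored as a string in the html_data node
getHTML <- xml2::read_html(json$html_data)
mytable <- data.frame(getHTML %>% html_table(fill = TRUE, trim = TRUE))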
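
To try the same two-step approach without hitting the network, here is a minimal offline sketch. The JSON string, its fund names, and its ratings are made up for illustration; only the html_data field name comes from the answer above, and the real response may contain other fields.

```r
library(jsonlite)
library(xml2)
library(rvest)

# Hypothetical stand-in for the server response: a JSON object whose
# "html_data" field holds an HTML table as a string (values are invented).
json_text <- '{"html_data": "<table><tr><th>Fund</th><th>Rating</th></tr><tr><td>Fund A</td><td>5</td></tr><tr><td>Fund B</td><td>3</td></tr></table>"}'

# Step 1: parse the JSON. fromJSON() accepts a JSON string as well as a URL.
json <- fromJSON(json_text)

# Step 2: parse only the embedded HTML fragment, not the whole JSON payload.
doc <- read_html(json$html_data)

# Step 3: html_table() on a document returns a list of tables; take the first.
tables <- html_table(doc)
mytable <- as.data.frame(tables[[1]])

print(mytable)
```

Parsing the JSON first means read_html only sees the table markup, which is presumably why this avoids the slowdown from feeding the raw JSON text to the HTML parser.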

Tags: r, web-scraping, rvest, xml2