How to scrape multiple tables in R?
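I'm a "newbie" when it comes to R, but I'd really like to know how to scrape multiple tables (whose dimensions I don't know in advance) from a website such as:
https://en.wikipedia.org/wiki/World_population
To be specific, here is the code in Python: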
from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

url1 = "https://en.wikipedia.org/wiki/World_population"
page = urlopen(url1)
soup = BeautifulSoup(page, "html.parser")

# First table whose class attribute is "wikitable sortable"
table1 = soup.find("table", {"class": "wikitable sortable"})

# Print the text of every cell in that table, row by row
for row in table1.find_all("tr"):
    for column in row.find_all(["td", "th"]):
        print(column.get_text().strip())
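In R, using the XML package:
u <- "https://en.wikipedia.org/wiki/World_population"  # input URL
library(XML)
b <- basename(u)       # local file name: "World_population"
download.file(u, b)    # save the page to a local file first
L <- readHTMLTable(b)  # parse every <table> in the saved file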
L is now a list of the 29 tables in u, each of which is an R data frame.
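For comparison with the Python attempt in the question, here is a minimal sketch of the same idea (every table on the page, whatever its dimensions, returned as one list) using only Python's standard-library html.parser instead of BeautifulSoup. The names TableCollector and scrape_tables are hypothetical, and the sketch assumes well-formed markup with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell text.

    Hypothetical helper: assumes closing tags are present and tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in document order
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = self._row = self._cell = None


def scrape_tables(html):
    """Return all tables found in an HTML string."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up sample markup with two tables of different shapes
sample = """
<table><tr><th>h1</th><th>h2</th></tr><tr><td>a</td><td>b</td></tr></table>
<p>some text between the tables</p>
<table><tr><td>x</td><td>y</td><td>z</td></tr></table>
"""
tables = scrape_tables(sample)
print(len(tables))
```

As with L in the R code, tables[0], tables[1], and so on can then be inspected one at a time without knowing their sizes beforehand.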