How to get specific columns from a wiki table
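Basically, there is a table on this page: https://en.wikipedia.org/wiki/List_of_cakes
I want to grab the text from the first, third, and fourth columns and format it like this:
Amandine - Romania - Chocolate layered cake filled with chocolate, caramel and fondant cream
So far I have this code, which I adapted from the post "How do I extract text data in first column from Wikipedia table?":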
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_cakes"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# Skip the header row, then print each row's text with all cells run together.
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    data = items.get_text(strip=True)
    print(data)
Output:
AmandineRomaniaChocolate layered cake filled with chocolate, caramel and fondant cream
AmygdalopitaGreeceAlmond cake made with ground almonds, flour, butter, egg and pastry cream
Angel cakeUnited Kingdom[1]Sponge cake,cream,food colouring
Angel food cakeUnited StatesEgg whites, vanilla, andcream of tartar
etc...
I just want to scrape this wiki page and get a text file of these entries, so that when someone uses the !cake command on my Twitch channel, it picks one at random.
You're close to your goal. Just call find_all('td') on each row and pick the cells you need by index from the resulting ResultSet:
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    e = items.find_all('td')  # all cells of the current row
    data = f'{e[0].text.strip()} - {e[2].text.strip()} - {e[3].text.strip()}'
    print(data)
Or use a list comprehension:
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    print(' - '.join([items.find_all('td')[i].get_text(strip=True) for i in [0,2,3]]))
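Picking cells by index raises an IndexError on any row that has fewer than four td cells, and footnote markers such as [1] end up in the text. The following is only a sketch of one way to guard against both. It assumes the footnotes are rendered as sup elements with class "reference"; the function name extract_rows and the inline SAMPLE_HTML are hypothetical stand-ins so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Name</th><th>Image</th><th>Origin</th><th>Description</th></tr>
<tr><td>Angel cake</td><td></td><td>United Kingdom<sup class="reference">[1]</sup></td><td>Sponge cake, cream, food colouring</td></tr>
<tr><td>Broken row</td><td></td></tr>
</table>
"""


def extract_rows(html, columns=(0, 2, 3)):
    """Return 'a - b - c' strings for the chosen columns of the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(class_="wikitable")
    lines = []
    if table is None:
        return lines
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if len(cells) <= max(columns):
            continue  # row is too short to index safely
        # Assumption: footnote markers are <sup class="reference"> elements.
        for sup in row.find_all("sup", class_="reference"):
            sup.decompose()
        lines.append(" - ".join(cells[i].get_text().strip() for i in columns))
    return lines


print(extract_rows(SAMPLE_HTML))
```

For the live page, you would pass res.text instead of SAMPLE_HTML.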
Example:
import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_cakes"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

# Skip the header row, then join columns 1, 3 and 4 of each row.
for items in soup.find(class_="wikitable").find_all("tr")[1:]:
    e = items.find_all('td')
    data = f'{e[0].text.strip()} - {e[2].text.strip()} - {e[3].text.strip()}'
    print(data)
Output:
Amandine - Romania - Chocolate layered cake filled with chocolate, caramel and fondant cream
Amygdalopita - Greece - Almond cake made with ground almonds, flour, butter, egg and pastry cream
Angel cake - United Kingdom[1] - Sponge cake, cream, food colouring
Angel food cake - United States - Egg whites, vanilla, and cream of tartar
Apple cake - Germany - Apple, caramel icing
Applesauce cake - Early colonial times in the New England Colonies of the Northeastern United States[2] - Prepared using apple sauce, flour and sugar as primary ingredients
Aranygaluska - Hungary - A cake with yeasty dough and vanilla custard
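Since the stated goal is a text file that a !cake command can draw from, the formatted lines could be written out once and sampled later. This is a minimal sketch using only the standard library. The helper names save_lines and random_line and the file name cakes.txt are made up for illustration, and how the result gets sent to Twitch chat is left out because it depends on the bot framework.

```python
import random
import tempfile
from pathlib import Path


def save_lines(lines, path):
    """Write one entry per line to a UTF-8 text file."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def random_line(path):
    """Return one randomly chosen non-empty line from the file."""
    text = Path(path).read_text(encoding="utf-8")
    entries = [line for line in text.splitlines() if line.strip()]
    return random.choice(entries)


# Two entries copied from the output above, standing in for the full list.
cakes = [
    "Amandine - Romania - Chocolate layered cake filled with chocolate, caramel and fondant cream",
    "Apple cake - Germany - Apple, caramel icing",
]

# A temporary directory keeps the demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    cake_file = Path(tmp) / "cakes.txt"  # hypothetical file name
    save_lines(cakes, cake_file)
    print(random_line(cake_file))
```

In the scraping loop, you would append each data string to a list instead of printing it, then pass that list to save_lines.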