Get tables from Wiki that match specific text

Question

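I'm very new to Python and BeautifulSoup, and I've been trying to solve this for a few hours now...

First, I want to extract the data from every table whose title contains "general election" on the page linked below: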

https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)

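I do have another data frame containing the name of each table (e.g. "1961 general election", "1965 general election"), but I'm hoping to get by with a search inside each table to confirm whether it's one I need.

Then I want to get all the names shown in bold (which indicates that they won), and finally I want another list of the "Count 1" values (sometimes labelled "1st Pref") in their original order, so that I can compare them to the "bold" list. I haven't looked at that part yet, because I haven't got past the first hurdle.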

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)"
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
my_tables = soup.find_all("table", {"class":"wikitable"})
for table in my_tables:
    rows = table.find_all('tr', text="general election")
    print(rows)

Any help would be greatly appreciated...

Answer 1

This page takes some finessing, but it can be done:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

req = requests.get('https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)')
soup = bs(req.text,'lxml')
#first - select all the tables on the page
tables = soup.select('table.wikitable')

for table in tables:        
    ttr = table.select('tbody tr')
    #next, filter out any table that doesn't involve general elections
    if "general election" in ttr[0].text: 
        #clean up the rows
        s_ttr = ttr[1].text.replace('\n','xxx').strip()
        #find and clean up column headings
        columns = [col.strip() for col in s_ttr.split('xxx') if len(col.strip())>0 ]
        rows = [] #initialize a list to house the table rows
        #from here, start processing each row and loading it into the list
        for c in ttr[2:]:
            row = [a.text.strip() if len(a.text.strip())>0 else 'NA' for a in c.select('td')]
            #drop a leading empty cell; the guard avoids an IndexError on rows without td cells
            if row and row[0]=="NA":
                row=row[1:]
            if len(row)>0:
                rows.append(row)
        #load the whole thing into a dataframe
        df = pd.DataFrame(rows,columns=columns)
        print(df)

The output should be all of the general election tables on the page.
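
The question also asks for the names shown in bold and for the "Count 1" / "1st Pref" values, which the code above does not cover. Here is a minimal sketch of one way that step might look. It runs on a small, made-up HTML sample rather than on the live page, and the function name `winners_and_first_counts` and the sample markup are assumptions for illustration only. The real tables contain extra cells (which is why the code above drops a leading empty cell), so the header matching would probably need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the election tables on the page.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th colspan="4">1961 general election: Carlow-Kilkenny</th></tr>
  <tr><th>Party</th><th>Candidate</th><th>1st Pref</th><th>Count 2</th></tr>
  <tr><td>Party A</td><td><b>Alice Example</b></td><td>9,000</td><td>9,500</td></tr>
  <tr><td>Party B</td><td>Bob Example</td><td>4,000</td><td>4,200</td></tr>
  <tr><td>Party C</td><td><b>Carol Example</b></td><td>7,500</td><td>8,100</td></tr>
</table>
<table class="wikitable">
  <tr><th colspan="2">Some other table</th></tr>
  <tr><th>Year</th><th>Note</th></tr>
  <tr><td>1961</td><td><b>ignored</b></td></tr>
</table>
"""

# Header labels mentioned in the question for the first-count column.
FIRST_COUNT_LABELS = ("Count 1", "1st Pref")


def winners_and_first_counts(html):
    """Map each general-election table title to its bold names and first counts."""
    soup = BeautifulSoup(html, "html.parser")
    results = {}
    for table in soup.select("table.wikitable"):
        rows = table.find_all("tr")
        if len(rows) < 2:
            continue
        # The first row is assumed to hold the table title.
        title = rows[0].get_text(" ", strip=True)
        if "general election" not in title:
            continue
        # The second row is assumed to hold the column headings.
        headers = [cell.get_text(" ", strip=True)
                   for cell in rows[1].find_all(["th", "td"])]
        count_idx = next(
            (i for i, h in enumerate(headers) if h in FIRST_COUNT_LABELS), None
        )
        winners, first_counts = [], []
        for row in rows[2:]:
            cells = row.find_all("td")
            # Skip rows whose cell count does not line up with the headings.
            if len(cells) != len(headers):
                continue
            bold = row.find(["b", "strong"])
            if bold is not None:
                winners.append(bold.get_text(strip=True))
            if count_idx is not None:
                first_counts.append(cells[count_idx].get_text(strip=True))
        results[title] = {"winners": winners, "first_counts": first_counts}
    return results


result = winners_and_first_counts(SAMPLE_HTML)
for title, data in result.items():
    print(title, data["winners"], data["first_counts"])
```

Keeping the first counts in row order means they can be compared position by position with the candidate rows, which seems to be what the question is after.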

python

wiki

beautifulsoup