Unable to scrape the right wikitable with BeautifulSoup4 (beginner)

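Total beginner here... I'm trying to extract the constituents table from this Wikipedia page, but the table I get is the annual returns table (the first table) rather than the constituents table I need (the second table). Could someone help me find out whether there is a way to target the specific table I want using BeautifulSoup4?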

import bs4 as bs
import pickle
import requests

def save_klci_tickers():
    resp = requests.get('https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI')
    soup = bs.BeautifulSoup(resp.text)
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)

    with open("klcitickers.pickle", "wb") as f:
        pickle.dump(tickers, f)

    print(tickers)
    return tickers


save_klci_tickers()

Try the pandas library to get the tabular data from that page into a CSV file in the blink of an eye:

import pandas as pd

url = 'https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI'

# Change the index to get the table you need from that page
df = pd.read_html(url, attrs={"class": "wikitable"})[1]
new = pd.DataFrame(df, columns=["Constituent Name", "Stock Code", "Sector"])
new.to_csv("wiki_data.csv", index=False)
print(df)
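
Instead of hard-coding the index, read_html can also filter tables by the text they contain through its match parameter. A small sketch, assuming the constituents table is the only one on the page containing the text "Stock Code"; the inline HTML is made-up sample data standing in for the page, and pandas needs an HTML parser such as lxml installed for this to run:

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the page: a returns table followed by the
# constituents table. None of these rows are real data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Year</th><th>Return</th></tr>
  <tr><td>2000</td><td>1.5</td></tr>
</table>
<table class="wikitable">
  <tr><th>Constituent Name</th><th>Stock Code</th><th>Sector</th></tr>
  <tr><td>Example Bank</td><td>1001</td><td>Finance</td></tr>
  <tr><td>Example Telecom</td><td>1002</td><td>Telecommunications</td></tr>
</table>
"""

# match keeps only the tables whose text matches the given string/regex,
# so the position of the table on the page no longer matters.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Stock Code")
df = tables[0]
print(df)
```

For the live page, pass url in place of the StringIO object and keep match="Stock Code".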

If you would still like to stick with BeautifulSoup, the following should serve the purpose:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/FTSE_Bursa_Malaysia_KLCI")
soup = BeautifulSoup(res.text,"lxml")
for items in soup.select("table.wikitable")[1].select("tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)

If you want to use .find_all() instead of .select(), try the following:

for items in soup.find_all("table",class_="wikitable")[1].find_all("tr"):
    data = [item.get_text(strip=True) for item in items.find_all(["th","td"])]
    print(data)
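
Picking the table by position ([1]) breaks as soon as the page gains or loses a table. A minimal sketch of an alternative, assuming the constituents table is the only wikitable with a "Stock Code" header cell (the column name used in the pandas snippet); the helper names find_table_by_header and extract_rows and the inline sample HTML are made up for illustration:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the Wikipedia page: two wikitables, the second
# one being the table we want. None of these rows are real data.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Year</th><th>Return</th></tr>
  <tr><td>2000</td><td>1.5</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Constituent Name</th><th>Stock Code</th><th>Sector</th></tr>
  <tr><td>Example Bank</td><td>1001</td><td>Finance</td></tr>
  <tr><td>Example Telecom</td><td>1002</td><td>Telecommunications</td></tr>
</table>
"""


def find_table_by_header(soup, header_text):
    """Return the first wikitable with a <th> equal to header_text, else None."""
    for table in soup.find_all("table", class_="wikitable"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None


def extract_rows(table):
    """Return every row of the table as a list of cell strings."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_header(soup, "Stock Code")
rows = extract_rows(table)
for row in rows:
    print(row)
```

Against the live page, build soup from res.text as in the snippets above and call find_table_by_header(soup, "Stock Code") the same way; check for None before extracting rows in case the header text on the page differs.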