BeautifulSoup: trouble getting nested span elements

I'm using Python and BS4. I can get the top entry from the page, but I want to get all of the entries.

cardAttr = soup.find(class_='box_card_attribute').find("span", {"class": False}).text

cardAttr = soup.select_one('span.box_card_attribute >span').text

Both of the above give me only the first match, and trying to use find_all instead gives me an AttributeError. Below is a snippet of the HTML.

    <div id="card_list" class="list">
                    <div class="t_row c_normal">
                        <div class="box_card_img">
                            <img id="card_image_0_1" alt="Tri-Horned Dragon" title="Tri-Horned Dragon" class="none">
                        </div>
                        <dl class="flex_1">
                            <dd class="box_card_name flex_1 top_set">
                                <span class="card_ruby"></span>
                                <span class="card_name">Tri-Horned Dragon</span>
                            </dd>
                            <dd class="icon flex_1 top_set">
                                <div class="lr_icon rid rid_5" style="background-color:#e86d6d;color:#e86d6d">
                                    <p>SE</p>
                                    <span style="background-color:#fff4f4;border-color:#e86d6d;color:#e86d6d; ">
                                            Secret Rare
                                    </span>
                                </div>
                            </dd>
                            <dd class="remove_btn top_set">
                                <a href="javascript:void(0);" class="btn hex red"  title="Remove this card from the list.">
                                    <span>X</span>
                                    <input type="hidden" class="lang" value="">
                                    <input type="hidden" class="cid" value="4711">
                                </a>
                            </dd>
                            <dd class="box_card_spec flex_1">
    
                                <span class="box_card_attribute">
                                    <img class="icon_img" src="external/image/parts/attribute/attribute_icon_dark.png" alt="DARK" title="DARK">
                                    <span>DARK</span>
                                </span>

Currently I can grab the 'DARK' text, but I can't seem to collect it across the whole page the way I can with class=card_name.

If needed, this is the URL I'm looking at.

https://www.db.yugioh-card.com/yugiohdb/card_search.action?ope=1&sess=1&pid=11101000&rp=99999

To get all card titles plus their attributes and text, you can use the following example:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.db.yugioh-card.com/yugiohdb/card_search.action?ope=1&sess=1&pid=11101000&rp=99999"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

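out = []
# Each card on the page lives in its own ".t_row" container.
for t in soup.select(".t_row"):
    title = t.select_one(".card_name").get_text(strip=True)
    # Map each direct child span of ".box_card_spec" to its text,
    # keyed by the span's first class name (e.g. "box_card_attribute").
    attrs = {
        s["class"][0]: re.sub(r"\s{2,}", "", s.get_text(strip=True))
        for s in t.select(".box_card_spec > span")
    }
    text = t.select_one(".box_card_text").get_text(strip=True)
    out.append({"title": title, **attrs, "text": text})

# Cards that lack a given attribute get an empty string instead of NaN.
df = pd.DataFrame(out).fillna("")
print(df.head().to_markdown())  # to_markdown() needs the "tabulate" package
df.to_csv("data.csv", index=False)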

Prints:

|    | title | box_card_attribute | box_card_level_rank | card_info_species_and_other_item | atk_power | def_power | text | box_card_effect |
|---:|:------|:-------------------|:--------------------|:---------------------------------|:----------|:----------|:-----|:----------------|
|  0 | Tri-Horned Dragon | DARK | Level 8 | [Dragon/Normal] | ATK 2850 | DEF 2350 | An unworthy dragon with three sharp horns sprouting from its head. | |
|  1 | Blue-Eyes White Dragon | LIGHT | Level 8 | [Dragon/Normal] | ATK 3000 | DEF 2500 | This legendary dragon is a powerful engine of destruction. Virtually invincible, very few have faced this awesome creature and lived to tell the tale. | |
|  2 | Hitotsu-Me Giant | EARTH | Level 4 | [Beast-Warrior/Normal] | ATK 1200 | DEF 1000 | A one-eyed behemoth with thick, powerful arms made for delivering punishing blows. | |
|  3 | Flame Swordsman | FIRE | Level 5 | [Warrior/Fusion] | ATK 1800 | DEF 1600 | "Flame Manipulator" + "Masaki the Legendary Swordsman" | |
|  4 | Skull Servant | DARK | Level 1 | [Zombie/Normal] | ATK 300 | DEF 200 | A skeletal ghost that isn't strong but can mean trouble in large numbers. | |

It also saves the results to data.csv.
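
As for the AttributeError in the question: find_all returns a ResultSet (a list of tags), not a single tag, so you can't chain .find(...) or .text onto it directly. Here is a minimal offline sketch of that, assuming markup shaped like the snippet in the question. SAMPLE_HTML is a made-up, trimmed two-card sample, not the real page, and the variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card sample, trimmed from the structure shown in the question.
SAMPLE_HTML = """
<div id="card_list" class="list">
  <div class="t_row c_normal">
    <span class="card_name">Tri-Horned Dragon</span>
    <span class="box_card_attribute">
      <img class="icon_img" alt="DARK" title="DARK">
      <span>DARK</span>
    </span>
  </div>
  <div class="t_row c_normal">
    <span class="card_name">Blue-Eyes White Dragon</span>
    <span class="box_card_attribute">
      <img class="icon_img" alt="LIGHT" title="LIGHT">
      <span>LIGHT</span>
    </span>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns a ResultSet, not a Tag, so chaining .find() on it
# raises AttributeError.
try:
    soup.find_all(class_="box_card_attribute").find("span", {"class": False})
    chained_error = None
except AttributeError as exc:
    chained_error = exc

# Option 1: loop over the outer matches, then search inside each one.
attrs_loop = [
    outer.find("span", {"class": False}).get_text(strip=True)
    for outer in soup.find_all(class_="box_card_attribute")
]

# Option 2: select() is the "all matches" counterpart of select_one().
attrs_css = [
    s.get_text(strip=True)
    for s in soup.select("span.box_card_attribute > span")
]

print(attrs_loop)  # ['DARK', 'LIGHT']
print(attrs_css)   # ['DARK', 'LIGHT']
```

Either option should work on the live page too, as long as its markup matches the snippet. Scoping the search to each .t_row, as in the example above, has the advantage of keeping each attribute paired with its card name.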