BeautifulSoup

Question

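I have tried all the solutions mentioned here, but none of them worked for my code. My problem is that I only want to get the text from the span tags that are direct children of h2 tags (not h3 tags) on this Wikipedia page (https://fr.wikipedia.org/wiki/Manga). Here is my code: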

import numbers
import urllib.request
from bs4 import BeautifulSoup 
quote_page ='https://fr.wikipedia.org/wiki/Manga#:~:text=Un%20manga%20(%E6%BC%AB%E7%94%BB)%20est%20une,quelle%20que%20soit%20son%20origine.'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')

spans = soup.find_all('h2 > span.mw-heading') 
#not working, results show all spans in h2 AND h3 
for span in spans :
    print(span.text)


#div_span = soup.find_all('span', class_="mw-headline") 
#for spans in div_span:
#    print(spans.text) #or string ?

Does anyone have a solution? I would be grateful ;) (The commented-out code works, but it also returns the span tags inside h3 tags :/)

Answer 1

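You are close to your goal, but in my opinion you have mixed a few things up. find_all() takes a tag name and attribute filters, not a CSS selector. When working with CSS selectors, use select() instead: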

soup.select('h2 > span.mw-headline')

The other problem here is that the class is named mw-headline, not mw-heading.
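To make the difference concrete, here is a minimal sketch that runs without network access. The HTML snippet is a hypothetical stand-in for the page's heading markup (an assumption for illustration, not copied from the live page, whose markup may differ). It shows that find_all() treats the whole string as a tag name and so matches nothing, while select() applies the child combinator as intended.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the heading structure discussed above.
HTML = """
<h2><span class="mw-headline">Histoire des mangas</span></h2>
<h3><span class="mw-headline">Origines</span></h3>
<h2><span class="mw-headline">Diffusion</span></h2>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() looks for a tag literally named "h2 > span.mw-headline",
# which does not exist, so the result is empty.
by_find_all = soup.find_all("h2 > span.mw-headline")

# select() parses the string as a CSS selector: span.mw-headline
# elements that are direct children of an h2.
by_select = soup.select("h2 > span.mw-headline")

h2_titles = [span.get_text() for span in by_select]
print(by_find_all)
print(h2_titles)
```

The span inside the h3 is skipped because the `>` combinator only matches direct children of h2.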

Example

import urllib.request
from bs4 import BeautifulSoup 
quote_page ='https://fr.wikipedia.org/wiki/Manga#:~:text=Un%20manga%20(%E6%BC%AB%E7%94%BB)%20est%20une,quelle%20que%20soit%20son%20origine.'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')

for e in soup.select('h2 > span.mw-headline'):
    print(e.text)

Output

Étymologie
Genre et nombre du mot « manga » en français
Histoire des mangas
Caractéristiques du manga
Diffusion
Influence du manga
Produits dérivés
Notes et références
Voir aussi
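If you would rather stay with find_all(), one possible alternative (not what the answer above uses) is to iterate over the h2 tags and look up the span among each heading's direct children. This is only a sketch: the inline HTML is a made-up sample, and the mw-editsection span is included as an assumption to show that other sibling spans are ignored.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
HTML = """
<h2><span class="mw-headline">Étymologie</span><span class="mw-editsection">[modifier]</span></h2>
<h3><span class="mw-headline">Sous-section</span></h3>
<h2><span class="mw-headline">Voir aussi</span></h2>
"""

soup = BeautifulSoup(HTML, "html.parser")

titles = []
for h2 in soup.find_all("h2"):
    # recursive=False restricts the search to direct children of the h2,
    # mirroring the "h2 > span" child combinator.
    span = h2.find("span", class_="mw-headline", recursive=False)
    if span is not None:
        titles.append(span.get_text(strip=True))

print(titles)
```

The select() version is shorter; this form can be handy when you need to do more work per heading inside the loop.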

BeautifulSoup - Can't find children tags by CSS selectors

css

python

css-selectors