Trouble retrieving artist name from Billboard top 100 site using Beautiful Soup

Question

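I am trying to retrieve the most popular songs from a URL using the Python package BeautifulSoup. When I look up the span that holds the artist name, it finds the correct span, but when I call ".text" on that span, it does not return the text between the span tags.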

Here is my code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
for res in result:
    songName = res.find('h3').text.strip()
    artist = res.find('span',class_='c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only').text
    print("song: "+songName)
    print("artist: "+ str(artist))
    print("___________________________________________________")

Currently it prints the following for each song:

song: Waiting On A Miracle
artist: <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">

        Stephanie Beatriz
</span>
___________________________________________________

How can I extract just the artist's name?

Answer 1

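If even one character of the class string is missing or different, the search will not match the element. I simplified it: you already get the song title, and the artist follows in the next <span> tag. So get the <h3> tag as you did for the song, then use .find_next() to get the artist: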

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
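for res in result:
    # The song title is the first <h3> inside each chart row.
    title_tag = res.find('h3')
    songName = title_tag.text.strip()
    # The artist sits in the first <span> that follows the title.
    artist = title_tag.find_next('span').text.strip()
    print("song: " + songName)
    print("artist: " + artist)
    print("___________________________________________________")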

Output:

song: Heat Waves
artist: Glass Animals
___________________________________________________
song: Stay
artist: The Kid LAROI & Justin Bieber
___________________________________________________
song: Super Gremlin
artist: Kodak Black
___________________________________________________
song: abcdefu
artist: GAYLE
___________________________________________________
song: Ghost
artist: Justin Bieber
___________________________________________________
song: We Don't Talk About Bruno
artist: Carolina Gaitan, Mauro Castillo, Adassa, Rhenzy Feliz, Diane Guerrero, Stephanie Beatriz & Encanto Cast
___________________________________________________
song: Enemy
artist: Imagine Dragons X JID
___________________________________________________

....
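
To see why the long class string is fragile without hitting the live site, here is a minimal offline sketch. The markup in SAMPLE_HTML is a hypothetical, shortened stand-in modelled on the span printed in the question, and extract_rows is just an illustrative helper name, not anything from the page or the library. It assumes the row layout (an <h3> followed by the artist <span>) still holds.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup modelled on the question's output.
# The real Billboard page carries many more classes and may change.
SAMPLE_HTML = """
<div class="o-chart-results-list-row-container">
  <h3 class="c-title a-no-trucate">
      Waiting On A Miracle
  </h3>
  <span class="c-label a-no-trucate a-font-primary-s">
      Stephanie Beatriz
  </span>
</div>
"""


def extract_rows(html):
    """Return (song, artist) tuples; illustrative helper, not a library API."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for row in soup.find_all("div", class_="o-chart-results-list-row-container"):
        title = row.find("h3")
        if title is None:
            continue  # skip rows that do not have the expected layout
        artist = title.find_next("span")
        rows.append((
            title.get_text(strip=True),
            artist.get_text(strip=True) if artist else None,
        ))
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-class string is compared against the whole attribute value,
# so it only matches when every class is present in the same order.
exact = soup.find("span", class_="c-label a-no-trucate a-font-primary-s")
reordered = soup.find("span", class_="a-no-trucate c-label a-font-primary-s")

# A single class name matches any element that has that class.
single = soup.find("span", class_="c-label")

print(exact is not None, reordered is None, single is not None)
print(extract_rows(SAMPLE_HTML))
```

Searching on one stable class such as c-label, or navigating from the <h3> as above, is less likely to break when the site adds or reorders utility classes. Note that find_next() keeps walking the document past the current row, so it is worth checking the result if some rows might lack a span.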

