Trouble retrieving artist name from Billboard Hot 100 site using Beautiful Soup
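I'm trying to use the Python package BeautifulSoup to retrieve the most popular songs from a URL. When I fetch the span containing the artist name, it finds the correct span, but when I call ".text" on the span, it doesn't return the text between the span tags.
Here is my code: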
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
for res in result:
    songName = res.find('h3').text.strip()
    artist = res.find('span', class_='c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only').text
    print("song: " + songName)
    print("artist: " + str(artist))
    print("___________________________________________________")
It currently prints the following for each song:
song: Waiting On A Miracle
artist: <span class="c-label a-no-trucate a-font-primary-s lrv-u-font-size-14@mobile-max u-line-height-normal@mobile-max u-letter-spacing-0021 lrv-u-display-block a-truncate-ellipsis-2line u-max-width-330 u-max-width-230@tablet-only">
Stephanie Beatriz
</span>
___________________________________________________
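How can I extract only the artist's name?
If even one character of the class string is missing, the search won't match. I simplified it by starting from the song title, since the artist follows in the next <span> tag. So get the <h3> tag as you did for the song, then use .find_next() to get the artist: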
import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.billboard.com/charts/hot-100/')
soup = BeautifulSoup(r.content, 'html.parser')
result = soup.find_all('div', class_='o-chart-results-list-row-container')
for res in result:
    # The song title is the first <h3> in each chart row.
    title = res.find('h3')
    songName = title.text.strip()
    # The artist is in the first <span> that follows the title.
    artist = title.find_next('span').text.strip()
    print("song: " + songName)
    print("artist: " + artist)
    print("___________________________________________________")
Output:
song: Heat Waves
artist: Glass Animals
___________________________________________________
song: Stay
artist: The Kid LAROI & Justin Bieber
___________________________________________________
song: Super Gremlin
artist: Kodak Black
___________________________________________________
song: abcdefu
artist: GAYLE
___________________________________________________
song: Ghost
artist: Justin Bieber
___________________________________________________
song: We Don't Talk About Bruno
artist: Carolina Gaitan, Mauro Castillo, Adassa, Rhenzy Feliz, Diane Guerrero, Stephanie Beatriz & Encanto Cast
___________________________________________________
song: Enemy
artist: Imagine Dragons X JID
___________________________________________________
....
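The same idea can be tried offline. The sketch below is only an illustration: SAMPLE_HTML is hypothetical, heavily simplified markup (not the real page source), and extract_chart is a made-up helper name. It applies the <h3> plus .find_next('span') approach from above and skips rows where either tag is missing, so a layout change does not raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real chart page.
SAMPLE_HTML = """
<div class="o-chart-results-list-row-container">
  <h3 class="c-title">
    Heat Waves
  </h3>
  <span class="c-label a-no-trucate">
    Glass Animals
  </span>
</div>
<div class="o-chart-results-list-row-container">
  <h3 class="c-title">Stay</h3>
  <span class="c-label a-no-trucate">The Kid LAROI &amp; Justin Bieber</span>
</div>
<div class="o-chart-results-list-row-container">
  <p>row without a title</p>
</div>
"""


def extract_chart(html):
    """Return a list of (song, artist) tuples found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for row in soup.find_all("div", class_="o-chart-results-list-row-container"):
        title_tag = row.find("h3")
        if title_tag is None:
            continue  # no title in this row, nothing to anchor on
        # find_next() walks forward in document order from the title,
        # so it is not limited to the current row.
        artist_tag = title_tag.find_next("span")
        if artist_tag is None:
            continue
        entries.append(
            (title_tag.get_text(strip=True), artist_tag.get_text(strip=True))
        )
    return entries


chart = extract_chart(SAMPLE_HTML)
for song, artist in chart:
    print("song: " + song)
    print("artist: " + artist)
```

Because this anchors on the tag order rather than the long class string, it keeps working if the site renames or reorders its CSS classes, but it would need adjusting if another <span> were ever placed between the title and the artist.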