Not getting any data entry with 'find_all' while scraping Spotify Charts webpage

Question

I am trying to scrape the Spotify chart of the top 200 songs in India for 2022-02-01. My Python code:

import requests
from bs4 import BeautifulSoup as bs

# Fetch the page and parse it into a BeautifulSoup object.
def get_webpage(link):
    page = requests.get(link)
    soup = bs(page.content, 'html.parser')
    return soup

# Collect the chart data and return it as a list of rows.
# Each row is (in order): Song, Artist, Date, Play Count, Rank
def get_data():
    rows = []
    date = '2022-02-01'  # the chart date; it was undefined in the original snippet
    soup = get_webpage('https://spotifycharts.com/regional/in/daily/' + date)
    entries = soup.find_all("td", class_="chart-table-track")
    streams = soup.find_all("td", class_="chart-table-streams")
    print(entries)
    for i, (entry, stream) in enumerate(zip(entries, streams)):
        song = entry.find('strong').get_text()
        # Skip the first three characters (presumably a "by " prefix).
        artist = entry.find('span').get_text()[3:]
        play_count = stream.get_text()
        rows.append([song, artist, date, play_count, i + 1])
    return rows

I tried printing entries and streams, but both come back empty:

entries = soup.find_all("td", class_ = "chart-table-track")
streams = soup.find_all("td", class_= "chart-table-streams")

I copied/referenced this from here and tried running the full script, but it failed with the error: 'NoneType' object has no attribute 'find_all' in the country function. So I tried a smaller part, as shown above.

Answer 1

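The NoneType error means that an element the script looks for was not found. If you print soup, you will see that the selectors used for entries and streams do not match anything in the page that was returned.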
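The two symptoms (an empty result and the 'NoneType' error) can be reproduced offline. The sketch below is illustrative only: both HTML strings are hypothetical stand-ins for what the server might return, the `chart-table` class on the table is an assumption, and `find_chart_parts` is a name invented here. It shows that `find` returns `None` when nothing matches, so calling `.find_all` on that result raises the error from the question, whereas `find_all` itself just returns an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: a challenge page and a page containing the chart.
BLOCKED_HTML = "<html><head><title>Just a moment...</title></head><body></body></html>"
CHART_HTML = """
<table class="chart-table"><tbody><tr>
  <td class="chart-table-track"><strong>Song A</strong><span>by Artist A</span></td>
  <td class="chart-table-streams">123,456</td>
</tr></tbody></table>
"""

def find_chart_parts(html):
    """Return (table, track_cells) so the caller can see what matched."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="chart-table")           # None if absent
    tracks = soup.find_all("td", class_="chart-table-track")   # empty list if absent
    return table, tracks

blocked_table, blocked_tracks = find_chart_parts(BLOCKED_HTML)
chart_table, chart_tracks = find_chart_parts(CHART_HTML)

print(blocked_table, blocked_tracks)  # nothing matches on the challenge page
print(len(chart_tracks))              # one track cell on the chart page
```

Printing `soup.title` or the first few hundred characters of `soup.get_text()` on the real response is usually enough to tell which of the two cases you are in.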

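Looking at your soup object, it appears that Cloudflare is blocking your access to Spotify, and you would need to complete a CAPTCHA to get past it. There is a library called "cloudscraper" for bypassing Cloudflare.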
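A minimal sketch of how `get_webpage` might be adapted to use cloudscraper is shown here. It assumes the package is installed (`pip install cloudscraper`) and relies on `cloudscraper.create_scraper()`, which returns a requests-style session. The `session` parameter is a hypothetical addition for this sketch, so the function can be exercised without network access. Whether cloudscraper actually gets past the challenge on this site is not guaranteed, particularly when a CAPTCHA is served.

```python
from bs4 import BeautifulSoup

def get_webpage(link, session=None):
    """Fetch `link` and return the parsed soup.

    `session` is a hypothetical parameter for this sketch: any object with a
    requests-style .get() method works. If omitted, a cloudscraper session
    is created.
    """
    if session is None:
        import cloudscraper  # third-party package, imported only when needed
        session = cloudscraper.create_scraper()
    page = session.get(link)
    # Raise on 4xx/5xx responses instead of silently parsing an error page.
    page.raise_for_status()
    return BeautifulSoup(page.content, "html.parser")
```

With this in place, `get_data()` can keep calling `get_webpage(url)` unchanged. If `find_all` still returns an empty list, print the soup again to check whether a challenge page is still being returned.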

beautifulsoup

spotify

web-scraping

python-3.x

spotipy