Web scraping a lazy-loaded list using Python requests (without Selenium/Scrapy)

Question

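As an exercise, I wrote myself a simple script to find who on Bandcamp bought the same tracks as me. The idea is to find accounts with similar taste, which would therefore have more of the same music in their collections.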
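The matching idea itself can be sketched independently of the scraping. Given, for each of your tracks, the fan usernames scraped from its page, count how many of those tracks each fan also bought and sort by that count. The names rank_similar_fans, fans_by_track and me are hypothetical, and the input is assumed to be plain usernames, not the actual format Bandcamp returns.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_similar_fans(fans_by_track: Dict[str, Iterable[str]],
                      me: str, top: int = 10) -> List[Tuple[str, int]]:
    """Rank fans by how many of the given tracks they also bought.

    fans_by_track is a hypothetical mapping: track URL -> fan usernames
    scraped from that track's page.
    """
    overlap: Counter = Counter()
    for fans in fans_by_track.values():
        # set() so a fan listed twice on one page is only counted once
        for fan in set(fans):
            if fan != me:  # skip your own account
                overlap[fan] += 1
    # Highest overlap first; ties broken alphabetically for stable output
    return sorted(overlap.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Sorting ties by name keeps the result deterministic, because iterating over a set has no guaranteed order.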

The problem is that the fan list on an album/track page is lazy-loaded. Using Python's requests and bs4, I only get 60 of the potentially 700 results.

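I'm trying to figure out how to send the request that loads more of the list here: https://pitp.bandcamp.com/album/fragments-distancing. I found, in the browser's inspector, the request that is sent when I click to load more, and sent it with requests as a JSON payload, but without any result: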

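import requests

track_link = "https://pitp.bandcamp.com/album/fragments-distancing"
res = requests.get(track_link)
# payload copied from the request seen in the browser's inspector
open_more = {"tralbum_type": "a", "tralbum_id": 3542956135, "token": "1:1609185066:1714678:0:1:0", "count": 100}
for i in range(0, 3):
    requests.post(track_link, json=open_more)  # posts to the album page; response is discarded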

Any help is appreciated!

Answer 1

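I think you just need to put in a ridiculously large number for "count". I also automated your script a bit, in case you want to get the data for other albums: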

from urllib.parse import urlsplit
import json

import requests
from bs4 import BeautifulSoup

# build the post link
get_link="https://pitp.bandcamp.com/album/fragments-distancing"
link=urlsplit(get_link)
base_link=f'{link.scheme}://{link.netloc}'
post_link=f"{base_link}/api/tralbumcollectors/2/thumbs"

with requests.Session() as s:
    res = s.get(get_link)
    soup = BeautifulSoup(res.text, 'lxml')

    # the data for tralbum_type and tralbum_id
    # is stored in an attribute of a script tag
    key="data-band-follow-info"
    data=soup.select_one(f'script[{key}]')[key]
    data=json.loads(data)
    open_more = {
        "tralbum_type":data["tralbum_type"],
        "tralbum_id":data["tralbum_id"],
        "count":1000}

    r=s.post(post_link, json=open_more).json()
    print(r['more_available']) # if this is True, not every fan was returned: use a bigger count
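Instead of guessing a large enough number, the retry suggested by the more_available check can be automated. The sketch below is a hypothetical helper (fetch_collectors, PostJson and max_count are illustrative names, not part of Bandcamp or requests): it re-sends the same payload with a doubled count until more_available is false. It assumes, without confirmation, that the endpoint keeps accepting larger count values; the cap stops it from looping forever if not. The HTTP call is passed in as a function so the logic can be tried without network access.

```python
from typing import Any, Callable, Dict

# Hypothetical callable type: takes (url, payload), returns the decoded JSON
PostJson = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def fetch_collectors(post_json: PostJson, post_link: str, tralbum_type: str,
                     tralbum_id: int, count: int = 1000,
                     max_count: int = 64000) -> Dict[str, Any]:
    """Re-send the request with a doubled "count" until "more_available" is false.

    Only the payload keys and the "more_available" flag from the answer are
    used; the rest of the response is returned untouched.
    """
    while True:
        payload = {"tralbum_type": tralbum_type,
                   "tralbum_id": tralbum_id,
                   "count": count}
        response = post_json(post_link, payload)
        # Stop once the server reports nothing more, or the cap is reached
        if not response.get("more_available") or count >= max_count:
            return response
        count *= 2
```

With the session from the answer, it could be called as fetch_collectors(lambda url, p: s.post(url, json=p).json(), post_link, data["tralbum_type"], data["tralbum_id"]).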


python

lazy-loading

beautifulsoup

python-requests