How to get a download link that requires ticking checkboxes in an additional dialog box

I want to download the latest publicly available file from https://sam.gov/data-services/Exclusions/Public%20V2?privacy=Public

When I download it manually, the actual download link looks like this:

https://falextracts.s3.amazonaws.com/Exclusions/Public%20V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220530T143743Z&X-Amz-SignedHeaders=host&X-Amz-Expires=2699&X-Amz-Credential=AKIAY3LPYEEXWOQWHCIY%2F20220530%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Signature=3eca59f75a4e1f6aa59fc810da8f391f1ebfd8ca5a804d56b79c3eb9c4d82e32
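
A link of this shape looks like an AWS presigned URL: its query string records when it was signed (X-Amz-Date) and for how many seconds it stays valid (X-Amz-Expires), so it probably cannot just be hard-coded and reused later. As a rough sketch, the expiry time can be read back with the standard library. The helper name `presigned_expiry` is made up for illustration, and the example URL is the one above trimmed to the two parameters the function needs:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url):
    """Return the UTC time after which a SigV4-style presigned URL expires."""
    params = parse_qs(urlsplit(url).query)
    # X-Amz-Date is the signing time in compact ISO 8601 form, always UTC.
    signed_at = datetime.strptime(
        params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the lifetime of the link in seconds.
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at + lifetime


# The manual link from above, reduced to the parameters used here.
url = (
    "https://falextracts.s3.amazonaws.com/Exclusions/Public%20V2/"
    "SAM_Exclusions_Public_Extract_V2_22150.ZIP"
    "?X-Amz-Date=20220530T143743Z&X-Amz-Expires=2699"
)
print(presigned_expiry(url))
```

With 2699 seconds of lifetime, the link above would have stopped working roughly 45 minutes after it was generated.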

My function only gets the initial link, which points to the real one:

import json
import requests
from operator import itemgetter


files_url = 'https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public'

def get_file():
    response = requests.get(files_url, stream=True)
    links_resp = json.loads(response.text)
    links_dicts = [d for d in links_resp['_embedded']['customS3ObjectSummaryList'] if d['displayKey'].count('SAM_Exclus')]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)
    return sorted_links[0]['_links']['self']['href']

get_file()

Result:

'https://s3.amazonaws.com/falextracts/Exclusions/Public V2/SAM_Exclusions_Public_Extract_V2_22150.ZIP'

But when I follow the link above, I get Access Denied.

So I would appreciate any hints on how to get the real download links.

I have edited your code as much as I could so that it is easy to follow. The requests library can parse the response as JSON itself.

Imports that are not at the top of the code do not read well...

import requests as req
from operator import itemgetter

files_url = "https://sam.gov/api/prod/fileextractservices/v1/api/listfiles?random=1653676394983&domain=Exclusions/Public%20V2&privacy=Public"
down_url = "https://sam.gov/api/prod/fileextractservices/v1/api/download/Exclusions/Public%20V2/{}?privacy=Public"

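def get_file():
    # requests decodes the JSON body itself, so json.loads is not needed.
    response = req.get(files_url).json()

    # Sort the listed files so the most recently modified one comes first.
    links_dicts = response["_embedded"]["customS3ObjectSummaryList"]
    sorted_links = sorted(links_dicts, key=itemgetter('dateModified'), reverse=True)

    key = sorted_links[0]['displayKey']

    # Request the file through the sam.gov download endpoint
    # rather than the bare S3 link.
    down = req.get(down_url.format(key))

    if down.status_code != 200:
        return False

    print(key)
    # Save the archive in the current directory under its original name.
    with open(key, 'wb') as f:
        f.write(down.content)

    return True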

get_file()
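
If it helps, the "pick the newest file" and "build the download URL" steps can be pulled out into small pure functions that can be checked without touching the network. This is only a sketch: `newest_key` and `download_url` are hypothetical helper names, the `SAM_Exclus` filter is borrowed from the code in the question, the sample listing is invented, and it assumes (as the code above does) that `dateModified` values sort chronologically.

```python
from operator import itemgetter
from urllib.parse import quote

DOWN_URL = (
    "https://sam.gov/api/prod/fileextractservices/v1/api/download/"
    "Exclusions/Public%20V2/{}?privacy=Public"
)


def newest_key(listing, prefix="SAM_Exclus"):
    """Return the displayKey of the most recently modified matching entry."""
    entries = listing["_embedded"]["customS3ObjectSummaryList"]
    # Keep only entries whose name contains the expected prefix.
    matching = [e for e in entries if prefix in e["displayKey"]]
    if not matching:
        return None
    return max(matching, key=itemgetter("dateModified"))["displayKey"]


def download_url(key):
    """Build the download endpoint URL for a given file name."""
    # quote() percent-encodes spaces and other unsafe characters in the key.
    return DOWN_URL.format(quote(key))


# Invented sample data with the same shape as the listfiles response.
sample_listing = {
    "_embedded": {
        "customS3ObjectSummaryList": [
            {"displayKey": "SAM_Exclusions_Public_Extract_V2_22149.ZIP",
             "dateModified": "2022-05-29"},
            {"displayKey": "SAM_Exclusions_Public_Extract_V2_22150.ZIP",
             "dateModified": "2022-05-30"},
            {"displayKey": "README.txt", "dateModified": "2022-06-01"},
        ]
    }
}

key = newest_key(sample_listing)
print(download_url(key))
```

Inside get_file(), the result of `download_url(key)` would then be passed to `req.get` in place of `down_url.format(key)`.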