How can I get the content of a URL and write it to a new file using HTMLSession in Python?

When working with requests and BeautifulSoup, we use response.content to get the content of the URL and write it to a new file. What should we write if we use HTMLSession from requests_html instead?

For example,

import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()

# Specify the DOI here
URL="https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf" 
r = session.get(URL,allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content) # This line is not working

This is all you need, although when I run it I get "Request forbidden by administrative rules". Presumably you have a way around that.

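import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"

# Fetch the PDF, following any redirects
r = requests.get(URL, allow_redirects=True)

# r.content holds the raw response body as bytes, so open the file in binary mode
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)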
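
The same approach should carry over to HTMLSession, since it subclasses requests.Session and its responses are requests responses with an extra .html attribute. The bytes live on the response itself (r.content), not on r.html. Below is a minimal sketch under that assumption; the helper name save_url_to_file is hypothetical and not part of either library, and the raise_for_status() call is an addition so that a 403 like the one mentioned above fails loudly instead of being saved as a "PDF".

```python
from pathlib import Path


def save_url_to_file(session, url, filename):
    """Download `url` with `session` and write the raw bytes to `filename`.

    `session` is expected to behave like requests.Session; an
    HTMLSession from requests_html should work the same way here.
    (Hypothetical helper, for illustration only.)
    """
    response = session.get(url, allow_redirects=True)
    # Raise an exception on 4xx/5xx instead of writing an error page to disk
    response.raise_for_status()
    path = Path(filename)
    # response.content is the body as bytes; response.html has no .content
    path.write_bytes(response.content)
    return path
```

With requests_html installed, this could be called as save_url_to_file(HTMLSession(), URL, "xyz.pdf"), using the URL from the question.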