How can I get the content of a URL and write it to a new file using HTMLSession in Python?

Question

With BeautifulSoup, we use response.content to get the contents of a URL and write them to a new file. What should we write if we use HTMLSession from requests_html instead of BeautifulSoup?

For example:

import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()

# Specify the DOI here
URL="https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf" 
r = session.get(URL,allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content) # This line is not working

Answer 1

This is what you need, although when I run it I get "Request forbidden by administrative rules". Presumably you have the key to get past that.

import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
# r.content is the raw response body as bytes, so open the file in binary mode.
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)
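
The same attribute should also work if you stay with HTMLSession. As far as I know, HTMLSession subclasses requests.Session and its responses subclass requests.Response, so r.content is available there too, while r.html is a parsed HTML object with no content attribute. The following is a minimal sketch under that assumption; the helper names save_body and download are hypothetical, and "xyz.pdf" is just the placeholder filename used above.

```python
from pathlib import Path


def save_body(response, path):
    """Write the raw bytes of a requests-style response to `path`."""
    # Raise on 4xx/5xx (e.g. a 403 from the server) instead of
    # silently saving an error page as a PDF.
    response.raise_for_status()
    target = Path(path)
    target.write_bytes(response.content)  # bytes, not r.html
    return target


def download(url, path):
    """Fetch `url` with HTMLSession and save the body to `path`."""
    # Imported here so save_body can be used without requests_html installed.
    from requests_html import HTMLSession

    with HTMLSession() as session:
        r = session.get(url, allow_redirects=True)
        return save_body(r, path)


if __name__ == "__main__":
    URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
    print(f"Saved to {download(URL, 'xyz.pdf')}")
```

If the server still refuses the request, raise_for_status() will raise an HTTPError rather than writing a file, which makes the "forbidden" case mentioned above easier to spot.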


python

file

web-scraping

htmlsession