How can I get the content of a URL and write it to a new file using HTMLSession in Python?
With BeautifulSoup, we use response.content to get the content of the URL and write it to a new file. If we use HTMLSession from requests_html instead of BeautifulSoup, what should we write?
For example:
import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()
pdf_title = "xyz.pdf"  # output file name
# Specify the DOI here
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = session.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content)  # This line is not working
This is all you need, although when I run it I get "Request forbidden by administrative rules". Presumably you have the key to getting past that.
import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
# r.content holds the raw response bytes, so open the file in binary mode
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)
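If you would rather keep HTMLSession, the same idea should carry over: the object returned by session.get is an HTMLResponse, which subclasses requests.Response, so r.content is still there. The r.html attribute is the parsed HTML object and has no content attribute, which is most likely why the original line fails (r.html.raw_html also holds the bytes, if you want to go through r.html). Below is a minimal sketch under those assumptions; the helper names save_response_body and download_with_htmlsession are made up for illustration and are not part of either library.

```python
from pathlib import Path


def save_response_body(response, path):
    """Write the raw bytes of an HTTP response to `path`; return the byte count."""
    # raise_for_status() turns 4xx/5xx replies (such as a 403) into an exception
    # instead of silently saving an error page as if it were the PDF.
    response.raise_for_status()
    # .content is the undecoded body as bytes, so it is written in binary form.
    return Path(path).write_bytes(response.content)


def download_with_htmlsession(url, path):
    """Fetch `url` with requests_html.HTMLSession and save the body to `path`."""
    # Imported here so save_response_body stays usable without requests_html.
    from requests_html import HTMLSession

    session = HTMLSession()
    # HTMLSession.get returns an HTMLResponse (a requests.Response subclass),
    # so .content is available exactly as with plain requests.
    response = session.get(url, allow_redirects=True)
    return save_response_body(response, path)
```

Calling download_with_htmlsession(URL, "xyz.pdf") with the URL from the question would then write the file, provided the server accepts the request. If it answers with "Request forbidden by administrative rules", raise_for_status() raises instead of writing a bogus PDF.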