How to get the final destination URL after redirections in python requests?

Question

I need the response from the actual destination URL.

I have already tried the solutions mentioned.

import requests

doi_link = 'https://doi.org/10.1016/j.artint.2018.07.007'
response = requests.get(doi_link, allow_redirects=True)
print(response.status_code, response.url, response.history)
# Outputs: 200 https://linkinghub.elsevier.com/retrieve/pii/S0004370218305988 [<Response [302]>]
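To see every hop in order, not just the raw history list, a small helper can walk response.history and then the final response. This is a minimal sketch. It assumes only that the object has history, status_code and url attributes, as requests.Response does. The helper name redirect_chain is hypothetical, and the stand-in objects simply mirror the output above so the example runs offline.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return (status_code, url) for every hop, oldest first.

    With requests, response.history lists the redirect responses that were
    followed (oldest first); the response object itself is the last hop.
    """
    hops = list(response.history) + [response]
    return [(r.status_code, r.url) for r in hops]


if __name__ == "__main__":
    # Hypothetical stand-ins shaped like the question's output (no network).
    first = SimpleNamespace(status_code=302,
                            url="https://doi.org/10.1016/j.artint.2018.07.007",
                            history=[])
    last = SimpleNamespace(status_code=200,
                           url="https://linkinghub.elsevier.com/retrieve/pii/S0004370218305988",
                           history=[first])
    for status, url in redirect_chain(last):
        print(status, url)
```

With a real response you would call redirect_chain(requests.get(doi_link)) instead.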

Why does allow_redirects stop partway?

When I follow the link manually in a browser, the final URL is https://www.sciencedirect.com/science/article/pii/S0004370218305988?via%3Dihub

I want to get this URL programmatically.

Edit: As suggested in the comments, the final redirect to the destination is done with JavaScript.
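The edit explains why allow_redirects stops where it does. An HTTP client follows server-side redirects, which are 3xx responses with a Location header. A page that redirects with JavaScript is just an ordinary 200 response to such a client. The sketch below reproduces this locally. It uses the standard library's urllib.request in place of requests, since it follows 3xx redirects in the same way. The paths /start, /landing and /destination are hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical landing page that only redirects via JavaScript.
LANDING_HTML = b"""<html><body>
<script>window.location.href = "/destination";</script>
</body></html>"""


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/start":
            # Server-side redirect: status code plus Location header.
            self.send_response(302)
            self.send_header("Location", "/landing")
            self.end_headers()
        elif self.path == "/landing":
            # Client-side redirect: a normal 200 page; only a browser that
            # runs the script would continue to /destination.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(LANDING_HTML)))
            self.end_headers()
            self.wfile.write(LANDING_HTML)
        else:
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_final_url(start_path="/start"):
    """Start a throwaway local server, request start_path, return (url, status, body)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base = f"http://127.0.0.1:{server.server_port}"
        # Empty ProxyHandler: ignore any *_proxy environment variables here.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        # The opener follows the 302 automatically, much like requests.get.
        with opener.open(base + start_path) as resp:
            return resp.url, resp.status, resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    url, status, body = fetch_final_url()
    print(status, url)  # stops at /landing, not /destination
```

The final URL is /landing even though the page body points at /destination. This is the same situation as the linkinghub.elsevier.com page above.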

Answer 1

As suggested here: Python Requests library redirect new url

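You can use response.history to see each HTTP redirect that was followed. In this case the last URL returns 200, but its HTML contains the "final final" redirect. requests does not follow that redirect because it is not an HTTP redirect. You can parse that final HTML to get the redirect URL.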
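The parsing step can also be sketched with only the standard library's html.parser. This sketch assumes, as the code in this answer does, that the landing page carries the target in an input field named redirectURL whose value is percent-encoded. The names RedirectInputFinder and extract_redirect_url and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class RedirectInputFinder(HTMLParser):
    """Remember the value of the first <input> whose name matches field_name."""

    def __init__(self, field_name="redirectURL"):
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names and already decodes
        # entities such as &amp; inside attribute values.
        if tag == "input" and self.value is None:
            attributes = dict(attrs)
            if attributes.get("name") == self.field_name:
                self.value = attributes.get("value")


def extract_redirect_url(html_text, field_name="redirectURL"):
    """Return the percent-decoded redirect target, or None if the field is absent."""
    finder = RedirectInputFinder(field_name)
    finder.feed(html_text)
    finder.close()
    if not finder.value:
        return None
    return unquote(finder.value)


if __name__ == "__main__":
    # Hypothetical markup shaped like the field the answer looks for.
    sample = ('<form><input type="hidden" name="redirectURL" '
              'value="https%3A%2F%2Fexample.com%2Farticle%3Fvia%253Dihub"></form>')
    print(extract_redirect_url(sample))  # https://example.com/article?via%3Dihub
```

Decoding only once matters here. %253D becomes %3D, which matches the ?via%3Dihub seen in the browser's final URL.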

I would use something like beautifulsoup4 to make the parsing easy: pip install beautifulsoup4

import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote
from html import unescape

doi_link = 'https://doi.org/10.1016/j.artint.2018.07.007'
response = requests.get(doi_link, allow_redirects=True)

# response.history holds the HTTP (3xx) redirects that requests followed
for resp in response.history:
    print(resp.status_code, resp.url)

# The final response is a 200 page whose HTML carries the last redirect.
# Parse it and read the <input name="redirectURL"> field.
soup = BeautifulSoup(response.text, 'html.parser')
redirect_input = soup.find(name="input", attrs={"name": "redirectURL"})
if redirect_input is None:
    raise RuntimeError("no redirectURL field found; the page layout may have changed")
redirect_url = redirect_input["value"]

# The value is percent-encoded: unquote it (and unescape any HTML entities)
final_url = unescape(unquote(redirect_url))
print(final_url)
article_resp = requests.get(final_url)
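The steps above can be wrapped into a function that keeps following HTML-level redirects until a page has none, with a hop limit so a page that redirects to itself cannot loop forever. This is a sketch under assumptions. fetch is any callable returning (url after HTTP redirects, HTML text). The regex-based find_redirect_input is a rough stand-in for the BeautifulSoup lookup and assumes double-quoted attributes. All function names here are hypothetical.

```python
import html
import re
from urllib.parse import unquote, urljoin


def find_redirect_input(html_text):
    """Return the decoded value of an <input name="redirectURL"> tag, or None.

    Rough regex heuristic (double-quoted attributes only); a real HTML parser
    such as BeautifulSoup is more robust.
    """
    for tag in re.findall(r"<input\b[^>]*>", html_text, flags=re.IGNORECASE):
        attrs = {k.lower(): v for k, v in re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', tag)}
        if attrs.get("name") == "redirectURL":
            # HTML entities first (&amp;), then percent-encoding (%3A, %2F, ...).
            return unquote(html.unescape(attrs.get("value", ""))) or None
    return None


def resolve_final_url(url, fetch, find_next=find_redirect_input, max_hops=5):
    """Follow HTTP redirects (done inside fetch) plus HTML-level redirects.

    fetch(url) must return (url_after_http_redirects, html_text).
    """
    for _ in range(max_hops):
        current_url, html_text = fetch(url)
        next_url = find_next(html_text)
        if not next_url:
            return current_url  # no further redirect in this page
        url = urljoin(current_url, next_url)  # also handles relative targets
    raise RuntimeError(f"still redirecting after {max_hops} hops (last: {url})")


def requests_fetch(url):
    """A fetch() built on requests, which follows the HTTP 3xx hops itself."""
    import requests  # imported lazily so the helpers above need only stdlib

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.url, response.text
```

For the DOI in the question you would call resolve_final_url(doi_link, requests_fetch). This works only as long as the landing page still exposes the redirectURL field. Keeping fetch as a parameter also lets you swap in a requests.Session or test the loop offline with canned pages.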

