Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

Question

I am trying to download a GIF file with urllib, but it raises this error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This does not happen when I download from other blog sites. Here is my code:

import requests
import urllib.request

url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})    

path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'

full_name = path + ".gif"    
urllib.request.urlretrieve(url_1,full_name)

Answer 1

Don't use urllib.request.urlretrieve. Instead, use the requests library like this:

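import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

# Raw string, so the backslash is not read as an escape sequence
path = r"D:\Test.gif"

# Send a browser-like User-Agent header with the request
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

# The with block closes the file even if the write fails
with open(path, "wb") as file:
    file.write(response.content)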

Hope this helps!

Answer 2

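Solution:
The remote server is apparently checking the User-Agent header and rejecting requests that come from Python's urllib.
urllib.request.urlretrieve() does not let you change the HTTP headers; however, you can use
urllib.request.URLopener.retrieve():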

import urllib.request

url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

path = '/home/piyushsambhi/Downloads/'

full_name = path + "testimg.gif"

opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'Mozilla/5.0')
filename, headers = opener.retrieve(url_1, full_name)

print(filename)

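Note: You are using Python 3, where these functions are considered part of the "Legacy interface", and URLopener has been deprecated. For that reason, you should not use them in new code.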
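If you want to stay with urllib but avoid the deprecated URLopener, one option is to build a urllib.request.Request that carries the User-Agent header and pass it to urllib.request.urlopen(). The following is only a minimal sketch: the function names build_request and download, the default timeout, and the output file name are my own assumptions, and I have not confirmed that this particular server accepts the 'Mozilla/5.0' value.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries an explicit User-Agent header.

    build_request is a hypothetical helper name, not part of urllib.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest_path, user_agent="Mozilla/5.0", timeout=30):
    """Copy the response body for `url` into the file at `dest_path`.

    download is a hypothetical helper name; the timeout is an assumed default.
    """
    request = build_request(url, user_agent)
    # urlopen raises urllib.error.HTTPError for responses such as 403.
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(dest_path, "wb") as outfile:
        # Stream the body to disk instead of holding it all in memory.
        shutil.copyfileobj(response, outfile)
    return dest_path


# Example call (needs network access; the file name is an assumption):
# download(
#     "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif",
#     "testimg.gif",
# )
```

Unlike urlretrieve(), this keeps the header on the request itself, so nothing global is changed for other urllib calls in the same program.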

Your code imports requests but never uses it. You should use it, though, because it is much easier to work with than urllib. The code snippet below worked for me:

import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path = '/home/piyushsambhi/Downloads/'
full_name = path + "testimg1.gif"

r = requests.get(url)
# Fail loudly on a 4xx/5xx response instead of saving an error page as the GIF
r.raise_for_status()
with open(full_name, 'wb') as outfile:
    outfile.write(r.content)

Note: Change the path variable to suit your machine and environment.
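For larger files, or when you want a 403 to surface as an exception rather than ending up as a broken file on disk, a streamed variant of the requests approach may be useful. This is a rough sketch under my own assumptions: save_response and download are hypothetical helper names, and the chunk size and timeout are arbitrary defaults.

```python
import requests


def save_response(response, dest_path, chunk_size=8192):
    """Write a requests response to disk chunk by chunk.

    save_response is a hypothetical helper name, not part of requests.
    """
    # Raises requests.HTTPError for 4xx/5xx, before the file is opened.
    response.raise_for_status()
    with open(dest_path, "wb") as outfile:
        for chunk in response.iter_content(chunk_size=chunk_size):
            outfile.write(chunk)
    return dest_path


def download(url, dest_path, user_agent="Mozilla/5.0", timeout=30):
    """Fetch `url` with an explicit User-Agent and save it to `dest_path`."""
    # stream=True defers reading the body until iter_content is called.
    with requests.get(
        url,
        headers={"User-Agent": user_agent},
        stream=True,
        timeout=timeout,
    ) as response:
        return save_response(response, dest_path)


# Example call (needs network access; the file name is an assumption):
# download(
#     "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif",
#     "testimg1.gif",
# )
```

Because raise_for_status() runs before the file is opened, a rejected request leaves any existing file at that path untouched.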
