Adding cookie data to a request.urlretrieve call
I am trying to download a .torrent file from a password-protected site.
I have managed to access the site using cookies, like so:
import requests
from requests.exceptions import RequestException
from bs4 import BeautifulSoup

cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}
try:
    # read site content
    read = requests.get(s_string, cookies=cookies).content
except RequestException as e:
    # report the failure, then re-raise the original exception
    print('Could not connect to somesite: %s' % e)
    raise
soup = BeautifulSoup(read, 'html.parser')
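With the code above I can access the site and scrape the data I need. From the scraped data I build a link to a .torrent file, which I then want to download, and this is where I am stuck.
Here is what I am trying now (the cookie data is obviously not real, just as it isn't in the code above):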
from urllib import request

cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}
# construct download URL
torrent_url = 'https://www.somesite.com/' + torrent_url
# for testing purposes DELETE!
print('torrent link:', torrent_url)
# download torrent file into a folder
filename = torrent_url.split('/')[-1]
save_as = 'torrents/' + filename + '.torrent'
try:
    r = request.urlretrieve(torrent_url, save_as, data=cookies)
    print("Download successful for: " + filename)
except request.URLError as e:
    # report the failure, then re-raise the original exception
    print("Error :%s" % e)
    raise
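This code works on ordinary sites that need no cookies, but the .torrent file I am trying to get sits behind a password- and captcha-protected site, so I need to send cookies to fetch it.
So the question is, what am I doing wrong here? Without data=cookies I get an HTTP 404 error, and with data=cookies I get the following error: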
  File "/usr/lib/python3.6/http/client.py", line 1064, in _send_output
    + b'\r\n'
TypeError: can't concat str to bytes
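P.S. Before anyone asks: yes, I am 100% sure the torrent_url is correct. I printed it, and manually copying and pasting it into my own browser brings up the download window for the torrent file in question.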
EDIT:
try:
    read = requests.session().get(torrent_url)
    with open(save_as, 'wb') as w:
        for chunk in read.iter_content(chunk_size=1024):
            if chunk:
                w.write(chunk)
    print("Download successful for: " + filename)
except request.URLError as e:
    print("Error :%s" % e)
I put this together following furas's suggestion. It now runs, but when I try to open the .torrent, the torrent client says "invalid coding, cannot open".
When I open the .torrent file, this is what is inside:
<h1>Not Found</h1>
<p>Sorry pal :(</p>
<script src="/cdn-cgi/apps/head/o1wasdM-xsd3-9gm7FQY.js"></script>
Am I still doing something wrong, or is the site owner blocking programs from downloading .torrents from his site, or something of that nature?
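On the original urlretrieve attempt: the data argument of urlretrieve is the request body (it turns the request into a POST), not a place for cookies, which would explain why passing a dict there fails. Cookies travel in a Cookie request header instead. Below is a minimal sketch of one way to do that with urllib alone; the helper names build_cookie_header and make_cookie_opener are made up for illustration, and it assumes the cookie values are plain strings that need no escaping.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join a dict of cookies into the value of a Cookie request header."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


def make_cookie_opener(cookies):
    """Return an opener that sends the given cookies with every request."""
    opener = urllib.request.build_opener()
    # Append rather than replace, so the default User-agent header is kept.
    opener.addheaders.append(("Cookie", build_cookie_header(cookies)))
    return opener


# Usage (needs network access; cookies, torrent_url and save_as as in the question):
# urllib.request.install_opener(make_cookie_opener(cookies))
# urllib.request.urlretrieve(torrent_url, save_as)  # note: no data= argument
```

Once an opener is installed, urlretrieve goes through it, so the cookies are sent without touching data. Whether the site accepts the request may still depend on other headers.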
The following works, though I don't think it is ideal:
cookies = {'uid': '232323', 'pass': '31321231jh12j3hj213hj213hk',
           '__cfduid': 'kj123kj21kj31k23jkl21j321j3kl213kl21j3'}
try:
    # send the cookies with the download request as well
    read = requests.get(torrent_url, cookies=cookies)
    with open(save_as, 'wb') as w:
        for chunk in read.iter_content(chunk_size=512):
            if chunk:
                w.write(chunk)
    print(filename + ' downloaded successfully!!!')
except requests.RequestException as e:
    # requests raises its own exception types, not urllib's URLError
    print("Error :%s" % e)
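The version in the EDIT sent no cookies, which is probably why the site answered with its "Not Found" page and that HTML was saved as the .torrent. A slightly more defensive variant of the working code is sketched below. It assumes the caller passes in a requests.Session that already carries the cookies; download_torrent and looks_like_torrent are hypothetical helper names, and the check relies on the fact that a .torrent file is a bencoded dictionary and therefore starts with the byte d.

```python
import os


def looks_like_torrent(first_bytes):
    """Heuristic: a bencoded dictionary starts with b'd'; an HTML page starts with b'<'."""
    return first_bytes[:1] == b"d"


def download_torrent(session, url, path, chunk_size=8192):
    """Stream url to path through a session that already holds the cookies.

    Raises ValueError if the body does not look like a torrent file.
    """
    with session.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # turn HTTP 4xx/5xx into an exception
        # Skip empty chunks, as the original loop does.
        chunks = (c for c in resp.iter_content(chunk_size=chunk_size) if c)
        first = next(chunks, b"")
        if not looks_like_torrent(first):
            raise ValueError("response is not a torrent file: %r" % first[:40])
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(first)
            for chunk in chunks:
                fh.write(chunk)
    return path


# Usage (needs the requests package and network access):
# import requests
# session = requests.Session()
# session.cookies.update(cookies)
# download_torrent(session, torrent_url, save_as)
```

Checking the first byte before writing means an error page is reported as an error instead of being saved under a .torrent name.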