Writing to NamedTemporaryFile fails silently; converting Curl cookie jar to Requests cookies
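I'm trying to take the Netscape HTTP Cookie File that Curl spits out and convert it to a Cookiejar that the Requests library can work with. I have netscapeCookieString in my Python script as a variable, which looks like: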
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
.miami.edu TRUE / TRUE 0 PS_LASTSITE https://canelink.miami.edu/psc/PUMI2J/
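Since I don't want to parse the cookie file myself, I'd like to use cookielib. Sadly, this means I have to write to disk, since cookielib.MozillaCookieJar() won't take a string as input: it must take a file.
So I'm using NamedTemporaryFile (I couldn't get SpooledTemporaryFile to work; again, I'd like to do all of this in memory if possible).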
tempCookieFile = tempfile.NamedTemporaryFile()
# now take the contents of the cookie string and put it into this in memory file
# that cookielib will read from. There are a couple quirks though.
for line in netscapeCookieString.splitlines():
    # cookielib doesn't know how to handle httpOnly cookies correctly
    # so we have to do some pre-processing to make sure they make it into
    # the cookielib. Basically just removing the httpOnly prefix which is honestly
    # an abuse of the RFC in the first place. note: httpOnly actually refers to
    # cookies that javascript can't access, as in only http protocol can
    # access them, it has nothing to do with http vs https. it's purely
    # to protect against XSS a bit better. These cookies may actually end up
    # being the most critical of all cookies in a given set.
    #
    if line.startswith("#HttpOnly_"):
        # this is actually how the curl library removes the httpOnly, by doing length
        line = line[len("#HttpOnly_"):]
    tempCookieFile.write(line)
tempCookieFile.flush()
# another thing that cookielib doesn't handle very well is
# session cookies, which have 0 in the expires param
# so we have to make sure they don't get expired when they're
# read in by cookielib
#
print tempCookieFile.read()
cookieJar = cookielib.MozillaCookieJar(tempCookieFile.name)
cookieJar.load(ignore_expires=True)
pprint.pprint(cookieJar)
But the problem is that this doesn't work! print tempCookieFile.read() prints an empty line. As a result, pprint.pprint(cookieJar) prints an empty cookie jar.
I can easily reproduce this on my Mac:
>>> import tempfile
>>> tempCookieFile = tempfile.NamedTemporaryFile()
>>> tempCookieFile.write("hey")
>>> tempCookieFile.flush()
>>> print tempCookieFile.read()
>>>
How can I actually write to the NamedTemporaryFile?
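After you write to the file, the file pointer sits just past the data you wrote (in your case, at the end of the file), so reading returns an empty string: there is no more data after the end of the file. Seek to 0 before reading: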
>>> import tempfile
>>> tempCookieFile = tempfile.NamedTemporaryFile()
>>> tempCookieFile.write("hey")
>>> tempCookieFile.seek(0)
>>> print(tempCookieFile.read())
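For the cookie jar itself, here is a minimal sketch of the whole round trip, assuming Python 3 (where cookielib is named http.cookiejar) and a POSIX system where the temporary file can be reopened by name while it is still open. The helper name load_netscape_cookies and the sample cookie are made up for illustration. Note that splitlines() drops the line endings, so each line is written back with a newline; otherwise every cookie ends up on one line and the parser may find nothing to load.

```python
import http.cookiejar
import tempfile

HTTPONLY_PREFIX = "#HttpOnly_"


def load_netscape_cookies(cookie_text):
    """Parse Netscape-format cookie text into a MozillaCookieJar (hypothetical helper)."""
    jar = http.cookiejar.MozillaCookieJar()
    # Text mode ("w+") so str can be written; the default mode is binary.
    with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as tmp:
        for line in cookie_text.splitlines():
            # Strip curl's HttpOnly marker so the line is not treated as a comment.
            if line.startswith(HTTPONLY_PREFIX):
                line = line[len(HTTPONLY_PREFIX):]
            # splitlines() removed the line endings, so put them back.
            tmp.write(line + "\n")
        # Push buffered data to disk; load() reopens the file by name.
        tmp.flush()
        # Session cookies carry an expiry of 0, so skip the expiry check.
        jar.load(tmp.name, ignore_expires=True)
    return jar


# Invented sample data: fields must be tab-separated in the Netscape format.
sample = (
    "# Netscape HTTP Cookie File\n"
    "#HttpOnly_.example.com\tTRUE\t/\tTRUE\t0\tsession_id\tabc123\n"
)
cookie_jar = load_netscape_cookies(sample)
print([cookie.name for cookie in cookie_jar])
```

No seek(0) is needed here because load() opens the file again by name and starts from the beginning; the flush() is what matters. If Requests is installed, a jar like this can be passed as the cookies argument, for example requests.get(url, cookies=cookie_jar).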