Python3 aiohttp - invalid character in header
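I get the error "invalid character in header" when using aiohttp on some websites, even with their example code. Some sites work and some don't. Using the requests package works fine, though. Any ideas?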
# Example code
import asyncio

import aiohttp


async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.rockhamptonregion.qld.gov.au/Home') as response:
            print("Status:", response.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Example traceback:
Traceback (most recent call last):
File "C:/Python Projects/test2.py", line 35, in <module>
loop.run_until_complete(main())
File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "C:/Python Projects/test2.py", line 26, in main
async with session.get('https://www.rockhamptonregion.qld.gov.au/Home') as response:
File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client.py", line 1117, in __aenter__
self._resp = await self._coro
File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client.py", line 544, in _request
await resp.start(conn)
File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client_reqrep.py", line 892, in start
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid character in header', url=URL('https://www.rockhamptonregion.qld.gov.au/Home')
With curl, at least, I see this:
$ curl -s --head https://www.rockhamptonregion.qld.gov.au/Home \
| grep -A 1 ___utmv | xxd
00000000: 5365 742d 436f 6f6b 6965 3a20 5f5f 5f75 Set-Cookie: ___u
00000010: 746d 766d 4c49 4275 7342 4545 5a3d 5a45 tmvmLIBusBEEZ=ZE
00000020: 785a 6470 426c 7776 703b 2070 6174 683d xZdpBlwvp; path=
00000030: 2f3b 204d 6178 2d41 6765 3d39 3030 0d0a /; Max-Age=900..
00000040: 5365 742d 436f 6f6b 6965 3a20 5f5f 5f75 Set-Cookie: ___u
00000050: 746d 7661 4c49 4275 7342 4545 5a3d 6d6e tmvaLIBusBEEZ=mn
00000060: 4e01 6843 6343 3b20 7061 7468 3d2f 3b20 N.hCcC; path=/;
00000070: 4d61 782d 4167 653d 3930 300d 0a53 6574 Max-Age=900..Set
00000080: 2d43 6f6f 6b69 653a 205f 5f5f 7574 6d76 -Cookie: ___utmv
00000090: 624c 4942 7573 4245 455a 3d4f 5a54 0d0a bLIBusBEEZ=OZT..
000000a0: 2020 2020 5865 4f4f 6461 6c5a 3a20 7a74     XeOOdalZ: zt
000000b0: 673b 2070 6174 683d 2f3b 204d 6178 2d41 g; path=/; Max-A
000000c0: 6765 3d39 3030 0d0a ge=900..
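The same raw view can be reproduced from Python with no HTTP parser in the way. This is a sketch of my own, not part of the original diagnosis: `raw_head` is a hypothetical helper that speaks just enough HTTP/1.1 over a plain socket (wrapped with `ssl` for HTTPS) to send a HEAD request and return the header block as bytes, so a `\x01` byte or a folded line shows up exactly as the server sent it. It does not handle redirects, chunked bodies or keep-alive.

```python
import socket
import ssl


def raw_head(host, path="/", port=443, use_tls=True, timeout=10.0):
    """Hypothetical helper: send a bare HEAD request, return the raw header block."""
    sock = socket.create_connection((host, port), timeout=timeout)
    if use_tls:
        # Verifies the certificate against the default CA store.
        ctx = ssl.create_default_context()
        sock = ctx.wrap_socket(sock, server_hostname=host)
    request = (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    chunks = []
    with sock:
        sock.sendall(request.encode("ascii"))
        # Read until the server closes the connection (we asked for Connection: close).
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # The header block ends at the first empty line; nothing here validates it.
    head, _, _ = b"".join(chunks).partition(b"\r\n\r\n")
    return head
```

For example, `print(raw_head("www.rockhamptonregion.qld.gov.au", "/Home"))` should show the same kind of bytes as the curl/xxd dump, although the cookie values change from one request to the next.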
These are the three cookies whose names start with "___utmv". Here are their values:
>>> l = [
... '5a45785a6470426c777670',
... '6d6e4e0168436343',
... '5a540d0a2020202058654f4f64616c5a3a207a7467',
... ]
>>> list(map(bytes.fromhex, l))
[b'ZExZdpBlwvp', b'mnN\x01hCcC', b'ZT\r\n    XeOOdalZ: ztg']
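The first one is fine. The last one looks malformed, although a lenient parser could read the CRLF followed by spaces as a folded continuation line or even as another header. The middle one, however, clearly violates HTTP RFC 2616, which in section 4.2, Message Headers, defines a message header as: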
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or combinations
of token, separators, and quoted-string>
b'\x01' is a control character (CTL), so it matches none of TEXT, token, separators or quoted-string.
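To check the three values mechanically, here is a rough sketch that applies the RFC 2616 definitions directly (it is not aiohttp's actual parser, which may apply its own rules). It first unfolds LWS, i.e. a CRLF followed by spaces or tabs (section 2.2), and then flags any remaining CTL byte (octets 0-31 and 127), except HT, which TEXT allows because it is part of LWS. `unfold` and `illegal_ctl_positions` are hypothetical helper names.

```python
import re
from typing import List

# RFC 2616 section 2.2: LWS = [CRLF] 1*( SP | HT ), i.e. a folded line.
_FOLD = re.compile(rb"\r\n[ \t]+")


def unfold(value: bytes) -> bytes:
    """Replace each folded line break (CRLF + run of SP/HT) with a single space."""
    return _FOLD.sub(b" ", value)


def illegal_ctl_positions(value: bytes) -> List[int]:
    """Offsets, in the unfolded value, of CTL bytes that TEXT does not allow."""
    unfolded = unfold(value)
    # CTL = octets 0-31 and 127; HT (9) is permitted inside TEXT via LWS.
    return [i for i, b in enumerate(unfolded) if (b < 32 and b != 9) or b == 127]


values = [b"ZExZdpBlwvp", b"mnN\x01hCcC", b"ZT\r\n    XeOOdalZ: ztg"]
for v in values:
    print(v, "->", unfold(v), illegal_ctl_positions(v))
```

Only the middle value is flagged, at offset 3; the last one passes once its folded line break is collapsed to a space. A bare CR or LF that is not followed by a space or tab would still be flagged.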
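This may be a bug, or they may not want you to parse these cookies. If you still want to, you could look for a more lenient HTTP client. For example, the stdlib urllib seems fine: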
>>> from urllib.request import urlopen
>>> resp = urlopen('https://www.rockhamptonregion.qld.gov.au/Home')
>>> [(k, v) for (k, v) in resp.getheaders() if v.startswith('___utmv')]
[('Set-Cookie', '___utmvmLIBusBEEZ=INQnabCZqUC; path=/; Max-Age=900'),
('Set-Cookie', '___utmvaLIBusBEEZ=ekS\x01bOgT; path=/; Max-Age=900'),
('Set-Cookie',
'___utmvbLIBusBEEZ=aZI\r\n XdBOPalz: vtB; path=/; Max-Age=900')]
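If the rest of your code is async, one possible workaround (my own assumption, not something the answer above or aiohttp prescribes) is to run the blocking urllib call in a worker thread so it doesn't block the event loop. A minimal sketch; `fetch_headers` is a hypothetical helper. It uses `loop.run_in_executor` rather than `asyncio.to_thread` because the traceback shows Python 3.8, and `asyncio.to_thread` was only added in 3.9.

```python
import asyncio
from urllib.request import urlopen


def _fetch_headers(url, timeout):
    # Blocking request; as shown above, http.client keeps the \x01 byte
    # in the header value instead of rejecting the response.
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.getheaders()


async def fetch_headers(url, timeout=30.0):
    """Hypothetical helper: run the blocking urllib request in the default executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _fetch_headers, url, timeout)


async def main():
    status, headers = await fetch_headers(
        "https://www.rockhamptonregion.qld.gov.au/Home"
    )
    print("Status:", status)
    for k, v in headers:
        if v.startswith("___utmv"):
            print(k, repr(v))
```

Run it with `asyncio.run(main())`. The trade-off is that each request occupies a thread from the default executor and you lose aiohttp's connection pooling, so this is mainly reasonable for a handful of problematic sites.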