Python3 aiohttp - invalid character in header

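I get the error "invalid character in header" when using aiohttp against some websites, even with aiohttp's own example code. Some sites work and some don't. The requests package works fine with the same sites, though. Any ideas?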

# Example code
import asyncio

import aiohttp


async def main():
    # Open a session and request the page that triggers the error
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.rockhamptonregion.qld.gov.au/Home') as response:
            print("Status:", response.status)


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Example traceback:

Traceback (most recent call last):
  File "C:/Python Projects/test2.py", line 35, in <module>
    loop.run_until_complete(main())
  File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "C:/Python Projects/test2.py", line 26, in main
    async with session.get('https://www.rockhamptonregion.qld.gov.au/Home') as response:
  File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client.py", line 544, in _request
    await resp.start(conn)
  File "C:\Users\P\AppData\Local\Programs\Python\Python38\lib\site-packages\aiohttp\client_reqrep.py", line 892, in start
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='invalid character in header', url=URL('https://www.rockhamptonregion.qld.gov.au/Home')

At least with curl, this is what I see:

$ curl -s --head https://www.rockhamptonregion.qld.gov.au/Home \
    | grep -A 1 ___utmv | xxd
00000000: 5365 742d 436f 6f6b 6965 3a20 5f5f 5f75  Set-Cookie: ___u
00000010: 746d 766d 4c49 4275 7342 4545 5a3d 5a45  tmvmLIBusBEEZ=ZE
00000020: 785a 6470 426c 7776 703b 2070 6174 683d  xZdpBlwvp; path=
00000030: 2f3b 204d 6178 2d41 6765 3d39 3030 0d0a  /; Max-Age=900..
00000040: 5365 742d 436f 6f6b 6965 3a20 5f5f 5f75  Set-Cookie: ___u
00000050: 746d 7661 4c49 4275 7342 4545 5a3d 6d6e  tmvaLIBusBEEZ=mn
00000060: 4e01 6843 6343 3b20 7061 7468 3d2f 3b20  N.hCcC; path=/; 
00000070: 4d61 782d 4167 653d 3930 300d 0a53 6574  Max-Age=900..Set
00000080: 2d43 6f6f 6b69 653a 205f 5f5f 7574 6d76  -Cookie: ___utmv
00000090: 624c 4942 7573 4245 455a 3d4f 5a54 0d0a  bLIBusBEEZ=OZT..
000000a0: 2020 2020 5865 4f4f 6461 6c5a 3a20 7a74      XeOOdalZ: zt
000000b0: 673b 2070 6174 683d 2f3b 204d 6178 2d41  g; path=/; Max-A
000000c0: 6765 3d39 3030 0d0a                      ge=900..

This is a group of 3 cookies whose names start with "___utmv". These are their supposed values:

>>> l = [
...     '5a45785a6470426c777670',
...     '6d6e4e0168436343',
...     '5a540d0a2020202058654f4f64616c5a3a207a7467',
... ]
>>> list(map(bytes.fromhex, l))
[b'ZExZdpBlwvp', b'mnN\x01hCcC', b'ZT\r\n    XeOOdalZ: ztg']

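The first one is fine. The last one looks malformed, though it might be interpreted as another cookie. The middle one, however, clearly violates HTTP RFC 2616, which defines a message header in section 4.2 (Message Headers) as: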

  message-header = field-name ":" [ field-value ]
  field-name     = token
  field-value    = *( field-content | LWS )
  field-content  = <the OCTETs making up the field-value
                   and consisting of either *TEXT or combinations
                   of token, separators, and quoted-string>

b'\x01' matches none of TEXT, token, separators, or quoted-string.
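The check described above can be sketched in a few lines of Python. This is only an illustration: `find_control_bytes` is a hypothetical helper, not part of aiohttp or urllib, and it simplifies the RFC by accepting any CR or LF, whereas RFC 2616 only permits them as part of line folding (CRLF followed by a space or tab).

```python
# Sketch: flag control bytes that RFC 2616 does not allow in a header value.
# find_control_bytes is an illustrative helper, not a library function.

# HT, LF, CR. Simplification: CR/LF are accepted anywhere, although the RFC
# only allows them as part of folding (CRLF followed by SP or HT).
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0D}


def find_control_bytes(value: bytes):
    """Return (offset, byte) pairs for CTLs (0-31 and 127) other than HT, CR, LF."""
    return [
        (i, b)
        for i, b in enumerate(value)
        if (b < 0x20 or b == 0x7F) and b not in ALLOWED_WHITESPACE
    ]


# The three cookie values taken from the hex dump in this answer
values = [
    bytes.fromhex('5a45785a6470426c777670'),
    bytes.fromhex('6d6e4e0168436343'),
    bytes.fromhex('5a540d0a2020202058654f4f64616c5a3a207a7467'),
]

for v in values:
    print(v, find_control_bytes(v))
```

Under these assumptions, only the middle value is flagged, because of the 0x01 byte at offset 3.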

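This may be a bug on their side, or they may not want you to parse these headers. If you still want to, you could look for a more lenient HTTP client. For example, the stdlib urllib seems to accept them: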

>>> from urllib.request import urlopen
... 
... resp = urlopen('https://www.rockhamptonregion.qld.gov.au/Home')
... [(k, v) for (k, v) in resp.getheaders() if v.startswith('___utmv')]
[('Set-Cookie', '___utmvmLIBusBEEZ=INQnabCZqUC; path=/; Max-Age=900'),
 ('Set-Cookie', '___utmvaLIBusBEEZ=ekS\x01bOgT; path=/; Max-Age=900'),
 ('Set-Cookie',
  '___utmvbLIBusBEEZ=aZI\r\n    XdBOPalz: vtB; path=/; Max-Age=900')]
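Since the question's code is asyncio-based, one possible way to keep using urllib from a coroutine is to run the blocking call in the default thread pool executor. The following is a minimal sketch under that assumption. The names `fetch_lenient` and `_blocking_fetch`, and the `opener` parameter, are hypothetical and introduced only for illustration.

```python
import asyncio
from functools import partial
from urllib.request import urlopen


def _blocking_fetch(url, opener=urlopen):
    # opener is a hypothetical injection point (defaults to urllib's urlopen)
    # so the helper can be exercised without network access.
    with opener(url) as resp:
        return resp.status, resp.getheaders()


async def fetch_lenient(url, opener=urlopen):
    """Run the blocking urllib request in the default thread pool executor."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, partial(_blocking_fetch, url, opener))
```

Inside a coroutine, this could be called as `status, headers = await fetch_lenient('https://www.rockhamptonregion.qld.gov.au/Home')`. Each request occupies a worker thread while it runs, so this suits occasional requests more than high-concurrency crawling.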