How to split header values?

Question

I am parsing HTTP headers. I want to split header values into meaningful arrays.

For example, Cache-Control: no-cache, no-store should return ['no-cache', 'no-store'].

RFC 2616 (HTTP/1.1) says:

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded

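But I am not sure whether the reverse holds: is it safe to split on commas?

I have already found an example where this causes problems. My User-Agent string, for instance, is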

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36

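That is, it contains a comma after "KHTML". Obviously I do not have more than one user agent, so splitting this header makes no sense.

Is the User-Agent string the only exception, or are there more?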

Answer 1

if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]

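So it is the other way around. Only when the specification says a field supports #(value) can you assume that Field: value1, value2 is equivalent to Field: value1 plus Field: value2, i.e. a comma-separated list of values.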
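The direction the RFC does guarantee (combining repeated fields into one) can be sketched as follows. This is only a minimal illustration in Python; the function name combine_fields and the (name, value) pair representation are assumptions made for the example, and the merge is only meaningful for fields whose value is defined as a comma-separated list.

```python
def combine_fields(fields):
    """Merge repeated header fields into one value per field name.

    `fields` is an iterable of (name, value) pairs in the order received.
    Hypothetical helper: only valid for list-valued (#-rule) headers.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        value = value.strip()
        if key in combined:
            # Append each subsequent value to the first, separated by a comma.
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


fields = [
    ("Cache-Control", "no-cache"),
    ("Accept", "text/html"),
    ("Cache-Control", "no-store"),
]
print(combine_fields(fields))
# {'cache-control': 'no-cache, no-store', 'accept': 'text/html'}
```

Because the values are appended in arrival order, the order of the repeated fields is preserved, as the quoted paragraph requires.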

Answer 2

After reading through the specs, I concluded that the following headers support multiple (comma-separated) values:

Accept
Accept-Charset
Accept-Encoding
Accept-Language
Accept-Patch
Accept-Ranges
Allow
Cache-Control
Connection
Content-Encoding
Content-Language
Expect
If-Match
If-None-Match
Pragma
Proxy-Authenticate
TE
Trailer
Transfer-Encoding
Upgrade
Vary
Via
Warning
WWW-Authenticate
X-Forwarded-For

You can use this to build a whitelist of splittable headers.
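A whitelist along those lines might look like the sketch below. The names LIST_HEADERS and split_if_listed are made up for this example, and the split itself is deliberately naive: it does not account for commas inside quoted strings, so it should be read as a starting point rather than a complete parser.

```python
# Lower-cased names of the headers listed above (field names are case-insensitive).
LIST_HEADERS = frozenset({
    "accept", "accept-charset", "accept-encoding", "accept-language",
    "accept-patch", "accept-ranges", "allow", "cache-control", "connection",
    "content-encoding", "content-language", "expect", "if-match",
    "if-none-match", "pragma", "proxy-authenticate", "te", "trailer",
    "transfer-encoding", "upgrade", "vary", "via", "warning",
    "www-authenticate", "x-forwarded-for",
})


def split_if_listed(name, value):
    """Split `value` on commas only if `name` is a whitelisted list header.

    Hypothetical helper; the comma split ignores quoting (assumption: the
    values contain no quoted commas).
    """
    if name.lower() not in LIST_HEADERS:
        return [value.strip()]
    # Drop empty elements such as those produced by "a,, b".
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_if_listed("Cache-Control", "no-cache, no-store"))
# ['no-cache', 'no-store']
print(split_if_listed("User-Agent", "Mozilla/5.0 (KHTML, like Gecko)"))
# ['Mozilla/5.0 (KHTML, like Gecko)']
```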

Answer 3

No, it is not safe to split headers on commas. For example, Accept: foo/bar;p="A,B,C", bob/dole;x="apples,oranges" is a valid header, but if you try to split on commas to get a list of MIME types, you will get invalid results.
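To make the failure concrete, here is a rough Python sketch comparing a plain split with one that skips commas inside double-quoted strings. The function split_list_header is a hypothetical helper written for this illustration; it handles quoted strings and backslash escapes only, not parenthesised comments or any header-specific grammar.

```python
def split_list_header(value):
    """Split a header value on commas that are outside double-quoted strings.

    Sketch only: does not implement any particular header's ABNF.
    """
    items, current = [], []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside a quoted string.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]  # ignore empty list elements


value = 'foo/bar;p="A,B,C", bob/dole;x="apples,oranges"'
print(len(value.split(",")))
# 5
print(split_list_header(value))
# ['foo/bar;p="A,B,C"', 'bob/dole;x="apples,oranges"']
```

The plain split yields five fragments, none of which is a usable media range, while the quote-aware version recovers the two list elements. Each element would still need parsing according to the grammar of the header in question.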

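The correct answer is that each header is specified using ABNF, mostly in various RFCs; for example, Accept is defined in RFC 7231, Section 5.3.2.

I ran into this specific problem, wrote a parser, and tested it on edge cases. Not only is parsing the header non-trivial; interpreting it and giving the correct result is also non-trivial.

Some headers are more complicated than others, but essentially each header has its own grammar, which should be respected for correct (and secure) handling.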

http

rfc2616

http-headers