How to split header values?

I am parsing HTTP headers, and I want to split header values into arrays wherever that makes sense.

For example, Cache-Control: no-cache, no-store should return ['no-cache','no-store'].

HTTP RFC 2616 says:

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded

But I'm not sure whether the reverse holds: is it safe to split on commas?

I have already found an example where this causes problems. My User-Agent string, for instance, is

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36

i.e. it contains a comma after "KHTML". Obviously I don't have more than one user agent, so splitting this header makes no sense.

Is the User-Agent string the only exception, or are there more?

if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]

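So it works the other way around. Only when the specification defines a field as #(value) can you assume that Field: value1, value2 is equivalent to Field: value1 plus Field: value2, i.e. a comma-separated list of values.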
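The combining rule quoted from the RFC can be sketched in a few lines of Python. This is only an illustration: the function name `combine_fields` and the sample field list are made up for the example, and the merge is only meaningful for headers whose value is defined as a comma-separated list.

```python
def combine_fields(fields):
    """Merge repeated header fields into one comma-separated value each.

    `fields` is a list of (name, value) pairs in the order received.
    Hypothetical helper; only valid for headers defined as #(values).
    """
    combined = {}
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        if key in combined:
            # Append each subsequent value to the first, separated by a comma.
            combined[key] = combined[key] + ", " + value
        else:
            combined[key] = value
    return combined


fields = [
    ("Cache-Control", "no-cache"),
    ("Accept-Encoding", "gzip"),
    ("Cache-Control", "no-store"),
]
print(combine_fields(fields))
```

Because Python dictionaries keep insertion order and values are appended as they arrive, the order of the repeated values is preserved, which is what the RFC requires.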

After reading through the specifications, I concluded that the following headers support multiple (comma-separated) values:

  • Accept
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
  • Accept-Patch
  • Accept-Ranges
  • Allow
  • Cache-Control
  • Connection
  • Content-Encoding
  • Content-Language
  • Expect
  • If-Match
  • If-None-Match
  • Pragma
  • Proxy-Authenticate
  • TE
  • Trailer
  • Transfer-Encoding
  • Upgrade
  • Vary
  • Via
  • Warning
  • WWW-Authenticate
  • X-Forwarded-For

You can use this list to build a whitelist of splittable headers.
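A minimal sketch of such a whitelist in Python might look like this. The names `LIST_HEADERS` and `split_header` are hypothetical, and the split itself is a plain comma split, so it does not account for commas inside quoted strings.

```python
# Headers from the list above, stored lower-case because field names
# are case-insensitive. LIST_HEADERS is a hypothetical name.
LIST_HEADERS = frozenset(h.lower() for h in [
    "Accept", "Accept-Charset", "Accept-Encoding", "Accept-Language",
    "Accept-Patch", "Accept-Ranges", "Allow", "Cache-Control",
    "Connection", "Content-Encoding", "Content-Language", "Expect",
    "If-Match", "If-None-Match", "Pragma", "Proxy-Authenticate", "TE",
    "Trailer", "Transfer-Encoding", "Upgrade", "Vary", "Via", "Warning",
    "WWW-Authenticate", "X-Forwarded-For",
])


def split_header(name, value):
    """Return the header value as a list, splitting only whitelisted headers."""
    if name.lower() not in LIST_HEADERS:
        # Not a list-valued header: keep the value whole.
        return [value.strip()]
    # Naive split: commas inside quoted strings are NOT handled here.
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_header("Cache-Control", "no-cache, no-store"))
print(split_header("User-Agent", "Mozilla/5.0 (KHTML, like Gecko)"))
```

With this approach the User-Agent value from the question is returned as a single element, while Cache-Control is split into its directives.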

No, it is not safe to split a header on commas. For example, Accept: foo/bar;p="A,B,C", bob/dole;x="apples,oranges" is a valid header, but if you split on commas to get a list of MIME types, you will get invalid results.
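To make the failure concrete, here is a rough Python sketch comparing a plain comma split with a splitter that ignores commas inside double-quoted strings. The function `split_list_value` is a hypothetical helper, not a full parser: it only understands quoted strings with backslash escapes, and it does not handle parenthesized comments or any other header-specific syntax.

```python
def split_list_value(value):
    """Split on commas that are outside double-quoted strings."""
    items = []
    current = []
    in_quotes = False
    escaped = False  # True right after a backslash inside a quoted string
    for ch in value:
        if escaped:
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A top-level comma ends the current list element.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]  # drop empty elements


value = 'foo/bar;p="A,B,C", bob/dole;x="apples,oranges"'
print(value.split(","))         # five fragments, none a valid media range
print(split_list_value(value))  # two elements, quoted commas kept intact
```

Even this only recovers the list elements; extracting the media type and its parameters from each element still requires following the grammar for Accept.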

The correct answer is that each header is specified using ABNF, mostly in the various RFCs; for example, Accept: is defined in RFC 7231 Section 5.3.2.

I ran into this specific problem, wrote a parser, and tested it on edge cases. Not only is parsing the header non-trivial; interpreting it and giving the correct result is also non-trivial.

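Some headers are more complex than others, but essentially each header has its own grammar, which should be respected for correct (and secure) processing.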