How to split header values?
I'm parsing HTTP headers. I want to split the header values into arrays where it makes sense.
For example, Cache-Control: no-cache, no-store should return ['no-cache','no-store'].
HTTP RFC2616 says:
Multiple message-header fields with the same field-name MAY be present
in a message if and only if the entire field-value for that header
field is defined as a comma-separated list [i.e., #(values)]. It MUST
be possible to combine the multiple header fields into one
"field-name: field-value" pair, without changing the semantics of the
message, by appending each subsequent field-value to the first, each
separated by a comma. The order in which header fields with the same
field-name are received is therefore significant to the interpretation
of the combined field value, and thus a proxy MUST NOT change the
order of these field values when a message is forwarded
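But I'm not sure whether the reverse is true: is it safe to split on commas?
I've already found one example where this causes problems. My User-Agent string, for example, is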
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
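which contains a comma after "KHTML". Obviously I don't have more than one user agent, so it makes no sense to split this header.
Is the User-Agent string the only exception, or are there more?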
if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]
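So it's the other way around. You can only assume that Field: value1, value2 is equivalent to Field: value1 + Field: value2 when the spec says that Field supports #(value), i.e. a comma-separated list of values.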
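To illustrate the direction the RFC actually guarantees (combining, not splitting), here is a minimal sketch. The function name combine_fields and the list_valued parameter are hypothetical, and raising an error on a repeated non-list header is my own choice for the example, not something the spec mandates.

```python
def combine_fields(fields, list_valued):
    """Fold repeated header fields into one value, preserving received order.

    fields: iterable of (name, value) pairs in the order they were received.
    list_valued: set of lower-cased names whose grammar is #(value).
    """
    combined = {}
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        if key not in combined:
            combined[key] = value
        elif key in list_valued:
            # Allowed by the RFC: append each later value after a comma.
            combined[key] = combined[key] + ", " + value
        else:
            # Assumption for this sketch: treat the repeat as an error.
            raise ValueError(f"repeated non-list header: {name}")
    return combined


received = [("Cache-Control", "no-cache"), ("Cache-Control", "no-store")]
print(combine_fields(received, {"cache-control"}))
```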
Reading through the specs, I concluded that the following headers support multiple (comma-separated) values:
- Accept
- Accept-Charset
- Accept-Encoding
- Accept-Language
- Accept-Patch
- Accept-Ranges
- Allow
- Cache-Control
- Connection
- Content-Encoding
- Content-Language
- Expect
- If-Match
- If-None-Match
- Pragma
- Proxy-Authenticate
- TE
- Trailer
- Transfer-Encoding
- Upgrade
- Vary
- Via
- Warning
- WWW-Authenticate
- X-Forwarded-For
You can use this to create a whitelist of splittable headers.
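As a rough sketch of that whitelist idea (the names SPLITTABLE_HEADERS and split_header_value are made up for illustration): only headers on the list are split, everything else comes back as a single value. Note that this uses a plain comma split, so it assumes no element contains a comma inside a quoted string.

```python
# Header names from the list above, lower-cased for case-insensitive lookup.
SPLITTABLE_HEADERS = frozenset(name.lower() for name in (
    "Accept", "Accept-Charset", "Accept-Encoding", "Accept-Language",
    "Accept-Patch", "Accept-Ranges", "Allow", "Cache-Control", "Connection",
    "Content-Encoding", "Content-Language", "Expect", "If-Match",
    "If-None-Match", "Pragma", "Proxy-Authenticate", "TE", "Trailer",
    "Transfer-Encoding", "Upgrade", "Vary", "Via", "Warning",
    "WWW-Authenticate", "X-Forwarded-For",
))


def split_header_value(name, value):
    """Return a list of values; split only whitelisted headers."""
    if name.lower() not in SPLITTABLE_HEADERS:
        return [value]  # e.g. User-Agent stays in one piece
    # Naive split: does NOT understand quoted strings.
    return [item.strip() for item in value.split(",") if item.strip()]


print(split_header_value("Cache-Control", "no-cache, no-store"))
print(split_header_value("User-Agent", "AppleWebKit/537.36 (KHTML, like Gecko)"))
```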
No, it is not safe to split headers based on commas. For example, Accept: foo/bar;p="A,B,C", bob/dole;x="apples,oranges" is a valid header, but if you try to split on commas to get a list of mime-types, you'll get invalid results.
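To make the failure concrete, here is a small sketch that splits on commas only when they fall outside a double-quoted string. The function name split_outside_quotes is hypothetical. This is not a full parser for any header's ABNF: it only assumes the usual quoted-string rules (double quotes, backslash escapes) and ignores things like parenthesised comments.

```python
def split_outside_quotes(value):
    """Split on commas that are not inside a double-quoted string."""
    items = []
    current = []
    in_quotes = False
    escaped = False
    for ch in value:
        if escaped:
            # Character following a backslash inside quotes is literal.
            current.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            # A list separator: close the current element.
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]  # drop empty elements


accept = 'foo/bar;p="A,B,C", bob/dole;x="apples,oranges"'
print(len(accept.split(",")))        # naive split: 5 pieces
print(split_outside_quotes(accept))  # quote-aware split: 2 elements
```

Even with this, each element still has to be parsed according to that header's own grammar (parameters, q-values and so on).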
The correct answer is that each header is specified using ABNF, most of which are in the various RFCs; e.g. Accept: is defined in RFC7231 Section 5.3.2.
I had this specific problem, so I wrote a parser and tested it on edge cases. Not only is parsing the header non-trivial, interpreting it and giving the correct result is also non-trivial.
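Some headers are more complicated than others, but essentially each header has its own grammar, which should be respected for correct (and secure) processing.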