Interpreting ASN.1 indefinite-length encoding with multiple encapsulated OCTET STRINGs

I have a BER structure like this:

$ openssl asn1parse -inform der -in test.der -i -dump

 ????:d=4  hl=2 l=inf  cons:     cont [ 0 ]
 ????:d=5  hl=3 l= 240 prim:      OCTET STRING
      0000 - AABBCCDD
 ????:d=5  hl=2 l=   8 prim:      OCTET STRING
      0000 - EEFF
 ????:d=5  hl=2 l=   0 prim:      EOC

...or, in der2ascii style:

[0] `80`
  OCTET_STRING { `AABBCCDD` }
  OCTET_STRING { `EEFF` }
`0000`

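What I know: an indefinite-length encoding must use a constructed type, because a primitive type could introduce ambiguity, for example when its contents include 0x0000. What I want to know: how must a decoder behave when it parses this BER structure? Does the encoded value include the header bytes of the two OCTET STRINGs? If so, how is indefinite-length byte data encoded, for example when the second OCTET STRING is an integer?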

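I am asking because the CMS standard defines a field as a single OCTET STRING, yet in most BER encodings I always see two. Is that simply because of the indefinite-length encoding? Am I missing something?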

From ITU-T X.690:

8.1.4 Contents octets

The contents octets shall consist of zero, one or more octets, and shall encode the data value as specified in subsequent clauses.

NOTE – The contents octets depend on the type of the data value; subsequent clauses follow the same sequence as the definition of types in ASN.1.

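Does this mean that I can put any constructed type in there, and that the application must interpret only the value part of the constructed TLV structure?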

When an OCTET STRING value is encoded in indefinite-length mode, the encoder must:

  • split the value into chunks, each a smaller OCTET STRING
  • encode each chunk as a primitive OCTET STRING in definite-length mode, so that each has its own TLV (with a length!)
  • frame the whole sequence of chunks in a single constructed OCTET STRING "container" with its own TLV, whose length octet is 0x80 (indefinite) and whose end is marked by the end-of-contents octets 0x00 0x00
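The steps above can be sketched in a few lines of Python. This is only an illustration, not a complete BER encoder: the function names and the default `chunk_size` are made up for the example, and it assumes the universal OCTET STRING tag (0x04 primitive, 0x24 constructed) rather than the context tag [0] from the question.

```python
def encode_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_octet_string_indefinite(value: bytes, chunk_size: int = 4) -> bytes:
    """Constructed, indefinite-length OCTET STRING built from primitive chunks."""
    # 0x24 = constructed OCTET STRING, 0x80 = indefinite length
    out = bytearray(b"\x24\x80")
    for i in range(0, len(value), chunk_size):
        chunk = value[i:i + chunk_size]
        # Each chunk is a primitive, definite-length OCTET STRING TLV.
        out += b"\x04" + encode_length(len(chunk)) + chunk
    out += b"\x00\x00"  # end-of-contents octets close the container
    return bytes(out)


encoded = encode_octet_string_indefinite(bytes.fromhex("AABBCCDDEEFF"))
print(encoded.hex())
```

With a chunk size of 4, the six-byte value ends up as two inner OCTET STRINGs, which is the same shape as the dump in the question.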

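At the other end, the decoder extracts the V part of each inner, definite-length OCTET STRING chunk and drops its TL header. It then joins (or consumes) the V parts in order of arrival, discarding the TL part and the end-of-contents octets of the outer frame. So yes, the header bytes of both inner OCTET STRINGs are present in the encoding, but they are not part of the decoded value.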
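A matching decoder can be sketched as follows. It is a minimal illustration under stated assumptions: the helper names are hypothetical, only the universal OCTET STRING tags 0x04 and 0x24 are accepted (an implicitly tagged outer frame such as [0] would need its tag mapped first), and input is assumed to be fully buffered.

```python
def read_length(buf: bytes, pos: int):
    """Return (length, next_pos); length is None for the indefinite form."""
    first = buf[pos]
    pos += 1
    if first == 0x80:
        return None, pos
    if first < 0x80:
        return first, pos
    n = first & 0x7F  # long form: the next n octets hold the length
    return int.from_bytes(buf[pos:pos + n], "big"), pos + n


def decode_octet_string(buf: bytes, pos: int = 0):
    """Return (value, next_pos) for a primitive or constructed OCTET STRING."""
    tag = buf[pos]
    length, pos = read_length(buf, pos + 1)
    if tag == 0x04:  # primitive chunk: the contents are the value itself
        if length is None:
            raise ValueError("primitive encoding cannot use indefinite length")
        return buf[pos:pos + length], pos + length
    if tag != 0x24:
        raise ValueError(f"expected an OCTET STRING, got tag 0x{tag:02x}")
    end = None if length is None else pos + length
    parts = []
    while True:
        if end is None:
            # End-of-contents is only looked for where a chunk header may start.
            if buf[pos:pos + 2] == b"\x00\x00":
                pos += 2
                break
        elif pos >= end:
            break
        chunk, pos = decode_octet_string(buf, pos)  # drops the chunk's TL
        parts.append(chunk)
    return b"".join(parts), pos


value, consumed = decode_octet_string(bytes.fromhex("24800404AABBCCDD0402EEFF0000"))
print(value.hex(), consumed)
```

Because the decoder only checks for 0x00 0x00 at chunk boundaries, zero bytes inside a chunk are never mistaken for the terminator; the chunk's definite length tells the decoder how many bytes to skip.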

The idea behind the indefinite-length technique is that encoder and decoder can emit and consume data incrementally, without knowing the total length up front, even when the whole value would be too large to buffer.

The chunk size is chosen by the encoder or application, based on data availability, memory constraints and possibly an estimate of the decoder's buffering capabilities. I think this is mentioned somewhere in ITU-T X.690.

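For a constructed string encoding like this one, the encoder is not allowed to mix chunks of different ASN.1 types inside the container. In other words, every chunk must be of the same type as the outer container, so the second OCTET STRING in your dump cannot be an INTEGER; it is simply more bytes of the same value. The outer frame in your dump carries the tag [0] instead of the universal OCTET STRING tag, which is consistent with an implicitly tagged OCTET STRING whose chunks keep the universal tag.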

That should hopefully explain why you may see multiple (depending on chunk size) OCTET STRINGs in the indefinite length encoded BER/CER stream where just a single OCTET STRING is expected.

DER forbids indefinite-length encoding on the grounds that the serialized representation of the same data could change on re-encoding (because the chunk size could change).
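The re-encoding problem is easy to see with a small experiment. The sketch below uses hypothetical helper names and the universal OCTET STRING tag; it encodes one value with two different chunk sizes and compares both results with the single primitive, definite-length form that DER requires.

```python
def der_octet_string(value: bytes) -> bytes:
    """Primitive, definite-length OCTET STRING: the only form DER permits."""
    n = len(value)
    if n < 0x80:
        length = bytes([n])
    else:
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(body)]) + body
    return b"\x04" + length + value


def ber_chunked(value: bytes, chunk_size: int) -> bytes:
    """Constructed, indefinite-length OCTET STRING with the given chunk size."""
    out = b"\x24\x80"
    for i in range(0, len(value), chunk_size):
        out += der_octet_string(value[i:i + chunk_size])  # one chunk TLV
    return out + b"\x00\x00"


value = bytes.fromhex("AABBCCDDEEFF")
four = ber_chunked(value, 4)
two = ber_chunked(value, 2)

print(four.hex())                     # two chunks
print(two.hex())                      # three chunks
print(der_octet_string(value).hex())  # one TLV, no chunking
print(four != two)                    # same value, different bytes
```

Both chunked encodings decode to the same six bytes, yet their serialized forms differ, which is a problem for anything that hashes or signs the encoding.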