Verification of PDF integrity fails
I am trying to verify the integrity of a PDF file using bash commands.
I used dd to extract the signedContent and the detached PKCS#7 object from the PDF.
Then I decoded the PKCS#7 object with
xxd -r -p pkcs7_extracted > pkcs7_extracted.bin
openssl asn1parse -inform DER <pkcs7_extracted.bin >pkcs7_extracted_decoded
From the decoded PKCS#7 I got some useful information:
0:d=0 hl=4 l=5498 cons: SEQUENCE
4:d=1 hl=2 l= 9 prim: OBJECT :pkcs7-signedData
15:d=1 hl=4 l=5483 cons: cont [ 0 ]
19:d=2 hl=4 l=5479 cons: SEQUENCE
23:d=3 hl=2 l= 1 prim: INTEGER :01
26:d=3 hl=2 l= 15 cons: SET
28:d=4 hl=2 l= 13 cons: SEQUENCE
30:d=5 hl=2 l= 9 prim: OBJECT :sha256
41:d=5 hl=2 l= 0 prim: NULL
43:d=3 hl=2 l= 11 cons: SEQUENCE
...
5154:d=7 hl=2 l= 9 prim: OBJECT :contentType
5165:d=7 hl=2 l= 11 cons: SET
5167:d=8 hl=2 l= 9 prim: OBJECT :pkcs7-data
5178:d=6 hl=2 l= 47 cons: SEQUENCE
5180:d=7 hl=2 l= 9 prim: OBJECT :messageDigest
5191:d=7 hl=2 l= 34 cons: SET
5193:d=8 hl=2 l= 32 prim: OCTET STRING [HEX DUMP]:18B399D208A08815DDF23C93B1B63B13757A6AA24B1932569D7A69D0DB3A34C2
5227:d=5 hl=2 l= 13 cons: SEQUENCE
5229:d=6 hl=2 l= 9 prim: OBJECT :sha256WithRSAEncryption
5240:d=6 hl=2 l= 0 prim: NULL
5242:d=5 hl=4 l= 256 prim: OCTET STRING [HEX DUMP]:8F4B21914173EC57E6B0533BB5E04FB7054F23AC299C1BDBF589ED164A3EABB611727BE9117AAC3161D9C18DCA08BD113DD3AA90E5922009FA12BA59E7F6587E81CD79BDED09F862C2C76F35D950926F1A31A3DCCE999A52DCE0C7F67D081E81A44397E8AF96A1051B8E51F2E2271221B06D05C9895E1846B1DBE02B558F5B9EF97C7EB0FF9A7C71A9764D5E205900818F07E82027D79D3F9A5AA72B3A0CF131F1B890D0BCBF3C4DD8A0229FABE15F6C2CA0CE079EB925B3998A1A6190596A88D8F07C1C12B8750636E69108E30E643A653B285A400080C9C5590C112451F6D69BAFC2686D6F1107B37A5DB36B9F797C49E61D4B44E62E17DD541778DE763AC5
5502:d=0 hl=2 l= 0 prim: EOC
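In particular, I noticed that the messageDigest field equals the digest I calculated over the signedContent obtained using the ByteRange.
I then extracted the encrypted hash, decrypted it with my public key, and decoded it again with the asn1parse command.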
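# skip the 4-byte header (hl=4) of the OCTET STRING at offset 5242 and copy its 256 content bytes
dd if=pkcs7_extracted.bin of=extracted.sign.bin bs=1 skip=$((5242 + 4)) count=256
# "decrypt" (RSA public-key operation) to recover the DigestInfo
openssl rsautl -verify -pubin -inkey publickey.pem < extracted.sign.bin > verified.bin
# decode the result
openssl asn1parse -inform der -in verified.bin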
The result is this object:
0:d=0 hl=2 l= 49 cons: SEQUENCE
2:d=1 hl=2 l= 13 cons: SEQUENCE
4:d=2 hl=2 l= 9 prim: OBJECT :sha256
15:d=2 hl=2 l= 0 prim: NULL
17:d=1 hl=2 l= 32 prim: OCTET STRING [HEX DUMP]:EBAA31519CD0CCA793FEC34AA6BDD8DFA5E4D5F63BA4711F6C8ECE5D20FEF393
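I'm quite sure the decryption worked, because the object decodes correctly and contains the sha256 object as I expected, but as you can see, the digest value is different...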
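The decoded object is a PKCS#1 DigestInfo. For SHA-256 its DER encoding always starts with the same 19 bytes (3031300d060960864801650304020105000420), followed by the 32-byte digest; this matches the asn1parse output above (outer SEQUENCE of length 49, inner SEQUENCE of length 13, OCTET STRING of length 32). The following is a minimal sketch, assuming bash and that verified.bin is the output of the rsautl command; `digestinfo_sha256` is a hypothetical helper name. It checks that prefix and prints only the digest, which makes the comparison easier to script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SHA-256 digest contained in a DER DigestInfo file.
# Fails if the file is not exactly a SHA-256 DigestInfo (19-byte prefix + 32-byte digest).
digestinfo_sha256() {
  local hex prefix=3031300d060960864801650304020105000420
  # Hex-encode the whole file on one line (od is POSIX, so xxd is not required)
  hex=$(od -An -v -tx1 "$1" | tr -d ' \n')
  # 19 + 32 = 51 bytes = 102 hex characters; the prefix is 38 hex characters
  if [ "${#hex}" -ne 102 ] || [ "${hex:0:38}" != "$prefix" ]; then
    echo "not a SHA-256 DigestInfo" >&2
    return 1
  fi
  # Everything after the prefix is the digest itself (lowercase hex)
  echo "${hex:38}"
}
```

Usage: `digestinfo_sha256 verified.bin` prints the digest in lowercase, while asn1parse prints hex dumps in uppercase, so compare case-insensitively.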
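Am I looking in the wrong place? I don't know how to verify the integrity.
Moreover, Acrobat of course validates the document integrity for this signature.
Thanks in advance!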
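Please be aware that in a SignedData object there are multiple hash values to consider, and they usually are not equal.
Have a look at the definition of Cryptographic Message Syntax (CMS) objects in RFC 3852.
(RFC 3852 is the RFC referenced by the current PDF specification, ISO 32000-1; so even though it has been obsoleted by RFC 5652, changes in the newer RFC may not apply in this context.)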
SignedData ::= SEQUENCE {
version CMSVersion,
digestAlgorithms DigestAlgorithmIdentifiers,
encapContentInfo EncapsulatedContentInfo,
certificates [0] IMPLICIT CertificateSet OPTIONAL,
crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
signerInfos SignerInfos }
...
SignerInfo ::= SEQUENCE {
version CMSVersion,
sid SignerIdentifier,
digestAlgorithm DigestAlgorithmIdentifier,
signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
signatureAlgorithm SignatureAlgorithmIdentifier,
signature SignatureValue,
unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }
...
SignedAttributes ::= SET SIZE (1..MAX) OF Attribute
...
signedAttrs is a collection of attributes that are signed. The
field is optional, but it MUST be present if the content type of
the EncapsulatedContentInfo value being signed is not id-data.
SignedAttributes MUST be DER encoded, even if the rest of the
structure is BER encoded. Useful attribute types, such as signing
time, are defined in Section 11. If the field is present, it MUST
contain, at a minimum, the following two attributes:
A content-type attribute having as its value the content type
of the EncapsulatedContentInfo value being signed. Section
11.1 defines the content-type attribute. However, the
content-type attribute MUST NOT be used as part of a
countersignature unsigned attribute as defined in section 11.4.
A message-digest attribute, having as its value the message
digest of the content. Section 11.2 defines the message-digest
attribute.
...
The result of the message digest calculation process depends on
whether the signedAttrs field is present. When the field is absent,
the result is just the message digest of the content as described
above. When the field is present, however, the result is the message
digest of the complete DER encoding of the SignedAttrs value
contained in the signedAttrs field. Since the SignedAttrs value,
when present, must contain the content-type and the message-digest
attributes, those values are indirectly included in the result.
Thus, your observation
that the messageDigest field is equal to the calculated digest of the signedContent obtained using the ByteRange.
5178:d=6 hl=2 l= 47 cons: SEQUENCE
5180:d=7 hl=2 l= 9 prim: OBJECT :messageDigest
5191:d=7 hl=2 l= 34 cons: SET
5193:d=8 hl=2 l= 32 prim: OCTET STRING [HEX DUMP]:18B399D208A08815DDF23C93B1B63B13757A6AA24B1932569D7A69D0DB3A34C2
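indicates that the correct data has been signed, because the message-digest attribute shall have the message digest of the content as its value.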
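The ByteRange digest this comparison relies on can be recomputed with standard tools. A minimal sketch, assuming bash with GNU coreutils (`dd`, `sha256sum`) and that the four /ByteRange integers have already been read from the signature dictionary; `byterange_sha256` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: SHA-256 over the two byte ranges of a signed PDF.
# Usage: byterange_sha256 FILE OFFSET1 LENGTH1 OFFSET2 LENGTH2
# The four numbers are the values of the signature dictionary's /ByteRange array.
byterange_sha256() {
  local pdf=$1 off1=$2 len1=$3 off2=$4 len2=$5
  {
    # First range: the bytes before the signature value (/Contents hex string)
    dd if="$pdf" bs=1 skip="$off1" count="$len1" 2>/dev/null
    # Second range: the bytes after the signature value
    dd if="$pdf" bs=1 skip="$off2" count="$len2" 2>/dev/null
  } | sha256sum | cut -d' ' -f1
}
```

The printed digest is lowercase hex; asn1parse shows the messageDigest value in uppercase, so compare the two case-insensitively.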
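But the RFC also tells you that the data signed by the actual inner signature bytes (the ones you decrypted) is not this digest of the content but the collection of attributes, signedAttrs!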
Thus, you must not check those signature bytes against the content hash, but against the hash of the signed attributes, as described in the RFC.
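One detail is easy to miss when computing that signed attributes hash: inside the SignerInfo the signedAttrs field is encoded with the IMPLICIT [0] tag (first byte 0xA0, shown by asn1parse as `cont [ 0 ]`), but RFC 3852 section 5.4 requires the digest to be calculated over the DER encoding with an EXPLICIT SET OF tag (0x31) instead, keeping the length and content octets unchanged. A minimal sketch, assuming bash with GNU coreutils and that you read the offset, header length (hl) and length (l) of that `cont [ 0 ]` element from your own asn1parse output (the dump above elides it); `signed_attrs_sha256` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: SHA-256 of the signedAttrs as they are hashed for signing.
# Usage: signed_attrs_sha256 DER_FILE OFFSET HL LEN
# OFFSET, HL and LEN come from the asn1parse line of the signedAttrs "cont [ 0 ]".
signed_attrs_sha256() {
  local der=$1 off=$2 hl=$3 len=$4 first
  # Sanity check: the element must start with the context-specific [0] tag 0xA0
  first=$(dd if="$der" bs=1 skip="$off" count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
  if [ "$first" != "a0" ]; then
    echo "no [0] tag at offset $off" >&2
    return 1
  fi
  {
    # Replace the IMPLICIT [0] tag with the SET OF tag
    printf '\x31'
    # Copy the original length octets and content octets unchanged
    dd if="$der" bs=1 skip=$((off + 1)) count=$((hl + len - 1)) 2>/dev/null
  } | sha256sum | cut -d' ' -f1
}
```

If the signature is valid, the result should equal the digest inside verified.bin (the decrypted DigestInfo), while the messageDigest attribute is the value to compare against the ByteRange hash.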
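PS: The OP meanwhile found this other answer on the topic of CMS signed data verification, which further illustrates how to visually identify which attributes are signed and which are not.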
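PPS: The OP verified the signature by decrypting the signature bytes, extracting the contained hash, and comparing it with the actual one. This is OK for RSA-based signatures. DSA- or ECDSA-based signatures, however, cannot be decrypted, so no hash can be extracted from them; they must be verified using a dedicated verification routine.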
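A routine that works for RSA (PKCS#1 v1.5 padding, as with sha256WithRSAEncryption here), DSA and ECDSA alike is to let OpenSSL hash and verify in one step with `openssl dgst -verify` instead of decrypting. The sketch below is self-contained only for demonstration: it generates a throwaway P-256 key and signs a stand-in file itself. In the PDF case, data.der would be the signedAttrs with the 0xA0 tag replaced by 0x31, and sig.bin the content of the signature OCTET STRING; for ECDSA that content is the DER-encoded signature value that `openssl dgst` expects. RSA-PSS signatures would need additional padding options.

```shell
#!/usr/bin/env bash
# Demonstration only: create an EC key, a stand-in for the re-tagged signedAttrs,
# and a signature, then verify it with the generic openssl dgst routine.
set -e
work=$(mktemp -d)
cd "$work"

openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out key.pem 2>/dev/null
openssl pkey -in key.pem -pubout -out publickey.pem
printf '\x31\x03\x02\x01\x05' > data.der   # stand-in for the signed attributes
openssl dgst -sha256 -sign key.pem -out sig.bin data.der

# The actual check: hash data.der with SHA-256 and verify sig.bin against it.
# Prints "Verified OK" and exits with status 0 on success; used the same way for RSA keys.
openssl dgst -sha256 -verify publickey.pem -signature sig.bin data.der
```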
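PPPS: There are different styles of integrated PDF signatures. While the style used here (PKCS#7/CAdES detached) is the most common and recommended one, a generic solution must check the style first and verify accordingly.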
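Checking the style first can be sketched by reading the /SubFilter entry of the signature dictionary. This is a rough sketch under strong assumptions: the signature dictionary is stored uncompressed (not inside an object stream), and only the first /SubFilter in the file matters. The `pdf_signature_style` name and its messages are hypothetical; the SubFilter values are those of ISO 32000-1 (adbe.*) and PAdES (ETSI.*).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which integrated-signature style a PDF uses.
# Assumes an uncompressed signature dictionary; -a makes grep treat the binary PDF as text.
pdf_signature_style() {
  local sub
  sub=$(grep -ao '/SubFilter */[A-Za-z0-9.]*' "$1" | head -n 1 | sed 's|.*/||')
  case "$sub" in
    adbe.pkcs7.detached|ETSI.CAdES.detached)
      echo "detached CMS: hash the ByteRange, then check messageDigest and signedAttrs" ;;
    adbe.pkcs7.sha1)
      echo "SHA-1 of the ByteRange is the encapsulated content of the CMS object" ;;
    adbe.x509.rsa_sha1)
      echo "raw PKCS#1 signature, not a CMS SignedData" ;;
    ETSI.RFC3161)
      echo "document timestamp (RFC 3161 token)" ;;
    *)
      echo "unknown or not found: ${sub:-none}" ;;
  esac
}
```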