PDF integrity verification fails

Question

I'm trying to verify the integrity of a PDF file using bash commands.

I extracted the PDF's signedContent and the detached PKCS#7 object using dd.

I then decoded the PKCS#7 object with:

xxd -r -p pkcs7_extracted > pkcs7_extracted.bin

openssl asn1parse -inform DER <pkcs7_extracted.bin >pkcs7_extracted_decoded

From the decoded PKCS#7 I got some useful information:

 0:d=0  hl=4 l=5498 cons: SEQUENCE         
 4:d=1  hl=2 l=   9 prim: OBJECT            :pkcs7-signedData
 15:d=1  hl=4 l=5483 cons: cont [ 0 ]        
 19:d=2  hl=4 l=5479 cons: SEQUENCE          
 23:d=3  hl=2 l=   1 prim: INTEGER           :01
 26:d=3  hl=2 l=  15 cons: SET               
 28:d=4  hl=2 l=  13 cons: SEQUENCE          
 30:d=5  hl=2 l=   9 prim: OBJECT            :sha256
 41:d=5  hl=2 l=   0 prim: NULL              
 43:d=3  hl=2 l=  11 cons: SEQUENCE          
 ...
 5154:d=7  hl=2 l=   9 prim: OBJECT            :contentType
 5165:d=7  hl=2 l=  11 cons: SET               
 5167:d=8  hl=2 l=   9 prim: OBJECT            :pkcs7-data
 5178:d=6  hl=2 l=  47 cons: SEQUENCE          
 5180:d=7  hl=2 l=   9 prim: OBJECT            :messageDigest
 5191:d=7  hl=2 l=  34 cons: SET               
 5193:d=8  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:18B399D208A08815DDF23C93B1B63B13757A6AA24B1932569D7A69D0DB3A34C2
 5227:d=5  hl=2 l=  13 cons: SEQUENCE          
 5229:d=6  hl=2 l=   9 prim: OBJECT            :sha256WithRSAEncryption
 5240:d=6  hl=2 l=   0 prim: NULL              
 5242:d=5  hl=4 l= 256 prim: OCTET STRING      [HEX DUMP]:8F4B21914173EC57E6B0533BB5E04FB7054F23AC299C1BDBF589ED164A3EABB611727BE9117AAC3161D9C18DCA08BD113DD3AA90E5922009FA12BA59E7F6587E81CD79BDED09F862C2C76F35D950926F1A31A3DCCE999A52DCE0C7F67D081E81A44397E8AF96A1051B8E51F2E2271221B06D05C9895E1846B1DBE02B558F5B9EF97C7EB0FF9A7C71A9764D5E205900818F07E82027D79D3F9A5AA72B3A0CF131F1B890D0BCBF3C4DD8A0229FABE15F6C2CA0CE079EB925B3998A1A6190596A88D8F07C1C12B8750636E69108E30E643A653B285A400080C9C5590C112451F6D69BAFC2686D6F1107B37A5DB36B9F797C49E61D4B44E62E17DD541778DE763AC5
 5502:d=0  hl=2 l=   0 prim: EOC

In particular, I noticed that the messageDigest field is equal to the calculated digest of the signedContent obtained using the ByteRange.
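For reference, the comparison described above could be reproduced roughly as follows. This is only a sketch: the function name `byterange_digest`, the file name `signed.pdf`, and the four numbers in the usage comment are hypothetical placeholders, to be replaced with the values found in the /ByteRange array of the signature dictionary.

```shell
# byterange_digest FILE OFF1 LEN1 OFF2 LEN2
# Prints the SHA-256 (lowercase hex) of the two byte ranges of FILE,
# concatenated in order, i.e. the signed content of a detached PDF signature.
byterange_digest() {
    local file=$1 off1=$2 len1=$3 off2=$4 len2=$5
    {
        # tail -c +N starts output at byte N (1-based), hence the "+ 1".
        tail -c +"$(( off1 + 1 ))" "$file" | head -c "$len1"
        tail -c +"$(( off2 + 1 ))" "$file" | head -c "$len2"
    } | openssl dgst -sha256 -r | cut -d' ' -f1
}

# Hypothetical usage (offsets are placeholders, not taken from the question):
#   byterange_digest signed.pdf 0 1000 5000 2000
```

Note that asn1parse prints the messageDigest value in uppercase hex while `openssl dgst` prints lowercase, so the two strings should be compared case-insensitively.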

I then extracted the encrypted hash, decrypted it with my public key, and decoded the result again with asn1parse.

# the signature OCTET STRING is at offset 5242 with a 4-byte header (hl=4), followed by 256 bytes of value
dd if=pkcs7_extracted.bin of=extracted.sign.bin bs=1 skip=$(( 5242 + 4 )) count=256

# decrypt
openssl rsautl -verify -pubin -inkey publickey.pem < extracted.sign.bin > verified.bin

# decode the result
openssl asn1parse -inform der -in verified.bin

The result is this object:

0:d=0  hl=2 l=  49 cons: SEQUENCE          
2:d=1  hl=2 l=  13 cons: SEQUENCE          
4:d=2  hl=2 l=   9 prim: OBJECT            :sha256
15:d=2  hl=2 l=   0 prim: NULL              
17:d=1  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:EBAA31519CD0CCA793FEC34AA6BDD8DFA5E4D5F63BA4711F6C8ECE5D20FEF393

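I'm quite sure the decryption worked, because the object decodes correctly and, as I expected, contains a sha256 object. But as you can see, the digest value is different...

Am I looking in the wrong place? I don't know how to verify the integrity.

Also, Acrobat does of course validate the document's integrity for this signature.

Thanks in advance!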

Answer 1

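Be aware that a SignedData object involves several hash values, and they are usually not equal to one another.

Have a look at the definition of Cryptographic Message Syntax (CMS) objects in RFC 3852.

(RFC 3852 is the RFC referenced by the current PDF specification, ISO 32000-1; so even though it has been obsoleted by RFC 5652, changes in the newer RFC may not apply in this context.)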

  SignedData ::= SEQUENCE {
    version CMSVersion,
    digestAlgorithms DigestAlgorithmIdentifiers,
    encapContentInfo EncapsulatedContentInfo,
    certificates [0] IMPLICIT CertificateSet OPTIONAL,
    crls [1] IMPLICIT RevocationInfoChoices OPTIONAL,
    signerInfos SignerInfos }

...

  SignerInfo ::= SEQUENCE {
    version CMSVersion,
    sid SignerIdentifier,
    digestAlgorithm DigestAlgorithmIdentifier,
    signedAttrs [0] IMPLICIT SignedAttributes OPTIONAL,
    signatureAlgorithm SignatureAlgorithmIdentifier,
    signature SignatureValue,
    unsignedAttrs [1] IMPLICIT UnsignedAttributes OPTIONAL }

...

  SignedAttributes ::= SET SIZE (1..MAX) OF Attribute

...

  signedAttrs is a collection of attributes that are signed.  The
  field is optional, but it MUST be present if the content type of
  the EncapsulatedContentInfo value being signed is not id-data.
  SignedAttributes MUST be DER encoded, even if the rest of the
  structure is BER encoded.  Useful attribute types, such as signing
  time, are defined in Section 11.  If the field is present, it MUST
  contain, at a minimum, the following two attributes:

     A content-type attribute having as its value the content type
     of the EncapsulatedContentInfo value being signed.  Section
     11.1 defines the content-type attribute.  However, the
     content-type attribute MUST NOT be used as part of a
     countersignature unsigned attribute as defined in section 11.4.

     A message-digest attribute, having as its value the message
     digest of the content.  Section 11.2 defines the message-digest
     attribute.

...

  The result of the message digest calculation process depends on
  whether the signedAttrs field is present.  When the field is absent,
  the result is just the message digest of the content as described
  above.  When the field is present, however, the result is the message
  digest of the complete DER encoding of the SignedAttrs value
  contained in the signedAttrs field.  Since the SignedAttrs value,
  when present, must contain the content-type and the message-digest
  attributes, those values are indirectly included in the result.

Thus, your observation

that the messageDigest field is equal to the calculated digest of the signedContent obtained using the ByteRange.

 5178:d=6  hl=2 l=  47 cons: SEQUENCE          
 5180:d=7  hl=2 l=   9 prim: OBJECT            :messageDigest
 5191:d=7  hl=2 l=  34 cons: SET               
 5193:d=8  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:18B399D208A08815DDF23C93B1B63B13757A6AA24B1932569D7A69D0DB3A34C2

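indicates that the correct data was signed, because the message-digest attribute must have the message digest of the content as its value.

But you can also read there that the data signed by the actual inner signature bytes (the ones you decrypted) is not this digest of the message content but the collection of attributes, signedAttrs!

Therefore, you must not verify those signature bytes against the content hash but against the hash of the signed attributes, as described in the RFC.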
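The check against the signed attributes could be sketched as follows. RFC 3852 section 5.4 specifies that the digest is computed over a separate encoding of signedAttrs in which the IMPLICIT [0] tag (0xA0) is replaced by an explicit SET OF tag (0x31), so the sketch swaps that one byte before hashing. The function name `signed_attrs_digest` is made up for illustration, and OFFSET and TOTAL_LEN are placeholders: the relevant lines are elided ("...") in the dump above, so they have to be read from the full asn1parse output (the `cont [ 0 ]` line inside the SignerInfo; TOTAL_LEN is its hl plus l).

```shell
# signed_attrs_digest CMS_DER OFFSET TOTAL_LEN
# Prints the SHA-256 (lowercase hex) of the signedAttrs structure located at
# OFFSET in CMS_DER, re-tagged as SET OF for the digest calculation.
signed_attrs_digest() {
    local cms=$1 offset=$2 total=$3 tag
    # Sanity check: the byte at OFFSET should be the IMPLICIT [0] tag, 0xA0.
    tag=$(tail -c +"$(( offset + 1 ))" "$cms" | head -c 1 | od -An -tx1 | tr -d ' \n')
    if [ "$tag" != "a0" ]; then
        echo "expected tag a0 at offset $offset, found '$tag'" >&2
        return 1
    fi
    {
        # Emit the SET OF tag (0x31, octal 061) in place of the original tag.
        printf '\061'
        # Skip the original tag byte; keep the length octets and the contents.
        tail -c +"$(( offset + 2 ))" "$cms" | head -c "$(( total - 1 ))"
    } | openssl dgst -sha256 -r | cut -d' ' -f1
}

# Hypothetical usage (OFFSET and TOTAL_LEN must come from your own asn1parse output):
#   signed_attrs_digest pkcs7_extracted.bin OFFSET TOTAL_LEN
```

If the reading of the RFC above applies to this signature, the printed value should equal the digest recovered from the decrypted signature bytes in the question, apart from letter case.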

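PS: The OP has meanwhile found this other answer on the topic of CMS signed data verification, which further illustrates how to recognize more visually which attributes are signed and which are not.

PPS: The OP verifies by decrypting the signature bytes, extracting the hash they contain, and comparing it with the actual one. This is fine for RSA-based signatures. DSA- or ECDSA-based signatures, however, cannot be decrypted, so the hash cannot be extracted from them. They have to be checked with a dedicated verification routine.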
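One way to use such a verification routine from the command line, regardless of the signature algorithm, is `openssl cms -verify`. The following is a minimal sketch, not a complete validation: the wrapper name `verify_detached` is made up for illustration, `signedContent.bin` in the usage comment is an assumed file name for the concatenated ByteRange bytes, and `-noverify` deliberately skips certificate chain validation so that only the cryptographic signature and the message digest are checked.

```shell
# verify_detached CMS_DER CONTENT
# Returns 0 if the detached CMS/PKCS#7 signature in CMS_DER matches CONTENT.
# The signer certificate is taken from the CMS object itself; it is NOT
# validated against any trust store because of -noverify.
verify_detached() {
    openssl cms -verify -binary -noverify \
        -inform DER -in "$1" \
        -content "$2" \
        -out /dev/null
}

# Hypothetical usage:
#   verify_detached pkcs7_extracted.bin signedContent.bin && echo "signature matches"
```

For a real trust decision the certificate chain would also have to be validated, for example by dropping `-noverify` and supplying trusted roots with `-CAfile`.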

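PPPS: There are different styles of integrated PDF signatures. While the style used here (PKCS7/CAdES detached) is the most common and recommended one, a general solution has to check first which style is used and verify accordingly.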

pdf

security

pkcs#7