Decode curl response gzip multipart attachment in PHP

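I am trying to decode a binary compressed attachment received from a curl request. The attachment is an XML file, but the API endpoint sends it in binary form. This is the full response I receive: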

--_=4883624417507473IBM4883624417507473MOKO
Content-Transfer-Encoding: 8bit
Content-ID: 30854c92-252a-4cb0-ae65-18ecf0de28d5
Content-Type: application/soap+xml; charset=UTF-8

<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"><soapenv:Header xmlns:wsa="http://www.w3.org/2005/08/addressing"><ns2:Messaging xmlns:ns2="http://docs.oasis-open.org/ebxml-msg/ebms/v3.0/ns/core/200704/" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" soapenv:mustUnderstand="true" wsu:Id="soapheader-1">
<rest of xml elements have been removed!>
</soapenv:Header><soapenv:Body xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="soapbody"></soapenv:Body></soapenv:Envelope>
--_=4883624417507473IBM4883624417507473MOKO
Content-Disposition: attachment; filename=Part1
Content-Transfer-Encoding: binary
Content-ID: <Attachment1>
Content-Type: application/gzip

‹íWÝOÛ0Ÿ´ÿáÔ—¾9-* ªP‰Ñn«¥*Ý´    ¡É8Wj-±#Ûé‡Ðþ÷ÙI!n  LŒ!ÄÛ}ùî|?ß%öf(,’XèýÚÔ˜´úB‘K9#4˜LD€Î†4¶HD
­uÂcºàI–œZ¹âfIe„žRR…A¥.Ì£ô
&Ú£-éÎ&§ŸFäKo@úƒÏ'¤ž*\wÖ©j¸a°çt*•!]ÔLñÔYtÆŠ
p­-ß„ÎC­'YlÏÞ2$Ëë’,)ÒÚPÁô#{>á¨:6Õ{õ¥Ú#­ûÀ/ÿ+²õ„‘†Æ ²äÈ W!Êò¼€0S,d×Uã®
ó?•ERE4¨´G{¤_ŽÂOT*½#c\WÛ
‰'ðiýŠ‘k=rƒjÌRª5F@3cOp{6EöûºGA·ýìÐM(_t·ÝÎKu›R!0†yTà÷ŽÜcÛ}©¦{MȽ‰qÙl<t™ÀEŠÌXÄÐ9×ïýu7Jÿ£]0p„LªèWcžp›t%Ëg×ݯíì6öZÛ­Övs¯v£Økì׆?{ßãR<^¦¹xÔsÒÆÔ¢â;«ÐÝö–Š¶yÀbáÎÞ·\Ü©/£®î¾Æ¯ØòiÈÁø$º?ŽÍµÿxgÁ+þ³ítbw
z‰ñ„4ùGlDoèÆH5“ÂØK™)ÐrbæT!¤JÎxdßÛ*òíçîÄ]4ùP»•ÆèÀú
̺ûÃ^R†P_3ŸÏ‰52HégÂÔaJÝZ'lg&£Ò©JîÀ×Æuv*í~qß$©áºÓÛPý—ë¯\˜6Ìm–yåܪÇÅ¥-»rù^ç}¶*ÆùÆ}ÎTzs{ÝræU¯,o^x¯}v«lg¯ñŠ ÷7ÿšÅši1€‚ü¨J7\”ëŠ
V¯x‚lvR)è|üðo?M/
--_=4883624417507473IBM4883624417507473MOKO--

I have been searching and trying different things but cannot decode the attachment. I use the following to get only the attachment part:

preg_match('/(?<xml><.*?\?xml version=.*>)/', $response, $match);
$xml = $match['xml'];
$offset = strpos($response, $xml) + strlen($xml . PHP_EOL);
$attach = substr($response, $offset);

I have working C#/.NET code that connects to the same API, as follows:

byte[] myData;
byte[] rv;
using (var webResponse = req.GetResponse())
{
   var responseHeaderstream = webResponse.Headers.ToByteArray();
   var responseStream = webResponse.GetResponseStream();
   myData = ReadFully(responseStream);
   responseStream.Dispose();
   rv = new byte[responseHeaderstream.Length + myData.Length];
   System.Buffer.BlockCopy(responseHeaderstream, 0, rv, 0, responseHeaderstream.Length);
   System.Buffer.BlockCopy(myData, 0, rv, responseHeaderstream.Length, myData.Length);                    
}

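The following code then loops over any attachments it finds and decompresses each of them. The result is an XML file; this type of request should have only one attachment: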

Dim memstream As Stream = New MemoryStream(rv)
Dim entity As MimeMessage = MimeMessage.Load(memstream)
Dim attachments = New List(Of MimePart)()
Dim multiparts = New List(Of Multipart)()
Dim iter = New MimeIterator(entity)
While iter.MoveNext()
   Dim multipart = TryCast(iter.Parent, Multipart)
   Dim part = TryCast(iter.Current, MimePart)
   If multipart IsNot Nothing AndAlso part IsNot Nothing AndAlso part.IsAttachment Then
       multiparts.Add(multipart)
       attachments.Add(part)
   End If
End While

For i As Integer = 0 To attachments.Count - 1
    multiparts(i).Remove(attachments(i))
Next

For Each attachment In attachments
   Using memory = New MemoryStream()
       attachment.Content.DecodeTo(memory)
       Dim bytes = memory.ToArray()
       If attachment.ContentType.MimeType = "application/gzip" Then
          strAtchmnt = Unzip(bytes)
       Else
          strAtchmnt = Encoding.UTF8.GetString(bytes)
       End If
   End Using
Next

Below are the other functions needed to decode the attachment:

Public Shared Function Unzip(ByVal bytes As Byte()) As String
    Using msi = New MemoryStream(bytes)
        Using mso = New MemoryStream()
            Using gs = New GZipStream(msi, CompressionMode.Decompress)
                CopyTo(gs, mso)
            End Using
            Return Encoding.UTF8.GetString(mso.ToArray())
        End Using
    End Using
End Function
Public Shared Sub CopyTo(ByVal src As Stream, ByVal dest As Stream)
    Dim bytes As Byte() = New Byte(4095) {}
    Dim cnt As Integer
    cnt = -1
    While cnt <> 0
        cnt = src.Read(bytes, 0, bytes.Length)
        If cnt <> 0 Then dest.Write(bytes, 0, cnt)
    End While
End Sub

Any help is greatly appreciated.
Here is the full curl request. $pulreq is the signed XML document, which does not need to be included here:

$guid = $this->guidv4();
$guidstring = "<" . $guid . "@ATODN-".substr(str_shuffle(MD5(microtime())), 0, 9).">";
$boundary = "_=Part_".dechex(time()).".". time();
$content_type_header = 'Content-Type: multipart/related; '
            .'type="application/xml"; '
            .'boundary="' . $boundary . '"; '
            .'start="'.$guidstring.'"; '
            .'start-info="application/soap+xml";';
$accept_header = 'Accept: multipart/related';
$transfer_encoding_header = 'Transfer-Encoding: chunked';
$headers = array($content_type_header, $accept_header, $transfer_encoding_header);
$postData  = "--" . $boundary . "\r\n"
            ."Content-Type: application/soap+xml\r\n"
            ."Content-Transfer-Encoding: 8bit\r\n"
            ."Content-ID: ".$guidstring."\r\n\r\n"
            .$pulreq . "\r\n"
            ."--" .$boundary . "\r\n";
$curl = curl_init('https://xxxx/services/xxxx-async-pull');
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $postData);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_SSLVERSION, CURL_SSLVERSION_TLSv1_2);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 60);
curl_setopt($curl, CURLOPT_TIMEOUT, 60);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_ENCODING,'');
curl_setopt($curl, CURLINFO_HEADER_OUT, true);

$response = curl_exec($curl);
if (curl_errno($curl)) {
    throw new Exception('cURL error:<br>' . curl_error($curl));
}
echo $response;
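Because CURLOPT_HEADER is enabled in the request above, $response holds the HTTP headers followed by the multipart body, so the two need to be separated before the body is parsed. The following is a minimal sketch of one way to do that, not a definitive implementation. splitResponse() and findBoundary() are hypothetical helper names, and the sample response is made up for illustration. With a real handle, the header size would come from curl_getinfo($curl, CURLINFO_HEADER_SIZE), which also counts the headers of any redirects that were followed.

```php
<?php
// Sketch: split a response captured with CURLOPT_HEADER into headers and body,
// then read the multipart boundary from the Content-Type header.
// splitResponse() and findBoundary() are hypothetical helpers.

function splitResponse(string $response, int $headerSize): array
{
    return [
        'headers' => substr($response, 0, $headerSize),
        'body'    => substr($response, $headerSize),
    ];
}

function findBoundary(string $headers): ?string
{
    // Matches boundary="value" or boundary=value, case-insensitively.
    if (preg_match('/boundary="?([^";\r\n]+)"?/i', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up sample response; real code would use the string from curl_exec().
$sample = "HTTP/1.1 200 OK\r\n"
        . "Content-Type: multipart/related; boundary=\"_=abc123\"\r\n"
        . "\r\n"
        . "--_=abc123\r\nContent-Type: text/plain\r\n\r\nhello\r\n--_=abc123--\r\n";

// Stand-in for curl_getinfo($curl, CURLINFO_HEADER_SIZE).
$headerSize = strpos($sample, "\r\n\r\n") + 4;

$parts    = splitResponse($sample, $headerSize);
$boundary = findBoundary($parts['headers']);
```

Here $parts['body'] starts at the first boundary line, which is the form a multipart parser expects, and $boundary can be used to check that line.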

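If we assume that everything before the first occurrence of \r\n is the "part separator", that the metadata lines of each part are separated by \r\n, and that within each part the metadata and the actual data are separated by "\r\n\r\n" (which seems to be the case, and is very reminiscent of the multipart/form-data format), then extracting the parts is as simple as: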

function sectionExtractor(string $raw):array{
    $rn="\r\n";
    $separatorEndPos=strpos($raw,$rn);
    $separator=substr($raw,0, $separatorEndPos);
    // remove separator
    $raw = substr($raw, $separatorEndPos+strlen($rn));
    $rawSections = explode($rn.$separator,$raw);
    $parsedSections = array();
    foreach($rawSections as $rawSection){
        $metadataDataSeparator=$rn.$rn;
        $metadataDataSeparatorPosition = strpos($rawSection, $metadataDataSeparator);
        $metadata=substr($rawSection, 0, $metadataDataSeparatorPosition);
        $metadata = explode($rn, $metadata);
        $metadata = array_filter($metadata, 'strlen'); // drop the empty line left by the "\r\n" that follows each separator
        $data = substr($rawSection, $metadataDataSeparatorPosition + strlen($metadataDataSeparator));
        $parsedSections[]=["metadata"=>$metadata, "data"=>$data];
    }
    // The closing delimiter (the separator followed by "--") leaves one trailing junk section; drop it.
    array_pop($parsedSections);
    return $parsedSections;
}
// $data is assumed to hold the raw multipart body (HTTP headers already removed).
$sections = sectionExtractor($data);
var_export($sections);

The result for your data:

array (
  0 => 
  array (
    'metadata' => 
    array (
      0 => 'Content-Transfer-Encoding: 8bit',
      1 => 'Content-ID: 30854c92-252a-4cb0-ae65-18ecf0de28d5',
      2 => 'Content-Type: application/soap+xml; charset=UTF-8',
    ),
    'data' => '<?xml version="1.0" encoding="utf-8"?><soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope"><soapenv:Header xmlns:wsa="http://www.w3.org/2005/08/addressing"><ns2:Messaging xmlns:ns2="http://docs.oasis-open.org/ebxml-msg/ebms/v3.0/ns/core/200704/" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" soapenv:mustUnderstand="true" wsu:Id="soapheader-1">
<rest of xml elements have been removed!>
</soapenv:Header><soapenv:Body xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="soapbody"></soapenv:Body></soapenv:Envelope>',
  ),
  1 => 
  array (
    'metadata' => 
    array (
      1 => 'Content-Disposition: attachment; filename=Part1',
      2 => 'Content-Transfer-Encoding: binary',
      3 => 'Content-ID: <Attachment1>',
      4 => 'Content-Type: application/gzip',
    ),
    'data' => '‹íWÝOÛ0Ÿ´ÿáÔ—¾9-* ªP‰Ñn«¥*Ý´    ¡É8Wj-±#Ûé‡Ðþ÷ÙI!n  LŒ!ÄÛ}ùî|?ß%öf(,’XèýÚÔ˜´úB‘K9#4˜LD€Î†4¶HD
­uÂcºàI–œZ¹âfIe„žRR…A¥.Ì£ô
&Ú£-éÎ&§ŸFäKo@úƒÏ\'¤ž*\wÖ©j¸a°çt*•!]ÔLñÔYtÆŠ
p­-ß„ÎC­\'YlÏÞ2$Ëë’,)ÒÚPÁô#{>á¨:6Õ{õ¥Ú#­ûÀ/ÿ+²õ„‘†Æ ²äÈ W!Êò¼€0S,d×Uã®
ó?•ERE4¨´G{¤_ŽÂOT*½#c\WÛ
‰\'ðiýŠ‘k=rƒjÌRª5F@3cOp{6EöûºGA·ýìÐM(_t·ÝÎKu›R!0†yTà÷ŽÜcÛ}©¦{MȽ‰qÙl<t™ÀEŠÌXÄÐ9×ïýu7Jÿ£]0p„LªèWcžp›t%Ëg×ݯíì6öZÛ­Övs¯v£Økì׆?{ßãR<^¦¹xÔsÒÆÔ¢â;«ÐÝö–Š¶yÀbáÎÞ·\Ü©/£®î¾Æ¯ØòiÈÁø$º?ŽÍµÿxgÁ+þ³ítbw
z‰ñ„4ùGlDoèÆH5“ÂØK™)ÐrbæT!¤JÎxdßÛ*òíçîÄ]4ùP»•ÆèÀú
̺ûÃ^R†P_3ŸÏ‰52HégÂÔaJÝZ\'lg&£Ò©JîÀ×Æuv*í~qß$©áºÓÛPý—ë¯\˜6Ìm–yåܪÇÅ¥-»rù^ç}¶*ÆùÆ}ÎTzs{ÝræU¯,o^x¯}v«lg¯ñŠ ÷7ÿšÅši1€‚ü¨J7\”ëŠ
V¯x‚lvR)è|üðo?M/',
  ),
)
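Rather than relying on the attachment always being at index 1, the part can be picked by its declared Content-Type. The following is a minimal sketch under the assumption that the sections have the shape returned by sectionExtractor() above. findPartByContentType() is a hypothetical helper name, and the sample array is shortened stand-in data.

```php
<?php
// Sketch: return the first part whose metadata declares the given Content-Type.
// findPartByContentType() is a hypothetical helper; $sections mimics the
// structure produced by sectionExtractor().

function findPartByContentType(array $sections, string $mimeType): ?array
{
    foreach ($sections as $section) {
        foreach ($section['metadata'] as $headerLine) {
            // Header names are case-insensitive; parameters may follow after ';'.
            if (preg_match('/^Content-Type:\s*([^;\s]+)/i', $headerLine, $m)
                && strcasecmp($m[1], $mimeType) === 0) {
                return $section;
            }
        }
    }
    return null; // no part with that type
}

// Shortened stand-in data with the same shape as the output above.
$sections = [
    [
        'metadata' => ['Content-Type: application/soap+xml; charset=UTF-8'],
        'data'     => '<xml/>',
    ],
    [
        'metadata' => [
            1 => 'Content-Disposition: attachment; filename=Part1',
            4 => 'Content-Type: application/gzip',
        ],
        'data'     => 'raw-bytes',
    ],
];

$attachment = findPartByContentType($sections, 'application/gzip');
```

$attachment['data'] then holds the raw bytes of the gzip part, or $attachment is null when no such part exists.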

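The raw compressed data is in $sections[1]["data"]. If it were a ZIP archive, you could use something like PECL ZipArchive, as shown here; note, however, that this part is declared as application/gzip, which ZipArchive cannot open (gzdecode() handles gzip data):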

$zipRaw = $sections[1]["data"];
$tmpZipFileHandle = tmpfile();
$tmpZipFileLocation = stream_get_meta_data($tmpZipFileHandle)['uri'];
fwrite($tmpZipFileHandle, $zipRaw);
$zip = new ZipArchive;
$zip->open($tmpZipFileLocation);
$zip->extractTo('/my/destination/dir/');

The contents of the archive should then be extracted to /my/destination/dir/

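It turns out the solution was simple; I just had not thought of it before. After extracting the decoded attachment, all I needed was: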

$xml_string = gzdecode($decoded_attachment);

The result is the expected XML attachment.
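A slightly more defensive variant of the gzdecode() call is sketched below, assuming the zlib extension is available. decodeGzipXml() is a hypothetical helper name, and the sample XML is made up. The sketch checks for the gzip magic bytes 0x1f 0x8b first and throws instead of silently returning false, which helps when the attachment bytes were cut at the wrong offset or altered by a text conversion along the way.

```php
<?php
// Sketch: decompress a gzip attachment, failing loudly on bad input.
// decodeGzipXml() is a hypothetical helper; requires the zlib extension.

function decodeGzipXml(string $gzipBytes): string
{
    // A gzip stream starts with the magic bytes 0x1f 0x8b.
    if (strncmp($gzipBytes, "\x1f\x8b", 2) !== 0) {
        throw new RuntimeException('Attachment does not look like gzip data');
    }
    // gzdecode() returns false (and emits a warning) on corrupt input.
    $xml = @gzdecode($gzipBytes);
    if ($xml === false) {
        throw new RuntimeException('gzdecode() failed; the data may be truncated or altered');
    }
    return $xml;
}

// Made-up stand-in for the attachment: compress a small XML document,
// then decode it again.
$original = '<?xml version="1.0" encoding="utf-8"?><root><item>1</item></root>';
$decoded  = decodeGzipXml(gzencode($original));
```

The raw bytes have to reach this function unchanged; copying the binary data out of a terminal or a web page will generally corrupt it.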