Extracting attachments from an email: cannot get the filename of the attachment
I have a PHP script that checks an email account for new messages and tries to download the .zip and .pdf attachments from each one. This is the code I am using:
/* try to connect */
$inbox = imap_open($hostname, $username, $password) or die('Cannot connect to domain: ' . imap_last_error());

/* grab emails; imap_search() returns false when nothing matches */
$emails = imap_search($inbox, 'ALL');
if ($emails === false) {
    $emails = [];
}

/* put the newest emails on top */
rsort($emails);

foreach ($emails as $email_number) {
    /* skip messages that have already been read */
    $overview = imap_fetch_overview($inbox, $email_number, 0);
    if ($overview[0]->seen) {
        continue;
    }

    /* skip messages that have no MIME parts */
    $structure = imap_fetchstructure($inbox, $email_number);
    if (!property_exists($structure, 'parts')) {
        continue;
    }

    //print_r($structure->parts);
    //get attachments
}
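For the "get attachments" step, one possible sketch is shown here. It assumes the raw body of a part has already been fetched (for example with imap_fetchbody()) and only decodes it according to the numeric encoding field from imap_fetchstructure(), where 3 is Base64 and 4 is quoted-printable. The helper name decodePartBody is hypothetical, and the sample input is a stand-in rather than real mail data.

```php
<?php
// Hypothetical helper: decode a raw MIME part body according to the numeric
// "encoding" field reported by imap_fetchstructure().
function decodePartBody(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3: // ENCBASE64
            $decoded = base64_decode($raw, false);
            return $decoded === false ? $raw : $decoded;
        case 4: // ENCQUOTEDPRINTABLE
            return quoted_printable_decode($raw);
        default: // 7bit, 8bit, binary, other: return unchanged
            return $raw;
    }
}

// Stand-in for the string a call such as imap_fetchbody() might return
$raw = base64_encode("%PDF-1.4 example bytes");
echo decodePartBody($raw, 3), PHP_EOL;
```

The decoded string could then be written to disk with file_put_contents() once a safe filename has been chosen.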
For most emails, $structure->parts looks like this:
[1] => stdClass Object
    (
        [type] => 3
        [encoding] => 3
        [ifsubtype] => 1
        [subtype] => PDF
        [ifdescription] => 0
        [ifid] => 0
        [bytes] => 132780
        [ifdisposition] => 1
        [disposition] => attachment
        [ifdparameters] => 1
        [dparameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => filename
                        [value] => some_filename.pdf
                    )
            )
        [ifparameters] => 1
        [parameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => name
                        [value] => some_filename.pdf
                    )
            )
    )
[2] => stdClass Object
    (
        [type] => 3
        [encoding] => 3
        [ifsubtype] => 1
        [subtype] => ZIP
        [ifdescription] => 0
        [ifid] => 0
        [bytes] => 43170
        [ifdisposition] => 1
        [disposition] => attachment
        [ifdparameters] => 1
        [dparameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => filename
                        [value] => another_filename.zip
                    )
            )
        [ifparameters] => 1
        [parameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => name
                        [value] => another_filename.zip
                    )
            )
    )
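Reading the filename out of a part shaped like the dump above could be sketched as follows. The helper name getPartFilename is hypothetical, and the $part object is built by hand to mimic one element of $structure->parts, so nothing here talks to a real mailbox.

```php
<?php
// Hypothetical helper: return the attachment filename of one MIME part,
// preferring the Content-Disposition "filename" over the Content-Type "name".
function getPartFilename(object $part): ?string
{
    $sources = [];
    if (!empty($part->ifdparameters)) {
        $sources[] = ['filename', $part->dparameters];
    }
    if (!empty($part->ifparameters)) {
        $sources[] = ['name', $part->parameters];
    }
    foreach ($sources as [$wanted, $params]) {
        foreach ($params as $param) {
            // Attribute names may arrive in any letter case
            if (strtolower($param->attribute) === $wanted) {
                return $param->value;
            }
        }
    }
    return null; // no filename information on this part
}

// Hand-built stand-in for one element of $structure->parts
$part = (object) [
    'ifdparameters' => 1,
    'dparameters'   => [(object) ['attribute' => 'filename', 'value' => 'some_filename.pdf']],
    'ifparameters'  => 1,
    'parameters'    => [(object) ['attribute' => 'name', 'value' => 'some_filename.pdf']],
];
echo getPartFilename($part), PHP_EOL; // some_filename.pdf
```

Checking the ifdparameters and ifparameters flags first avoids iterating over a parameter list that the part does not actually carry.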
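As you can see, it is easy to work out the extension and filename of each attachment. Recently, however, I have been receiving some emails where $structure->parts looks like this: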
[1] => stdClass Object
    (
        [type] => 3
        [encoding] => 3
        [ifsubtype] => 1
        [subtype] => OCTET-STREAM
        [ifdescription] => 1
        [description] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi5wZGY=?=
        [ifid] => 1
        [id] => <A86144A0CA6656448CFEA6FAA316C4C3@blkah.com>
        [bytes] => 44592
        [ifdisposition] => 1
        [disposition] => attachment
        [ifdparameters] => 1
        [dparameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => filename
                        [value] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi5wZGY=?=
                    )
                [1] => stdClass Object
                    (
                        [attribute] => size
                        [value] => 32586
                    )
                [2] => stdClass Object
                    (
                        [attribute] => creation-date
                        [value] => Thu, 08 Dec 2016 22:16:31 GMT
                    )
                [3] => stdClass Object
                    (
                        [attribute] => modification-date
                        [value] => Thu, 08 Dec 2016 22:16:31 GMT
                    )
            )
        [ifparameters] => 1
        [parameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => name
                        [value] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi5wZGY=?=
                    )
            )
    )
[2] => stdClass Object
    (
        [type] => 3
        [encoding] => 3
        [ifsubtype] => 1
        [subtype] => OCTET-STREAM
        [ifdescription] => 1
        [description] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi56aXA=?=
        [ifid] => 1
        [id] => <A070F623163C374D9ED5236DBCD3CA3C@blah.com>
        [bytes] => 10966
        [ifdisposition] => 1
        [disposition] => attachment
        [ifdparameters] => 1
        [dparameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => filename
                        [value] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi56aXA=?=
                    )
                [1] => stdClass Object
                    (
                        [attribute] => size
                        [value] => 8011
                    )
                [2] => stdClass Object
                    (
                        [attribute] => creation-date
                        [value] => Thu, 08 Dec 2016 22:16:31 GMT
                    )
                [3] => stdClass Object
                    (
                        [attribute] => modification-date
                        [value] => Thu, 08 Dec 2016 22:16:31 GMT
                    )
            )
        [ifparameters] => 1
        [parameters] => Array
            (
                [0] => stdClass Object
                    (
                        [attribute] => name
                        [value] => =?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi56aXA=?=
                    )
            )
    )
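These attachments are also PDF and ZIP files, and in an email client they look just like the attachments in any other email. But as you can see above, instead of blahblah.zip and blahblah.pdf, the filename shows up as something like "=?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi56aXA=?=". In addition, the subtype of both is 'OCTET-STREAM' rather than 'zip' or 'pdf'. So I cannot tell what type each attachment is, and I cannot do anything with these emails.
Any help would be appreciated. In short, I am just trying to work out how to correctly extract the attachment information from this subset of emails that behaves differently.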
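These are MIME-encoded filenames, written in the "encoded-word" syntax of RFC 2047.
=?utf-8?B?
This prefix means that the text following it, up to the closing ?=, is a UTF-8 string that has been Base64-encoded.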
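As a minimal sketch of how such a name could be decoded, the example below uses PHP's iconv_mime_decode() and then takes the file type from the decoded extension, since the OCTET-STREAM subtype says nothing about it. The helper name decodeMimeFilename is hypothetical, and the iconv extension is assumed to be available. imap_mime_header_decode() and mb_decode_mimeheader() are alternatives that may suit your setup better.

```php
<?php
// Hypothetical helper: decode an RFC 2047 encoded-word such as
// "=?utf-8?B?...?=" to a UTF-8 string; plain names pass through unchanged.
function decodeMimeFilename(string $name): string
{
    $decoded = iconv_mime_decode($name, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');
    return $decoded === false ? $name : $decoded;
}

// Filename value copied from the first OCTET-STREAM part in the question
$encoded   = '=?utf-8?B?Q1RHIFF1ZXLDqXRhcm8gIC0gIC0gMTJfOF8xNi5wZGY=?=';
$filename  = decodeMimeFilename($encoded);

// With subtype OCTET-STREAM, fall back to the decoded extension
$extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

echo $filename, PHP_EOL;
echo $extension, PHP_EOL; // pdf
```

Because the decoded name comes from an untrusted sender, it would be prudent to pass it through basename() and restrict the allowed extensions before using it as a path on disk.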