Parsing IMAP Email BODYSTRUCTURE for Attachment Names

Question

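I wrote a Python script to access, manage, and filter my email over IMAP (using Python's imaplib).

To get the list of an email's attachments without first downloading the whole message, I fetch the message's body structure by UID: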

imap4.uid('FETCH', emailUID, '(BODYSTRUCTURE)')

and retrieve the attachment names from there.

Usually, the portion containing the attachment name looks like this:

("attachment" ("filename" "This is the first attachment.zip"))

But a few times I have come across something like this:

("attachment" ("filename" {34}', 'This is the second attachment.docx'))

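I read somewhere that IMAP sometimes represents a string not with double quotes but with the string's length in curly braces, followed by the actual string (without quotes).

For example: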

{16}This is a string

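But the string above does not seem to follow that strictly: after the closing brace there is a single quote, a comma, and a space, and the string itself is wrapped in single quotes.

When I download the whole email, the headers of the message part containing that attachment look normal: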

Content-Type: application/docx
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="This is the second attachment.docx"

How should I interpret (er... parse) this "abnormal" body structure and make sense of the extra single quote, comma, and so on?

Is that "standard"?

Answer 1

What you are seeing is a mangled literal, perhaps mangled by cut and paste? A literal looks like

{5}
Hello

That is, the length, then a CRLF, then that many bytes (not characters):

{4}
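
One plausible source of the stray quote and comma, assuming the response was printed straight from Python: imaplib returns each literal as a two-item tuple, with the text up to and including {34} in the first item and the literal's bytes in the second. Printing that tuple shows the two pieces separated by a quote, a comma, and a space. The sketch below stitches such pieces back into the wire format described above. join_fetch_parts is a hypothetical helper name, and the sample data is made up to mirror the question.

```python
def join_fetch_parts(data):
    """Rebuild raw response bytes from imaplib-style FETCH data.

    imaplib yields plain bytes for ordinary response text and a
    (header, literal) tuple wherever the server sent a {n} literal.
    """
    out = bytearray()
    for item in data:
        if isinstance(item, tuple):
            header, literal = item
            # Restore the CRLF that separates "{n}" from its payload.
            out += header + b"\r\n" + literal
        elif item is not None:
            out += item
    return bytes(out)


# Made-up data shaped like what imap4.uid('FETCH', ...) returns.
sample = [
    (b'1 (UID 7 BODYSTRUCTURE ("attachment" ("filename" {34}',
     b'This is the second attachment.docx'),
    b')))',
]
raw = join_fetch_parts(sample)
```

After this step, raw contains the literal in its standard form, so the filename can be read by taking exactly 34 bytes after the CRLF.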

Answer 2

It looks like IMAP-Tools, a GitHub project, includes a body structure parser.
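
If adding a dependency is not an option, the core of such a parser is small. The following is a minimal hand-rolled sketch, not IMAP-Tools' API: tokenize and attachment_names are hypothetical names, and the filename lookup simply takes whatever string follows a "filename" key, which is an assumption that will not cover every server's output.

```python
import re

LITERAL = re.compile(rb"\{(\d+)\}\r\n")   # "{n}" followed by CRLF
ATOM = re.compile(rb'[^\s()"{]+')         # NIL, numbers, bare words


def tokenize(raw):
    """Yield (kind, value) pairs from raw BODYSTRUCTURE bytes."""
    i, n = 0, len(raw)
    while i < n:
        c = raw[i:i + 1]
        if c.isspace():
            i += 1
        elif c in (b"(", b")"):
            yield (c.decode("ascii"), None)
            i += 1
        elif c == b'"':
            # Quoted string; a backslash escapes the next byte.
            j, buf = i + 1, bytearray()
            while j < n and raw[j:j + 1] != b'"':
                if raw[j:j + 1] == b"\\":
                    j += 1
                buf += raw[j:j + 1]
                j += 1
            yield ("str", buf.decode("utf-8", "replace"))
            i = j + 1
        elif c == b"{":
            # Literal: read exactly the announced number of bytes.
            m = LITERAL.match(raw, i)
            if m is None:
                raise ValueError("malformed literal at offset %d" % i)
            start, size = m.end(), int(m.group(1))
            yield ("str", raw[start:start + size].decode("utf-8", "replace"))
            i = start + size
        else:
            m = ATOM.match(raw, i)
            yield ("atom", m.group().decode("ascii", "replace"))
            i = m.end()


def attachment_names(raw):
    """Return every string that directly follows a "filename" key."""
    toks = list(tokenize(raw))
    return [v2 for (k1, v1), (k2, v2) in zip(toks, toks[1:])
            if k1 == "str" and v1.lower() == "filename" and k2 == "str"]


quoted = b'("attachment" ("filename" "This is the first attachment.zip"))'
literal = b'("attachment" ("filename" {34}\r\nThis is the second attachment.docx))'
names = attachment_names(quoted) + attachment_names(literal)
```

Because quoted strings and literals both come out as "str" tokens, the caller does not need to care which form the server chose for a given filename.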
