Decoding MIME email from Gmail API - \r\n and 3D - Python

Question

I'm currently using the Gmail API to read some HTML emails in Python. I've decoded their body using:

base64.urlsafe_b64decode

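After printing out the resulting HTML email, "\r\n" and "3D" are scattered throughout the HTML. I can't remove the "\r\n" because the \ and r, and the \ and n, register as separate characters (?), and I'm not sure where the "3D" comes from.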

Is there something wrong with the way I'm decoding?

Here is the code:

import base64
import email

# service is an authorized Gmail API client, created elsewhere
results = service.users().messages().list(userId='me', q = 'is: unread').execute()

for index in range(len(results['messages'])):
    message = service.users().messages().get(userId='me', id=results['messages'][index]['id'], format='raw').execute()

    msg_str = base64.urlsafe_b64decode(message['raw'].encode('UTF-8'))

    mime_msg = email.message_from_string(str(msg_str))

    print(mime_msg)

    service.users().messages().modify(userId='me', id=results['messages'][index]['id'], body = {'removeLabelIds': ['UNREAD']}).execute() # mark message as read
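For reference, the literal "\r\n" text can be reproduced without the Gmail API. A minimal sketch, assuming a small hypothetical HTML byte string stands in for msg_str: str() on a bytes object returns its repr (the b'...' prefix plus backslash escapes), so "\r" really is two characters, while bytes.decode() returns the actual text.

```python
# Hypothetical HTML bytes, standing in for the output of base64.urlsafe_b64decode
raw = b'<p>Hi</p>\r\n<p>there</p>'

# str() on bytes returns the repr: a b'...' prefix plus backslash escapes,
# so "\r" shows up as two characters: a backslash and the letter r.
as_repr = str(raw)
print(as_repr)

# bytes.decode() returns real text: "\r\n" is one CR character plus one LF character.
as_text = raw.decode('utf-8')
print(repr(as_text))
```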

Answer 1

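I found a solution: I stopped using Python's email library and converted msg_str (which is of type bytes) to a string. From there, I simply removed '\r\n' from the string and replaced '=3D' with '='.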
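The "3D" comes from the quoted-printable transfer encoding, in which a literal "=" is written as "=3D" (0x3D is the character code of "="). A minimal sketch using the standard-library quopri module, which is not part of this answer, to show the round trip on a hypothetical HTML fragment:

```python
import quopri

html = b'<a href="https://example.com">link</a>'

# Encoding to quoted-printable turns each "=" into "=3D" (0x3D is the code for "=")
encoded = quopri.encodestring(html)
print(encoded)

# Decoding reverses it, so no manual "=3D" -> "=" replacement is needed
decoded = quopri.decodestring(encoded)
print(decoded.decode('utf-8'))
```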

Answer 2

That isn't a great solution; instead, use

for email_part in message.walk():
    # decode=True undoes base64/quoted-printable; it returns None for multipart containers
    part_data = email_part.get_payload(decode=True)

where message is a Python email.message.Message object. Then perhaps use something like BeautifulSoup to parse the HTML efficiently. Hope that helps!
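A minimal sketch of this approach using only the standard library (BeautifulSoup left out), assuming the raw message is available as bytes; html_parts is a hypothetical helper name. It keeps only text/html parts and decodes each one with its declared charset, falling back to UTF-8:

```python
import email


def html_parts(raw):
    """Return the decoded text of every text/html part in raw message bytes."""
    msg = email.message_from_bytes(raw)
    found = []
    for part in msg.walk():
        if part.get_content_type() != 'text/html':
            continue
        # decode=True undoes the Content-Transfer-Encoding (quoted-printable, base64)
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or 'utf-8'
        found.append(payload.decode(charset, errors='replace'))
    return found


# Hypothetical single-part message; "=\r\n" is a quoted-printable soft line break
RAW = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>=\r\n'
    b"<p>hi</p>\r\n"
)
parts = html_parts(RAW)
print(parts)
```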

Answer 3

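maksel's solution worked for me once the bytes were decoded with msg_str.decode('utf-8') (in Python 3, decode is a method of bytes, not str). The original code encodes the byte string rather than decoding it.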

So, under Python 3.7, we can do the replacement like this:

msg = msg_str.decode('utf-8')  # msg_str is the bytes returned by base64.urlsafe_b64decode
msg = msg.replace('\r\n', '').replace('=3D', '=')

Note that in my case this solution did not work for all HTML tags.
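One likely reason the replace approach misses some tags: quoted-printable also uses "=" at the end of a line as a soft line break, and can escape other characters (for example a space as "=20"). The sketch below, on a hypothetical fragment, compares the manual replacements with quopri.decodestring, a swapped-in standard-library alternative rather than part of this answer:

```python
import quopri

# Hypothetical quoted-printable HTML: "=20" is an escaped space and the "="
# before the first CRLF is a soft line break that joins the two lines.
qp_text = '<p class=3D"a">Hello=20world</p>=\r\n<div>x</div>\r\n'

# The manual replacements from this answer
naive = qp_text.replace('\r\n', '').replace('=3D', '=')

# Proper quoted-printable decoding with the standard library
proper = quopri.decodestring(qp_text.encode('ascii')).decode('utf-8')

print(naive)  # <p class="a">Hello=20world</p>=<div>x</div>
print(proper)
```

The naive result keeps a stray "=" between the tags and the untouched "=20", whereas the decoder removes the soft line break and restores the space.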

Answer 4

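I may be late to the party. Some of the solutions mentioned do work, but to help others who land here, I'd like to post this answer because it looks a bit cleaner.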

Use policy=email.policy.default when building the mail object. This gets rid of the =3D, \r\n, etc. mentioned above.

import email.policy
mailobject = email.message_from_bytes(msg_str, policy=email.policy.default)  # msg_str is bytes
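To see what the policy changes, here is a minimal sketch on a hypothetical single-part HTML message: with the default compat32 policy, get_payload() returns the body still quoted-printable encoded, while policy=email.policy.default yields an EmailMessage whose get_content() decodes it. The sketch imports email.policy explicitly.

```python
import email
import email.policy

# Hypothetical single-part HTML message with a quoted-printable body
RAW = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>\r\n'
)

# Default (compat32) policy: get_payload() returns the body still encoded
legacy = email.message_from_bytes(RAW)
print(legacy.get_payload())

# policy.default: get_content() undoes the transfer encoding and applies the charset
modern = email.message_from_bytes(RAW, policy=email.policy.default)
print(modern.get_content())
```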

If you're on Python 3.6+, you can use the get_body and get_content methods.

if mailobject.is_multipart():
    body = mailobject.get_body(('html',))
else:
    body = mailobject.get_body(('plain',))

if body:
    body = body.get_content()

print(body)

The code above is deliberately minimal, just enough to answer the question. It assumes the body is either plain text or HTML; remember to handle other cases when processing real emails.
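Putting the pieces together for the Gmail case: a minimal sketch, assuming raw_b64 is the message['raw'] string returned with format='raw' and that it is correctly padded base64url. body_from_gmail_raw is a hypothetical helper; it passes the decoded bytes straight to email.message_from_bytes (no str() call) and asks get_body for HTML first, then plain text, so one call covers both single-part and multipart messages:

```python
import base64
import email
import email.policy


def body_from_gmail_raw(raw_b64):
    """Decode a Gmail API raw string and return the preferred body text, or None."""
    raw_bytes = base64.urlsafe_b64decode(raw_b64)
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    # Prefer HTML, fall back to plain text
    part = msg.get_body(preferencelist=('html', 'plain'))
    return part.get_content() if part is not None else None


# Hypothetical multipart/alternative message, base64url-encoded like Gmail's raw field
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"plain version\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>\r\n'
    b"--XYZ--\r\n"
)
encoded = base64.urlsafe_b64encode(sample).decode('ascii')
body = body_from_gmail_raw(encoded)
print(body)
```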

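An extra, unrelated tip:

Because this is an encoding issue, this answer also applies to other similar situations, such as using an AWS Lambda function (Python) to parse AWS SES emails that were forwarded to S3. I had to mention it here because I ran into the same problem while experimenting with those.

In that case, use: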

s3_file = object_s3['Body'].read()  # object_s3: e.g. the response of boto3's s3.get_object()
mailobject = email.message_from_string(s3_file.decode('utf-8'), policy=email.policy.default)
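A possible refinement, not from the original answer: s3_file.decode('utf-8') raises UnicodeDecodeError when a message carries 8-bit text in another charset. Parsing the bytes directly with email.message_from_bytes leaves decoding to each part's declared charset. A minimal sketch, with a hypothetical Latin-1 message standing in for object_s3['Body'].read() and parse_raw_email as a hypothetical helper name:

```python
import email
import email.policy


def parse_raw_email(raw_bytes):
    """Parse raw message bytes (e.g. an S3 object body) without decoding them first."""
    return email.message_from_bytes(raw_bytes, policy=email.policy.default)


# Hypothetical stand-in for object_s3['Body'].read(): an 8-bit Latin-1 body
s3_file = (
    b"Subject: hello\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)

try:
    s3_file.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xE9 is not valid UTF-8 here, so the decode-first approach fails

mailobject = parse_raw_email(s3_file)
print(utf8_ok, mailobject['Subject'], mailobject.get_content())
```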


python

email

mime

gmail-api