Converting ASCII-encoded characters inside a string in Python

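I am using the imaplib library to read emails from my mail server. The emails contain JSON-encoded messages that my program is supposed to interpret.

Mail-fetching code: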

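import imaplib

# Placeholder connection details (not from the original question)
imap = imaplib.IMAP4_SSL("imap.example.com")
imap.login("user@example.com", "password")
imap.select("INBOX")

tmp, data = imap.search(None, "UNSEEN")
emails = []

for num in data[0].split():
    tmp, data = imap.fetch(num, "(BODY[TEXT])")
    # Only append the email body; str() on bytes yields the "b'...'" repr
    emails.append(str(data[0][1]))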

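However, the strings I get from imaplib contain some special characters. I found that =xx looks like an ASCII-encoded version of 'special' characters. How can I convert a string containing such characters into a 'regular' Python string, or am I perhaps missing an option in imaplib that causes the strings to be encoded this way?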

Example string I get:

b'This is a message in Mime Format. If you see this, your mail reader does not support this format.\r\n\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A =0A =0A =0A =0A =0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest@gmail.com","apartment":"app","ov=\r\nerride":0}JSON =0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A <meta charset=3D"utf-8"=20=\r\n/>=0A <meta http-equiv=3D"Content-Type" content=3D"text/html charset=\r\n=3DUTF-8" />=0A =0A =0A

=0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest@gmail.com","apartment":"app","ov=\r\nerride":0}JSON

=0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d--\r\n'

I originally just removed all '\n', '\r' and '=' characters, but today I received this email/string and my code wrongly interpreted "test=\r\n=2Etest@gmail.com" as "test2Etest@gmail.com" instead of "test.test@gmail.com".

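There is an encoding-related hint in your message, namely:

Content-Transfer-Encoding: quoted-printable

This explains the = signs in your text. You can handle it with the built-in quopri module, like so: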

import quopri
message = b'test=\r\n=2Etest@gmail.com'
decoded = quopri.decodestring(message)
print(decoded)

Output:

b'test.test@gmail.com'

Note that quopri.decodestring returns bytes, so if you need text you have to apply the appropriate .decode; for UTF-8 that would be:

decoded = quopri.decodestring(message).decode('utf-8')
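Putting this together for the sample in the question, a minimal sketch could decode the text/plain part and then pull out the JSON. It assumes the payload is always wrapped in the literal JSON{...}JSON markers seen in the sample; the regular expression and the abridged `body` value are illustrative, not part of any library.

```python
import json
import quopri
import re

# Quoted-printable text/plain body, abridged from the sample in the question
body = (b'=0A=0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n'
        b'=2Etest@gmail.com","apartment":"app","ov=\r\nerride":0}JSON =0A')

# Undo the transfer encoding first, then turn the bytes into text
text = quopri.decodestring(body).decode('utf-8')

# Assumption: the payload sits between literal "JSON" markers, as in the sample
match = re.search(r'JSON(\{.*?\})JSON', text, re.DOTALL)
payload = json.loads(match.group(1))

print(payload['email'])     # test.test@gmail.com
print(payload['override'])  # 0
```

Decoding before any other cleanup matters here: the soft line break inside "ov=\r\nerride" is only rejoined correctly by the quoted-printable decoder.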

You are dealing with an encoding scheme called "quoted-printable" (more details in RFC 2045, section 6.7).
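To see why stripping every '=' turned "=2E" into "2E", here is an illustrative-only decoder covering just the two core rules of that section: a trailing "=" is a soft line break, and "=XX" stands for the byte with hex value XX. The function name qp_decode is hypothetical, and it ignores details such as trailing-whitespace handling, so use quopri or the email package in real code.

```python
import re


def qp_decode(data: bytes) -> bytes:
    """Hypothetical, simplified quoted-printable decoder (two core rules only)."""
    # Rule 1: '=' at the end of a line is a soft line break; drop it and the line ending
    data = re.sub(rb'=\r?\n', b'', data)
    # Rule 2: '=XX' (two hex digits) encodes the byte with that value
    return re.sub(rb'=([0-9A-Fa-f]{2})',
                  lambda m: bytes([int(m.group(1), 16)]),
                  data)


print(qp_decode(b'test=\r\n=2Etest@gmail.com'))  # b'test.test@gmail.com'
print(qp_decode(b'charset=3D"utf-8"'))           # b'charset="utf-8"'
```

"=2E" is hex 2E, i.e. ".", and "=3D" is "=" itself, which is why a literal equals sign in the original text also shows up escaped.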

You have at least two options:

  1. You can use the Python module quopri.
  2. You can parse your email with the parser from the Python email module (email.parser).
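A sketch of option 2 follows. The email parser needs the message headers (the top-level Content-Type carries the multipart boundary), so with imaplib you would fetch "(RFC822)" rather than "(BODY[TEXT])" and pass data[0][1] to the parser. The short `raw` message below is a hand-made stand-in for such a fetch result, not real server output.

```python
from email import message_from_bytes, policy

# Hand-made stand-in for a full message as returned by fetching "(RFC822)"
raw = (b'MIME-Version: 1.0\r\n'
       b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
       b'\r\n'
       b'--XYZ\r\n'
       b'Content-Type: text/plain; charset=UTF-8\r\n'
       b'Content-Transfer-Encoding: quoted-printable\r\n'
       b'\r\n'
       b'JSON{"email":"test=\r\n=2Etest@gmail.com"}JSON\r\n'
       b'--XYZ--\r\n')

# policy.default yields an EmailMessage, which offers get_body()/get_content()
msg = message_from_bytes(raw, policy=policy.default)
part = msg.get_body(preferencelist=('plain',))

# get_content() undoes the transfer encoding and applies the declared charset
text = part.get_content()
print(text)
```

The parser reads Content-Transfer-Encoding and charset from each part, so you do not have to detect quoted-printable (or base64) yourself.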

But if your goal is simply to get at the email contents, it is easier to use the imap_tools or IMAPClient module.

Some example code from their documentation:

imap_tools (https://pypi.org/project/imap-tools/):

from imap_tools import MailBox, AND

# get list of email subjects from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd') as mailbox:
    subjects = [msg.subject for msg in mailbox.fetch()]

# get list of email subjects from INBOX folder - equivalent verbose version
mailbox = MailBox('imap.mail.com')
mailbox.login('test@mail.com', 'pwd', initial_folder='INBOX')  # or mailbox.folder.set instead of the 3rd arg
subjects = [msg.subject for msg in mailbox.fetch(AND(all=True))]
mailbox.logout()

IMAPClient (https://imapclient.readthedocs.io/en/2.1.0/):

from imapclient import IMAPClient
server = IMAPClient('imap.mailserver.com', use_uid=True)
server.login('someuser', 'somepassword')

select_info = server.select_folder('INBOX')
print('%d messages in INBOX' % select_info[b'EXISTS'])
#34 messages in INBOX

messages = server.search(['FROM', 'best-friend@domain.com'])
print("%d messages from our best friend" % len(messages))
#5 messages from our best friend

for msgid, data in server.fetch(messages, ['ENVELOPE']).items():
    envelope = data[b'ENVELOPE']