Converting ASCII-encoded characters inside a string in Python
I am using the imaplib library to read emails from my mail server. The emails contain JSON-encoded messages that my program should interpret.
Mail-fetching code:
tmp, data = imap.search(None, "UNSEEN")
emails = []
for num in data[0].split():
    tmp, data = imap.fetch(num, "(BODY[TEXT])")
    # Only append the email body
    emails.append(str(data[0][1]))
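However, the string I get from imaplib contains some special characters. I found that =xx looks like an ASCII-encoded version of the 'special' characters. How can I convert a string containing such characters into a 'regular' Python string, or did I perhaps miss an option in my imaplib code that would avoid the wrongly encoded string?
Example string I get: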
b'This is a message in Mime Format. If you see this, your mail reader does not support this format.\r\n\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A =0A =0A =0A =0A =0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest@gmail.com","apartment":"app","ov=\r\nerride":0}JSON =0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d\r\nContent-Type: text/html; charset=UTF-8\r\nContent-Transfer-Encoding: quoted-printable\r\n\r\n=0A=0A=0A=0A =0A <meta charset=3D"utf-8"=20=\r\n/>=0A <meta http-equiv=3D"Content-Type" content=3D"text/html charset=\r\n=3DUTF-8" />=0A =0A =0A
=0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n=2Etest@gmail.com","apartment":"app","ov=\r\nerride":0}JSON
=0A =0A=0A\r\n--=_8e336d0902b13eaec4e7906847c21a6d--\r\n'
Originally I just removed all '\n', '\r' and '=', but today I received this email/string and my code wrongly interpreted "test=\r\n=2Etest@gmail.com" as "test2Etest@gmail.com" instead of "test.test@gmail.com".
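There is an encoding-related hint in your message, namely:
Content-Transfer-Encoding: quoted-printable
This explains the = signs in your text. You can handle it with the built-in quopri module, like this: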
import quopri
message = b'test=\r\n=2Etest@gmail.com'
decoded = quopri.decodestring(message)
print(decoded)
Output:
b'test.test@gmail.com'
Note that quopri.decodestring returns bytes, so if you need text you have to call .decode with the right encoding. For utf-8 that would be:
decoded = quopri.decodestring(message).decode('utf-8')
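Applied to the payload from the question, the same idea might look like the sketch below. It decodes a shortened fragment of the text/plain part and then pulls out whatever sits between the two JSON markers. The helper name extract_payload, the regular expression, and the shortened raw_part bytes are assumptions made for illustration, not part of the original code.

```python
import json
import quopri
import re

# Shortened fragment of the text/plain part shown in the question.
raw_part = (
    b'=0A JSON{"arrival":"03.03.21","departure":"07.03.21","email":"test=\r\n'
    b'=2Etest@gmail.com","apartment":"app","ov=\r\n'
    b'erride":0}JSON =0A'
)


def extract_payload(part: bytes, charset: str = "utf-8") -> dict:
    """Hypothetical helper: decode quoted-printable bytes and parse the JSON payload."""
    # Undo the transfer encoding first: soft line breaks ("=\r\n") disappear
    # and "=XX" escapes turn back into the bytes they stand for.
    text = quopri.decodestring(part).decode(charset)
    # The question's messages wrap the payload as JSON{...}JSON.
    match = re.search(r"JSON(\{.*?\})JSON", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON payload found")
    return json.loads(match.group(1))


payload = extract_payload(raw_part)
print(payload["email"])
```

Decoding before searching matters here: the soft line breaks split both the email address and the "override" key, so the raw bytes are not valid JSON until quopri has rejoined them.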
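You are dealing with an encoding scheme called "quoted-printable" (more details in RFC 2045, section 6.7).
You have at least two options:
- You can use the Python module quopri
- You can parse your email with the parser from Python's email module (email.parser).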
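As a sketch of the second option, the standard-library email package can undo the transfer encoding by itself when it is given a complete message. The snippet assumes the full message, headers included, is available, which the BODY[TEXT] fetch in the question does not provide. The raw_message bytes are a hand-shortened stand-in for the question's sample, and the outer headers are made up for illustration.

```python
import email
from email import policy

# Hand-built stand-in for a complete message; the outer headers are assumed.
raw_message = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/alternative; '
    b'boundary="=_8e336d0902b13eaec4e7906847c21a6d"\r\n'
    b'\r\n'
    b'This is a message in Mime Format.\r\n'
    b'\r\n'
    b'--=_8e336d0902b13eaec4e7906847c21a6d\r\n'
    b'Content-Type: text/plain; charset=UTF-8\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'\r\n'
    b'JSON{"email":"test=\r\n'
    b'=2Etest@gmail.com","ov=\r\n'
    b'erride":0}JSON\r\n'
    b'--=_8e336d0902b13eaec4e7906847c21a6d--\r\n'
)

# policy.default yields an EmailMessage, which offers get_body()/get_content().
msg = email.message_from_bytes(raw_message, policy=policy.default)

# Pick the text/plain alternative; get_content() reverses the
# quoted-printable encoding and applies the declared charset.
body = msg.get_body(preferencelist=("plain",))
text = body.get_content()
print(text)
```

The appeal of this route is that the Content-Transfer-Encoding and charset are read from the part's own headers, so nothing has to be hard-coded.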
But if your goal is simply to get at the email content, it would be easier to use the imap_tools or IMAPClient module.
Some sample code from their documentation:
imap_tools (https://pypi.org/project/imap-tools/):
from imap_tools import MailBox, AND
# get list of email subjects from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd') as mailbox:
    subjects = [msg.subject for msg in mailbox.fetch()]
# get list of email subjects from INBOX folder - equivalent verbose version
mailbox = MailBox('imap.mail.com')
mailbox.login('test@mail.com', 'pwd', initial_folder='INBOX') # or mailbox.folder.set instead 3d arg
subjects = [msg.subject for msg in mailbox.fetch(AND(all=True))]
mailbox.logout()
IMAPClient (https://imapclient.readthedocs.io/en/2.1.0/):
from imapclient import IMAPClient
server = IMAPClient('imap.mailserver.com', use_uid=True)
server.login('someuser', 'somepassword')
select_info = server.select_folder('INBOX')
print('%d messages in INBOX' % select_info[b'EXISTS'])
#34 messages in INBOX
messages = server.search(['FROM', 'best-friend@domain.com'])
print("%d messages from our best friend" % len(messages))
#5 messages from our best friend
for msgid, data in server.fetch(messages, ['ENVELOPE']).items():
    envelope = data[b'ENVELOPE']