How to get the message body (or bodies) from a Message object returned by email.parser.Parser?
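I'm reading the Python 3 docs here, and I must be blind or something... Where does it say how to get the body of a message?
I want to open a message and loop over its text-based bodies, skipping binary attachments. Pseudocode: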
def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here
How do I do this? Is Message even the right class? Doesn't it only hold the headers?
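This is best done with two functions:
- The first opens the file. If the message is single-part, `get_payload()` on the message returns a string. If the message is multipart, it returns a list of sub-messages (each of which may itself be multipart).
- The second processes the text payload.
Here is how it can be done: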
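To answer the last part of the question: a Message holds both the headers and the body. The following minimal sketch parses two hand-written sample messages with `Parser().parsestr()` and shows that `get_payload()` returns a `str` for a single-part message and a list of sub-messages for a multipart one. The sample messages and their variable names are made up for illustration.

```python
from email.parser import Parser

# Hypothetical single-part message
SINGLE = (
    "Subject: hello\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Just one body.\n"
)

# Hypothetical multipart message with two text parts
MULTI = (
    "Subject: two parts\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "First body.\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Second body.</p>\n"
    "--XYZ--\n"
)

single = Parser().parsestr(SINGLE)
multi = Parser().parsestr(MULTI)

# Headers are read by key; the body is reached through get_payload()
print(single["Subject"])           # hello
print(single.is_multipart())       # False
print(type(single.get_payload()))  # <class 'str'>

print(multi.is_multipart())        # True
parts = multi.get_payload()        # list of Message objects
print(type(parts))                 # <class 'list'>
print([p.get_content_type() for p in parts])  # ['text/plain', 'text/html']
```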
from email.parser import Parser


def parse_file_bodies(filename):
    # Open the file and parse it into a Message object
    with open(filename, 'r') as fp:
        message = Parser().parse(fp)
    # For multipart emails, each sub-message is handled in a loop
    if message.is_multipart():
        for part in message.get_payload():
            parse_single_body(part)
    else:
        # A single-part message is passed directly
        parse_single_body(message)


def parse_single_body(message):
    # decode=True undoes base64/quoted-printable and returns bytes,
    # or None if this part is itself a multipart container
    payload = message.get_payload(decode=True)
    if payload is None:
        return
    # The bytes must be converted to a Python string using the input
    # charset, which may vary from message to message
    try:
        text = payload.decode("utf-8")
        # Now you can work with text as with any other string:
        ...
    except UnicodeDecodeError:
        print("Error: cannot parse message as UTF-8")
        return
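The two functions above descend only one level and decode everything as UTF-8. To match the question more closely (loop over the text bodies, skip binary attachments), here is a sketch that uses `Message.walk()` to visit every part at any depth. It keeps only leaf `text/*` parts that are not marked as attachments and decodes each one with the charset declared in its headers. The function name `iter_text_bodies` and the UTF-8 fallback are assumptions for illustration, not part of the original answer.

```python
from email.parser import Parser


def iter_text_bodies(message):
    """Yield (content_type, text) for every text/* leaf part, skipping
    multipart containers, attachments and non-text (e.g. binary) parts."""
    for part in message.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if part.get_content_maintype() != "text":
            continue  # skip images, application/octet-stream, etc.
        if part.get_content_disposition() == "attachment":
            continue  # skip text files sent as attachments
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # charset name in the header is unknown to Python
            text = payload.decode("utf-8", errors="replace")
        yield part.get_content_type(), text


def read_all_bodies(local_email_file):
    with open(local_email_file, "r") as fp:
        message = Parser().parse(fp)
    return list(iter_text_bodies(message))
```

Because `walk()` recurses into nested multiparts (for example `multipart/alternative` inside `multipart/mixed`), no explicit recursion is needed. Using `errors="replace"` means a mislabelled part yields replacement characters instead of aborting the loop.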