Parse excel attachment from .eml file in python
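I'm trying to parse an .eml file. The .eml has an Excel attachment that is currently base64-encoded. I'm trying to figure out how to decode it to XML so that I can later convert it into a CSV I can work with.
Here is my code so far: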
import email

# Python 3: parse the raw .eml as bytes
with open('Openworkorders.eml', 'rb') as fp:
    msg = email.message_from_binary_file(fp)

for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')  # the header name needs the hyphen
    if c_type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode=True)
        print(excelContents)
The problem is that when I try to decode it, it spits back something like this.
I used this post to help me write the code above:
How can I get an email message's text content using Python?
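Update:
This follows the solution from this post exactly, using my file, but part.get_payload() still returns everything encoded. I haven't yet figured out how to access the decoded content this way.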
import email

with open('Openworkorders.eml', 'rb') as fp:
    msg = email.message_from_binary_file(fp)

for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        # use the attachment's own file name, falling back to a default
        name = part.get_param('name') or 'MyDoc.xlsx'
        with open(name, 'wb') as f:
            f.write(part.get_payload(None, True))  # same as get_payload(decode=True)
        print(part.get('content-transfer-encoding'))
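A minimal, self-contained sketch (Python 3, standard library only) of what `decode=True` changes. Instead of reading `Openworkorders.eml`, it builds a message in memory with a made-up attachment; the bytes and the file name are placeholders, not real data. `get_payload()` returns the base64 text as stored in the .eml, while `get_payload(decode=True)` (equivalent to `get_payload(None, True)`) returns the original bytes. For a real .xlsx those bytes are binary ZIP data, which is why they can still look scrambled when printed even though the base64 layer is gone.

```python
from email.message import EmailMessage

XLSX_TYPE = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'

# Placeholder bytes standing in for a real workbook (a real .xlsx starts with b'PK')
fake_xlsx = b'PK\x03\x04 placeholder, not a real workbook'

msg = EmailMessage()
msg['Subject'] = 'Open work orders'
msg.set_content('See attached.')
# add_attachment() base64-encodes bytes content by default
msg.add_attachment(fake_xlsx, maintype='application',
                   subtype='vnd.openxmlformats-officedocument.spreadsheetml.sheet',
                   filename='Openworkorders.xlsx')

for part in msg.walk():
    if part.get_content_type() == XLSX_TYPE:
        raw = part.get_payload()              # str: the base64 text as stored in the .eml
        data = part.get_payload(decode=True)  # bytes: base64 removed, original file contents
        print(part.get('Content-Transfer-Encoding'))
        print(raw[:24])
        print(data[:4])
```

So `decode=True` already works; what remains is interpreting the decoded bytes as a workbook.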
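It's clear from this table (and as you have already concluded) that this file is an .xlsx. You can't just decode it with unicode or base64: base64 decoding only removes the transfer encoding, and what comes out is the .xlsx file itself, a ZIP archive of XML parts, so you need a package that understands the format. Excel files specifically are a bit trickier (e.g. this one does PowerPoint and Word, but not Excel). There are a few available online, see here; xlrd is probably the best. (Note that xlrd 2.0 and later only read legacy .xls files, so for .xlsx you need an older xlrd or a package such as openpyxl.)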
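To make the ".xlsx is a ZIP of XML" point concrete: the XML the question is after lives inside the decoded payload rather than being produced by decoding it. The sketch below uses only the standard library; it takes the bytes returned by `part.get_payload(decode=True)`, checks for the ZIP signature, and lists or reads the archive members. Both function names are hypothetical helpers for this example.

```python
import io
import zipfile


def list_xlsx_parts(xlsx_bytes):
    """Return the member names (mostly XML files) inside decoded .xlsx bytes."""
    # ZIP archives, and therefore .xlsx files, start with the b'PK' signature
    if not xlsx_bytes.startswith(b'PK'):
        raise ValueError('payload does not look like a ZIP/.xlsx file')
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return zf.namelist()


def read_xlsx_part(xlsx_bytes, name):
    """Return one member of the archive (e.g. a worksheet) as XML text."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return zf.read(name).decode('utf-8')
```

With `part` from the question's loop, `list_xlsx_parts(part.get_payload(decode=True))` on an Excel-generated file typically includes entries such as `xl/workbook.xml` and `xl/worksheets/sheet1.xml`; cell text is usually kept separately in `xl/sharedStrings.xml`, which is one reason a dedicated package is easier than reading the XML by hand.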
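Here is my solution:
I found out two things:
1.) I thought .open() went into the .eml and changed the selected decoded elements, and that I needed to see the decoded data before moving on. What .open() actually does is create a new file, the .xlsx, in the same directory. You have to open that saved attachment to work with the data.
2.) xlrd.open_workbook needs to be given the saved file's path. (It can also take the decoded bytes directly through its file_contents argument, which avoids writing the file.)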
import email
import xlrd  # xlrd >= 2.0 only opens .xls files; reading .xlsx needs xlrd < 2.0

with open('EmailFileName.eml', 'rb') as fp:
    msg = email.message_from_binary_file(fp)  # entire message

# body text (not used below)
if msg.is_multipart():
    for payload in msg.get_payload():
        bdy = payload.get_payload()
else:
    bdy = msg.get_payload()

# assumes the attachment is the second top-level part of the message
attachment = msg.get_payload()[1]

# decode the base64 payload and save the excel file to disk
excelFilePath = 'excelFile.xlsx'  # or a full path like '/Users/mymac/thisProjectsFolder/excelFileName.xlsx'
with open(excelFilePath, 'wb') as f:
    f.write(attachment.get_payload(decode=True))

xls = xlrd.open_workbook(excelFilePath)

# Here's a bonus for how to start accessing excel cells and rows
for sheet in xls.sheets():
    values = []
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            values.append(str(sheet.cell(row, col).value))
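Since the end goal in the question is a CSV, here is a sketch of the whole path using only the standard library, as an alternative to xlrd. Assumptions, stated plainly: the attachment is located by its content type rather than by position (`msg.get_payload()[1]` breaks if the parts are ordered differently), the sheet of interest is stored at `xl/worksheets/sheet1.xml`, and values are written exactly as stored in the XML, so dates come out as Excel serial numbers and formatting is ignored. All function names are made up for this example; for anything beyond plain cell values, a maintained library such as openpyxl is the safer choice.

```python
import csv
import email
import io
import re
import zipfile
import xml.etree.ElementTree as ET

XLSX_TYPE = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
NS = {'m': 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'}


def find_xlsx_attachment(msg):
    """Return the decoded bytes of the first .xlsx part, or None if there is none."""
    for part in msg.walk():
        if part.get_content_type() == XLSX_TYPE:
            return part.get_payload(decode=True)
    return None


def column_index(cell_ref):
    """Convert a cell reference such as 'C7' to a zero-based column index (here 2)."""
    letters = re.match(r'[A-Z]+', cell_ref).group()
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


def _text(node):
    """Join plain <t> and rich-text <r><t> runs, skipping phonetic hints."""
    runs = node.findall('m:t', NS) + node.findall('m:r/m:t', NS)
    return ''.join(t.text or '' for t in runs)


def sheet_rows(xlsx_bytes, sheet_path='xl/worksheets/sheet1.xml'):
    """Yield each row of one worksheet as a list of strings."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        shared = []
        if 'xl/sharedStrings.xml' in zf.namelist():
            sst = ET.fromstring(zf.read('xl/sharedStrings.xml'))
            shared = [_text(si) for si in sst.findall('m:si', NS)]
        sheet = ET.fromstring(zf.read(sheet_path))
    for row in sheet.iterfind('m:sheetData/m:row', NS):
        values = []
        for c in row.findall('m:c', NS):
            ref = c.get('r')
            col = column_index(ref) if ref else len(values)
            values.extend([''] * (col - len(values)))  # pad skipped (empty) cells
            v = c.find('m:v', NS)
            raw = v.text if v is not None and v.text else ''
            if c.get('t') == 's':            # index into the shared-string table
                values.append(shared[int(raw)] if raw else '')
            elif c.get('t') == 'inlineStr':  # text stored in the cell itself
                inline = c.find('m:is', NS)
                values.append(_text(inline) if inline is not None else '')
            else:                            # numbers, booleans, formula results as stored
                values.append(raw)
        yield values


def eml_to_csv(eml_path, csv_path):
    """Write the sheet1.xml worksheet of the .eml's .xlsx attachment to csv_path."""
    with open(eml_path, 'rb') as fp:
        msg = email.message_from_binary_file(fp)
    xlsx_bytes = find_xlsx_attachment(msg)
    if xlsx_bytes is None:
        raise ValueError('no .xlsx attachment found in ' + eml_path)
    with open(csv_path, 'w', newline='') as out:
        csv.writer(out).writerows(sheet_rows(xlsx_bytes))
```

Usage would be something like `eml_to_csv('Openworkorders.eml', 'Openworkorders.csv')`. Working on the bytes in memory also sidesteps the temporary .xlsx file from the solution above.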