Parse Excel attachment from .eml file in Python

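I'm trying to parse an .eml file. The .eml has an Excel attachment that is currently base64 encoded. I'm trying to figure out how to decode it to XML so that I can later convert it to a CSV I can work with.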

Here is my code right now:

import email

# Python 3: parse the raw bytes of the .eml file
with open('Openworkorders.eml', 'rb') as fh:
    msg = email.message_from_binary_file(fh)

for part in msg.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')  # header name needs the hyphen

    if c_type == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        excelContents = part.get_payload(decode=True)

        print(excelContents)
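The header is spelled `Content-Disposition` (with a hyphen), and on Python 3.5+ `Message.get_content_disposition()` reads it directly. Below is a minimal sketch, assuming Python 3, that collects Excel attachments by content type or by an `.xlsx` filename. The helper `find_excel_attachments` and the in-memory demo message are illustrative only; they are not from the original post.

```python
import email
from email import policy
from email.message import EmailMessage

XLSX_TYPE = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'


def find_excel_attachments(msg):
    """Yield (filename, raw_bytes) for every .xlsx part in msg (hypothetical helper)."""
    for part in msg.walk():
        filename = part.get_filename() or ''
        # get_content_disposition() returns 'attachment', 'inline', or None
        is_attachment = part.get_content_disposition() == 'attachment'
        if part.get_content_type() == XLSX_TYPE or (
            is_attachment and filename.lower().endswith('.xlsx')
        ):
            # decode=True undoes the base64 transfer encoding and returns bytes
            yield part.get_filename(), part.get_payload(decode=True)


# Build a small test message in memory instead of reading Openworkorders.eml
demo = EmailMessage()
demo.set_content('see attached')
demo.add_attachment(b'PK\x03\x04 fake xlsx bytes', maintype='application',
                    subtype='vnd.openxmlformats-officedocument.spreadsheetml.sheet',
                    filename='Openworkorders.xlsx')
parsed = email.message_from_bytes(demo.as_bytes(), policy=policy.default)
found = list(find_excel_attachments(parsed))
print(found)
```

With a real file, you would pass the `msg` parsed from `Openworkorders.eml` instead of `parsed`.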

The problem is that when I try to decode it, it spits back something like this.

I used this post to help me write the code above:

How can I get an email message's text content using Python?

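Update:

This follows the post's solution exactly, applied to my file, but part.get_payload() still returns everything encoded. I haven't figured out how to access the decoded content this way.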

import email

with open('Openworkorders.eml', 'rb') as fh:
    msg = email.message_from_binary_file(fh)

for part in msg.walk():
    if part.get_content_type() == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
        name = part.get_param('name') or 'MyDoc.xlsx'
        with open(name, 'wb') as f:
            f.write(part.get_payload(None, True))  # same as get_payload(decode=True)

        print(part.get("content-transfer-encoding"))
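The two forms of `get_payload` behave differently. Called with no arguments, it returns the still-encoded base64 text. Called as `get_payload(decode=True)`, which is the same as `get_payload(None, True)`, it returns the decoded bytes. The sketch below, assuming Python 3, builds a throwaway message to show both. The attachment bytes and the `example.xlsx` filename are made up for illustration. For a real .xlsx the decoded bytes are the binary workbook, so they still look unreadable when printed.

```python
import base64
from email import message_from_bytes, policy
from email.message import EmailMessage

# Hypothetical attachment bytes; a real .xlsx would also start with b'PK'
raw = b'PK\x03\x04example'

msg = EmailMessage()
msg.set_content('body')
msg.add_attachment(raw, maintype='application', subtype='octet-stream',
                   filename='example.xlsx')
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
part = parsed.get_payload()[1]  # second MIME part is the attachment

encoded = part.get_payload()             # str: still base64 text
decoded = part.get_payload(decode=True)  # bytes: the file itself

print(part['Content-Transfer-Encoding'])         # base64
print(base64.b64decode(encoded) == decoded)       # True
print(decoded)
```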

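As can be clearly seen from this table (and as you have already concluded), this file is an .xlsx. You can't just decode it with unicode or base64: get_payload(decode=True) already removes the base64 layer, and the bytes that remain are the binary workbook itself, so you need a special package to read it. Excel files specifically are a bit trickier (e.g. this one handles PowerPoint and Word, but not Excel). There are a few online, see here; xlrd is probably the best.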
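The XML the question asks for does exist inside the file. An .xlsx is a ZIP archive of XML parts, so the decoded attachment bytes can be opened with the standard-library `zipfile` module. This sketch assumes you already have those bytes from `part.get_payload(decode=True)`. The helper names are hypothetical, and the tiny fake archive only mimics the usual .xlsx part names such as `xl/worksheets/sheet1.xml`. This exposes raw XML, not a ready-made table. Cell text usually lives in a separate shared-strings part, which is why a package like xlrd is still the practical route to CSV.

```python
import io
import zipfile


def list_xlsx_parts(xlsx_bytes):
    """Return the names of the parts inside an .xlsx (hypothetical helper)."""
    # An .xlsx is a ZIP archive, so the decoded attachment opens with zipfile
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return zf.namelist()


def read_xlsx_part(xlsx_bytes, name):
    """Return one XML part (e.g. 'xl/worksheets/sheet1.xml') as text."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        return zf.read(name).decode('utf-8')


# Stand-in for part.get_payload(decode=True): a tiny ZIP with xlsx-style names
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('[Content_Types].xml', '<Types/>')
    zf.writestr('xl/worksheets/sheet1.xml', '<worksheet/>')
fake_xlsx = buf.getvalue()

print(list_xlsx_parts(fake_xlsx))
print(read_xlsx_part(fake_xlsx, 'xl/worksheets/sheet1.xml'))
```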

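Here is my solution:

I found two things:

1.) I thought .open() reached into the .eml and swapped the selected element for its decoded form, and that I needed to look at the decoded data before moving on. What .open() actually does is create a new .xlsx file in the current working directory. You then have to open that saved attachment to work with the data.
2.) You have to open the xlrd workbook using that file's path.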
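The save-then-open step can be wrapped in a small function that returns the path xlrd needs. This is a sketch, assuming Python 3. `save_attachment` and its `default_name` parameter are hypothetical, and the temporary directory stands in for your project folder. Writing to disk is optional, since `xlrd.open_workbook` also accepts the raw bytes through its `file_contents` argument.

```python
import os
import tempfile
from email import message_from_bytes, policy
from email.message import EmailMessage


def save_attachment(part, directory, default_name='attachment.xlsx'):
    """Write a part's decoded payload into directory and return the absolute path.

    Hypothetical helper: basename() drops directory components from the
    attachment's filename so the file lands inside `directory`.
    """
    name = os.path.basename(part.get_filename() or default_name)
    path = os.path.abspath(os.path.join(directory, name))
    with open(path, 'wb') as f:
        f.write(part.get_payload(decode=True))
    return path


# Throwaway message standing in for EmailFileName.eml
msg = EmailMessage()
msg.set_content('body')
msg.add_attachment(b'PK\x03\x04data', maintype='application',
                   subtype='octet-stream', filename='report.xlsx')
part = message_from_bytes(msg.as_bytes(), policy=policy.default).get_payload()[1]

with tempfile.TemporaryDirectory() as tmp:
    saved = save_attachment(part, tmp)
    print(saved)  # this absolute path is what xlrd.open_workbook would receive
    with open(saved, 'rb') as f:
        content = f.read()
    print(content)
```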

import email
import xlrd  # note: xlrd >= 2.0 reads only legacy .xls; .xlsx needs xlrd < 2.0

with open('EmailFileName.eml', 'rb') as fh:
    msg = email.message_from_binary_file(fh)  # entire message

if msg.is_multipart():
    for payload in msg.get_payload():
        bdy = payload.get_payload()
else:
    bdy = msg.get_payload()

# assumes the Excel file is the second top-level MIME part
attachment = msg.get_payload()[1]

# decode the attachment and save the Excel file to disk
excelFilePath = 'excelFile.xlsx'  # or an absolute path like '/Users/mymac/thisProjectsFolder/excelFileName.xlsx'
with open(excelFilePath, 'wb') as f:
    f.write(attachment.get_payload(decode=True))

# open the workbook from the path the file was just written to
xls = xlrd.open_workbook(excelFilePath)

# Here's a bonus for how to start accessing Excel cells and rows
for sheet in xls.sheets():
    cells = []  # avoid shadowing the built-in list
    for row in range(sheet.nrows):
        for col in range(sheet.ncols):
            cells.append(str(sheet.cell(row, col).value))
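The loop above flattens every cell of a sheet into one list, which loses the row boundaries that a CSV needs. Keeping one list per row, for example `[sheet.row_values(r) for r in range(sheet.nrows)]` with xlrd, lets the standard `csv` module do the rest. This sketch uses plain lists in place of a spreadsheet. `rows_to_csv` is a hypothetical helper. When writing to a real file, open it with `open(path, 'w', newline='')` so the csv module controls line endings.

```python
import csv
import io


def rows_to_csv(rows, out):
    """Write an iterable of row lists to a text stream as CSV (hypothetical helper)."""
    writer = csv.writer(out)
    for row in rows:
        writer.writerow(row)  # fields containing commas are quoted automatically


# Plain lists stand in for rows read from the workbook
rows = [['order', 'status'], [101, 'open'], [102, 'closed, pending']]
buf = io.StringIO()
rows_to_csv(rows, buf)
print(buf.getvalue())
```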