MemoryError when fetching email with exchangelib

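I have a question about bulk-saving email data with exchangelib. Currently it takes a long time when there are many emails, and after a few minutes it throws this error: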

    ERROR:    MemoryError:
    Retry: 0
    Waited: 10
    Timeout: 120
    Session: 25999
    Thread: 28148
    Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
    URL: https://outlook.office365.com/EWS/Exchange.asmx
    HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
    Allow redirects: False
    Streaming: False
    Response time: 411.93799999996554
    Status code: 503
    Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
    Response headers: {}

This is the code I use to connect and read:

    from bs4 import BeautifulSoup as bs
    from exchangelib import (
        DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
    )

    def connect_mail():
        # Connect to the Office 365 mailbox with basic credentials
        config = Configuration(
            server="outlook.office365.com",
            credentials=Credentials(
                username="myworkemail@workdomain.com", password="*******"
            ),
        )
        return Account(
            primary_smtp_address="myworkemail@workdomain.com",
            config=config,
            access_type=DELEGATE,
        )

    def import_email(account):
        tz = EWSTimeZone.localzone()
        start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
        # Fetch unread mail received after `start`, newest first
        for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False
        ).order_by("-datetime_received"):
            email_body = item.body
            email_subject = item.subject
            soup = bs(email_body, "html.parser")
            tables = soup.find_all("table")
            item.is_read = True
            item.save()
            # Some code here for saving the email to a database

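You are getting a MemoryError, which means Python could not allocate any more memory on your machine.

There are a couple of things you can do to reduce your script's memory consumption. One is to use .iterator(), which disables internal caching of your query results. Another is to fetch only the fields you actually need using .only().

When you use .only(), the other fields will be None. You need to remember to save only the one field you actually changed: item.save(update_fields=['is_read'])

Here is an example of how to use both improvements: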

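    # `account` and `start` are defined as in the question's code.
    for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False,
        ).only(
            'is_read', 'subject', 'body',
        ).order_by('-datetime_received').iterator():
        # ... process item.subject and item.body here ...
        item.is_read = True
        item.save(update_fields=['is_read'])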
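
If you want to check the per-item logic without hitting the server, one option is to move the loop body into a function that accepts any iterable of items. The sketch below is only an illustration: `FakeItem`, `process_unread` and the `store` callback are hypothetical names, not part of exchangelib, and `FakeItem` only mimics the three fields and the `save(update_fields=...)` call used above.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class FakeItem:
    """Hypothetical stand-in for a mail item, for offline testing only."""
    subject: str
    body: str
    is_read: bool = False
    save_calls: List[Optional[list]] = field(default_factory=list)

    def save(self, update_fields=None):
        # Record which fields the caller asked to update.
        self.save_calls.append(update_fields)


def process_unread(items: Iterable, store: Callable[[str, str], None]) -> int:
    """Store each item's subject and body, then mark the item as read.

    Items are consumed one at a time, so nothing here keeps a reference
    to an item after its iteration ends. Returns the number processed.
    """
    count = 0
    for item in items:
        store(item.subject, item.body)
        item.is_read = True
        # Only 'is_read' was changed, so only that field is saved.
        item.save(update_fields=["is_read"])
        count += 1
    return count


# Offline usage with stand-in items instead of a real mailbox.
rows = []
fakes = [FakeItem("Report", "<table></table>"), FakeItem("Hello", "<p>hi</p>")]
processed = process_unread(iter(fakes), lambda subject, body: rows.append((subject, body)))
```

With a real account you would presumably pass the queryset from the example above (the one ending in `.iterator()`) as `items`, and your database insert as `store`. Storing before marking the item as read is a design choice in this sketch: if the insert fails, the mail stays unread and is picked up on the next run.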