MemoryError when fetching emails with exchangelib

Question

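I have a question about saving email data in bulk with exchangelib. Currently it takes a long time when there are many emails, and after a few minutes it throws this error: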

    ERROR:    MemoryError:
    Retry: 0
    Waited: 10
    Timeout: 120
    Session: 25999
    Thread: 28148
    Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
    URL: https://outlook.office365.com/EWS/Exchange.asmx
    HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
    Allow redirects: False
    Streaming: False
    Response time: 411.93799999996554
    Status code: 503
    Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
    Response headers: {}

This is the code I use to connect and read the emails:

    from bs4 import BeautifulSoup as bs
    from exchangelib import (
        DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
    )

    def connect_mail():
        config = Configuration(
            server="outlook.office365.com",
            credentials=Credentials(
                username="myworkemail@workdomain.com", password="*******"
            ),
        )
        return Account(
            primary_smtp_address="myworkemail@workdomain.com",
            config=config,
            access_type=DELEGATE,
        )

    def import_email(account):
        tz = EWSTimeZone.localzone()
        start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
        # Unread inbox items received after `start`, newest first
        for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False
        ).order_by("-datetime_received"):
            email_body = item.body
            email_subject = item.subject
            soup = bs(email_body, "html.parser")
            tables = soup.find_all("table")
            item.is_read = True
            item.save()
            # Some code here for saving the email to a database

Answer 1

You are getting a MemoryError, which means Python cannot allocate any more memory on your machine.

There are a couple of things you can do to reduce your script's memory consumption. One is to use .iterator(), which disables internal caching of your query results. Another is to fetch only the fields you actually need, using .only().
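
To see why the cache matters, here is a minimal, self-contained toy model. ToyQuery is hypothetical and is not exchangelib's QuerySet: looping over it directly keeps every fetched item in an internal cache (the behaviour .iterator() switches off), while its iterator() method streams items so each one can be freed once the loop moves on. The item count and the ~1 KB body size are arbitrary assumptions; peak memory is measured with the standard-library tracemalloc module.

```python
import tracemalloc
from typing import Iterable, Iterator, List, Optional


class ToyQuery:
    """Hypothetical stand-in for a lazy email query (NOT exchangelib's QuerySet)."""

    def __init__(self, n_items: int) -> None:
        self.n_items = n_items
        self._cache: Optional[List[bytes]] = None

    def _fetch(self) -> Iterator[bytes]:
        # Simulate the server returning one ~1 KB email body at a time.
        for i in range(self.n_items):
            yield bytes(1024) + str(i).encode()

    def __iter__(self) -> Iterator[bytes]:
        # Plain iteration: store every result so a second loop can reuse it.
        # All items stay alive for as long as the query object does.
        if self._cache is None:
            self._cache = list(self._fetch())
        return iter(self._cache)

    def iterator(self) -> Iterator[bytes]:
        # Streaming: nothing is stored, so each item can be freed after use.
        return self._fetch()


def peak_bytes(items: Iterable[bytes]) -> int:
    """Return the peak memory (in bytes) allocated while looping over items."""
    tracemalloc.start()
    for body in items:
        _ = len(body)  # stand-in for parsing and saving the email
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


query = ToyQuery(n_items=2000)
cached_peak = peak_bytes(query)
streamed_peak = peak_bytes(ToyQuery(n_items=2000).iterator())
print(f"cached: {cached_peak} bytes, streamed: {streamed_peak} bytes")
```

In this model the cached peak grows with the number of items, while the streamed peak stays around the size of one or two items; exact numbers vary by Python version.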

When you use .only(), all other fields will be None, so you need to remember to save only the one field you actually changed: item.save(update_fields=['is_read'])
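
The toy sketch below illustrates why update_fields matters after .only(). Message and build_update are hypothetical and only model the idea; exchangelib's real save() logic is more involved. The assumption here is that a save without update_fields sends every field, so fields that were never fetched would go out as None and could clear data on the server.

```python
from dataclasses import dataclass, fields
from typing import Dict, Iterable, Optional


@dataclass
class Message:
    """Hypothetical item loaded with .only('is_read', 'subject', 'body')."""
    subject: Optional[str] = None
    body: Optional[str] = None
    is_read: Optional[bool] = None
    categories: Optional[list] = None  # not fetched, so it is None locally
    importance: Optional[str] = None   # not fetched, so it is None locally


def build_update(item: Message,
                 update_fields: Optional[Iterable[str]] = None) -> Dict[str, object]:
    """Return the values a save would send (hypothetical, not exchangelib's API)."""
    if update_fields is None:
        names = [f.name for f in fields(item)]  # full save: every field
    else:
        names = list(update_fields)             # partial save: listed fields only
    return {name: getattr(item, name) for name in names}


item = Message(subject="Report", body="<table></table>", is_read=False)
item.is_read = True

full = build_update(item)                                # would also send the None fields
partial = build_update(item, update_fields=["is_read"])  # sends only the changed flag
print(partial)
```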

Here is an example that uses both improvements:

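    for item in account.inbox.filter(
            datetime_received__gt=start, is_read=False,
        ).only(
            'is_read', 'subject', 'body',
        ).order_by('-datetime_received').iterator():
        soup = bs(item.body, 'html.parser')
        tables = soup.find_all('table')
        item.is_read = True
        item.save(update_fields=['is_read'])  # other fields are None after .only()
        # Save item.subject and tables to the database here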


python-3.x

exchangelib