MemoryError when fetching email with exchangelib
I have a question about saving email data in bulk with exchangelib. At the moment it takes a long time when there are many emails, and after a few minutes it throws this error:
ERROR: MemoryError:
Retry: 0
Waited: 10
Timeout: 120
Session: 25999
Thread: 28148
Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
URL: https://outlook.office365.com/EWS/Exchange.asmx
HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
Allow redirects: False
Streaming: False
Response time: 411.93799999996554
Status code: 503
Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
Response headers: {}
Here is the code I use to connect and read the emails:
from bs4 import BeautifulSoup as bs
from exchangelib import (
    DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
)

def connect_mail():
    # Connect to the Office 365 mailbox with basic credentials
    config = Configuration(
        server="outlook.office365.com",
        credentials=Credentials(
            username="myworkemail@workdomain.com", password="*******"
        ),
    )
    return Account(
        primary_smtp_address="myworkemail@workdomain.com",
        config=config,
        access_type=DELEGATE,
    )

def import_email(account):
    tz = EWSTimeZone.localzone()
    start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
    # Walk through unread mail received after `start`, newest first
    for item in account.inbox.filter(
        datetime_received__gt=start, is_read=False
    ).order_by("-datetime_received"):
        email_body = item.body
        email_subject = item.subject
        soup = bs(email_body, "html.parser")
        tables = soup.find_all("table")
        item.is_read = True
        item.save()
        # Some code here for saving the email to a database
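You are getting a MemoryError, which means Python cannot allocate any more memory on your machine.
There are a few things you can do to reduce the memory consumption of your script. One is to use .iterator(), which disables internal caching of your query results. Another is to fetch only the fields you actually need using .only().
When you use .only(), all other fields will be None, so remember to save only the one field you actually changed: item.save(update_fields=['is_read'])
Here is an example of how to use both improvements: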
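for item in account.inbox.filter(
    datetime_received__gt=start, is_read=False,
).only(
    'is_read', 'subject', 'body',
).order_by('-datetime_received').iterator():
    # Process item.subject and item.body here, then mark the item as read
    item.is_read = True
    item.save(update_fields=['is_read'])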
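To keep the per-item work separate from the query, the loop can be wrapped in a small helper. This is only a sketch: process_unread and handle are hypothetical names that are not part of exchangelib, and the helper assumes each item exposes subject, body, is_read and save(update_fields=...) as in the code above. It handles one item at a time and never collects the items in a list, so it does not undo the saving from .iterator().

```python
from typing import Callable, Iterable


def process_unread(items: Iterable, handle: Callable[[str, str], None]) -> int:
    """Handle each item in turn, mark it as read, and return the item count.

    `items` is assumed to be a lazy iterable, for example the query above
    ending in .only(...).iterator(). `handle` is a hypothetical callback
    that receives the subject and body (e.g. to write them to a database).
    """
    count = 0
    for item in items:
        handle(item.subject, item.body)
        item.is_read = True
        # Save only the changed field; the fields left out of .only() are
        # None and should not be written back.
        item.save(update_fields=["is_read"])
        count += 1
    return count
```

It could be called as process_unread(query, save_to_database), where query is the filtered inbox query ending in .iterator() and save_to_database is your own function.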