How to remove everything after a certain keyword from email body using regex?

I wrote the code below to extract specific values from a particular email: the date, the index ticker (SGEPSBSH), and the bbg level.

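I am trying to save these to a pandas DataFrame. Before saving the whole email body to the DataFrame, I want to remove everything that follows the sender's signature, starting at the keyword "Regards".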

I get the following error:

File "snapper.py", line 39, in <module>
    Body_content = message.body
File "__init__.py", line 473, in __getattr__
    raise AttributeError("'%s' object has no attribute '%s'" % (repr(self), attr))
AttributeError: '<Library._MailItem instance at 0x2473706520480>' 
                object has no attribute 'body'

Can you help me fix my code?

import win32com.client
import re
import os
import pandas
import datetime
from datetime import date

EMAIL_ACCOUNT = 'atul.sanwal@ihsmarkit.com'
EMAIL_SUBJ_SEARCH_STRING = 'SGEPSBSH Index Level'
EMAIL_CONTNT = {'Ticker': [], 'TickerLevel': [], 'DATE': []}
out_app = win32com.client.gencache.EnsureDispatch("Outlook.Application")
out_namespace = out_app.GetNamespace("MAPI")

root_folder = out_namespace.GetDefaultFolder(6)
out_iter_folder = root_folder.Folders['Email_snapper']
char_length_of_search_substring = len(EMAIL_SUBJ_SEARCH_STRING)
item_count = out_iter_folder.Items.Count
Flag = False
cnt = 0
if out_iter_folder.Items.Count > 0:
    for i in range(item_count, 0, -1)[:2]:
        message = out_iter_folder.Items[i]
        #message = message.Restrict("[ReceivedTime] >= '" + lastWeekDateTime + "'")
Body_content = message.body
message.body = re.sub(r".*Regards[^\n]+\n[^\n]+", "",message.body)
print(Body_content)

If you are not comfortable with regular expressions, simple string slicing may also work for you:

s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus tincidunt elit in ex " \
    "molestie euismod sed et velit. Aenean blandit placerat sodales. Curabitur mattis nibh nec " \
    "leo hendrerit commodo. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras eu " \
    "mattis dui, at convallis dolor."
s = s[:s.find("amet")].strip()
print(s)

Output:

Lorem ipsum dolor sit
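
For completeness, here is a minimal sketch of both approaches applied to a signature keyword. It rests on a few assumptions. The AttributeError in the question is most likely caused by attribute names being case-sensitive when Outlook is dispatched through gencache.EnsureDispatch, so the property would be read as message.Body rather than message.body. The helper names strip_after_keyword and strip_after_keyword_no_regex and the sample text are made up for illustration and are not part of the original code. The sketch also works on a copy of the text instead of assigning back to message.Body, since that assignment would change the Outlook item itself.

```python
import re


def strip_after_keyword(body, keyword="Regards"):
    """Return body with the first occurrence of keyword and everything after it removed."""
    # re.escape keeps the keyword literal even if it contains regex metacharacters.
    # re.DOTALL lets '.' match newlines, so the match runs to the end of the text.
    # Note: this also matches the keyword inside longer words (e.g. "Regardless").
    pattern = re.escape(keyword) + r".*"
    return re.sub(pattern, "", body, flags=re.DOTALL).rstrip()


def strip_after_keyword_no_regex(body, keyword="Regards"):
    """Same result without regex, using str.partition."""
    # partition returns the whole string as the head when the keyword is absent,
    # whereas s[:s.find(keyword)] would silently drop the last character (find returns -1).
    head, _, _ = body.partition(keyword)
    return head.rstrip()


# Made-up sample body, for illustration only.
sample = "SGEPSBSH Index Level: 101.25\nDate: 2020-01-15\n\nRegards,\nJohn Doe\nSome Bank"
print(strip_after_keyword(sample))
```

In the Outlook loop from the question, this would be called as Body_content = strip_after_keyword(message.Body), inside the for loop so that every message is processed rather than only the last one.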