How to extract the date from a paragraph
I have a long block of text like the following:
how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
-------------------------------------------------------------
NOTE: Please do not remove email address from the \"To\" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy.
I want the date and time that appear in the text (Tue, Dec 21, 2021 at 1:51 PM).
How can I extract them from the text?
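The usual approach here is a regular expression, but for simplicity, if the text always has the same format, you can get the date string by looking for a line that looks like this: On SOME DATE <Someone<someone's email address>> wrote:
Here is an example implementation: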
email = """how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
-------------------------------------------------------------
NOTE: Please do not remove email address from the \"To\" line of this email when replying. This address is used to capture the email and report it. Please do not remove or change the subject line of this email. The subject line of this email contains information to refer this correspondence back to the originating discrepancy."""
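for line in email.splitlines():
    if line.startswith("On ") and line.endswith(" wrote:"):
        # Drop the leading "On " and everything from " <" onward.
        date_string = line[3 : line.index(" <")]
        print(f"Found the date: {date_string!r}")
        break
else:
    # Runs only if the loop finished without hitting break.
    print("Could not find the date.")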
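For comparison, the regular-expression route mentioned at the start of this answer might look like the sketch below. It assumes the reply header always has the form `On <date> <sender> wrote:` on a single line and that the date part never contains `<`. The names `HEADER_RE` and `extract_reply_date` are made up for this example.

```python
import re

# Matches a reply header such as "On <date> <sender...> wrote:" on one line.
# Assumption: the date part contains neither "<" nor a line break.
HEADER_RE = re.compile(r"^On (?P<date>[^<\n]+?)\s*<.*wrote:\s*$", re.MULTILINE)


def extract_reply_date(text):
    """Return the date string from the first reply header, or None if absent."""
    match = HEADER_RE.search(text)
    return match.group("date") if match else None


sample = "how are you\nOn Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:\n---"
print(extract_reply_date(sample))  # Tue, Dec 21, 2021 at 1:51 PM
```

Returning `None` instead of printing lets the caller decide how to handle text without a reply header.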
Quick and dirty:
# Adjacent string literals are concatenated, so the text contains only the
# explicit \r\n line breaks.
string = (
    "how are you \r\n\r\n"
    "On Tue, Dec 21, 2021 at 1:51 PM <abchttp://localhost> wrote:\r\n\r\n\r\n"
    "-------------------------------------------------------------\r\n"
    'NOTE: Please do not remove email address from the "To" line of this email '
    "when replying. This address is used to capture the email and report it. "
    "Please do not remove or change the subject line of this email. "
    "The subject line of this email contains information to refer this "
    "correspondence back to the originating discrepancy.\r\n"
)
parts = string.split("\r\n\r\n")
# parts[1] is the "On ... wrote:" line; words 1-7 are the date and time,
# skipping the leading "On".
date = " ".join(parts[1].split(" ")[1:8])
print(date)  # Tue, Dec 21, 2021 at 1:51 PM
- Use a regular expression to extract the date and time.
import re
text = '''how are you
On Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:
...
'''
match = re.search('(Mon|Tue|Wed|Thu|Fri|Sat|Sun).*?(AM|PM)', text)
match_date_and_time = match.group() # Tue, Dec 21, 2021 at 1:51 PM
- Use datetime.strptime to parse the date and time.
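from datetime import datetime  # strptime is a method of the datetime class, not the module
datetime.strptime(match_date_and_time, '%a, %b %d, %Y at %I:%M %p')  # datetime(2021, 12, 21, 13, 51)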
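The two steps can be combined into one small helper that returns a `datetime` object, or `None` when no date is found or it fails to parse. This is only a sketch under assumptions: the function name `parse_reply_datetime` is hypothetical, the pattern additionally requires a comma after the weekday to avoid matching words like "Sunday", and the process locale is assumed to be one where `%a`, `%b`, and `%p` match English abbreviations (such as the default C locale).

```python
import re
from datetime import datetime

# Weekday abbreviation followed by ", ", then the shortest run up to AM/PM.
DATE_RE = re.compile(r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), .*?(?:AM|PM)")
DATE_FORMAT = "%a, %b %d, %Y at %I:%M %p"


def parse_reply_datetime(text):
    """Return the first date found in text as a datetime, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), DATE_FORMAT)
    except ValueError:
        # The text looked like a date but did not fit DATE_FORMAT.
        return None


text = "how are you\nOn Tue, Dec 21, 2021 at 1:51 PM <abc<http://localhost>> wrote:\n..."
print(parse_reply_datetime(text))  # 2021-12-21 13:51:00
```

Checking for `None` before parsing avoids the `AttributeError` that `match.group()` would raise when the text contains no date.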