Split a list with a regular expression in Python

Question

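I am trying to split strings with a regular expression, but I am struggling to use it effectively.

My input is a list of strings, where each line is one element of the list.

Some lines in the list are dates, always in the same format: Weekday, Month Day, Year. Examples are Wednesday, March 16, 2022 or Friday, April 8, 2022. The elements that follow a date are the text typed on that day, up to the next date in the list.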

Here is an example of the list: ['Wednesday, March 16, 2022', 'bla-bla-bla', 'Hello World !', 'Friday, April 8, 2022', "Can't wait for the weekend", 'See you !']

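I would like to output a list of lists, where each inner list holds all the strings for one day, with the date as its first (or last) element. So far I have tried joining the strings into one big string and splitting it with a regex. I used re.split(r'\s+ +\d\d,+ \d\d\d\d', big_string), but I don't know whether the syntax is correct, and I don't really trust this approach, because I think it will be very hard to put the split text back into lists.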

Does anyone know how to do this? I hope my explanation is clear.

Thanks, everyone!

Answer 1

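You can first parse the dates in the list with the dateutil module. After that, I suggest building a dictionary that maps each day to the list of sentences for that day. You can then process it as needed to produce any other format you want.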

from dateutil import parser
from datetime import datetime as dt
# parser.parse('Wednesday, March 16, 2022') -> datetime.datetime(2022, 3, 16, 0, 0)

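# The example list from the question
loopable_list = ['Wednesday, March 16, 2022', 'bla-bla-bla', 'Hello World !',
                 'Friday, April 8, 2022', "Can't wait for the weekend", 'See you !']

# Loop over the list and convert the date strings to Python datetime objects
new_list = []
for el in loopable_list:
    try:
        # Parse the string into a datetime object
        date_el = parser.parse(el)
        new_list.append(date_el)
    except (ValueError, OverflowError):
        # Not a date: keep the original string
        new_list.append(el)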

# Now structure the data.
# I think it is more appropriate to first create a dictionary
# with each day as the key and a list of the remaining text as the value
structured_data = {}
prev_el = None
# Loop over each element and check whether it is a date
for el in new_list:
    # If it is a date, create an entry with an empty list;
    # the elements that follow are appended to it
    if isinstance(el, dt):
        prev_el = str(el)
        structured_data[prev_el] = []
    elif prev_el is not None:
        # Text before the first date has no key, so it is skipped
        structured_data[prev_el].append(el)
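
If you want the list of lists from the question directly, without dateutil, a regex can be applied to each element instead of to one big joined string. The following is only a minimal sketch: the DATE_RE pattern and the group_by_date helper are names I made up for illustration, and the pattern assumes every date looks exactly like "Weekday, Month Day, Year" in English.

```python
import re

# Assumed pattern for "Weekday, Month Day, Year" (hypothetical, not from the question)
DATE_RE = re.compile(r"[A-Z][a-z]+day, [A-Z][a-z]+ \d{1,2}, \d{4}")


def group_by_date(lines):
    """Return a list of lists, each one starting with its date line."""
    groups = []
    for line in lines:
        if DATE_RE.fullmatch(line):
            # A date line starts a new group
            groups.append([line])
        elif groups:
            # Any other line belongs to the most recent date
            groups[-1].append(line)
        # Lines that appear before the first date are skipped
    return groups


lines = ['Wednesday, March 16, 2022', 'bla-bla-bla', 'Hello World !',
         'Friday, April 8, 2022', "Can't wait for the weekend", 'See you !']
result = group_by_date(lines)
print(result)
```

Matching each element with fullmatch avoids joining and re-splitting the text, so the original list boundaries are kept as they are.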
