Extracting groups of elements between re.match positions in a list of strings using enumerate
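I am trying to extract separate capture groups from a list (many elements). There are multiple capture groups, but nothing about the captured elements themselves is unique.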
import re

my_list = ['this is a test element 1', 'I need to capture **after** this element', 'capture1', 'capture2', 'capture3', '.........', 'I need to capture **before** this element and separately after this element', 'captureA', 'captureB', 'captureC', 'last capture ends before this element']
my_reg = re.compile(r'.*this element.*')
The code is as follows:
match_indices = [i for i, s in enumerate(my_list) if my_reg.match(s)]
captured_text = my_list[min(match_indices)+1 : max(match_indices)]
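match_indices gives me the list position of each matching element, and captured_text gets the actual elements at the positions between the first and last match.
I cannot get captured_text to hold a separate group for each interval between match positions.
For example, the output should be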
Group1 = capture1capture2capture3
Group2 = captureAcaptureBcaptureC
rather than capture1capture2capture3captureAcaptureBcaptureC
Any guidance? Thanks
Your code returns:
match_indices = [1, 6, 10]
captured_text = ['capture1', 'capture2', 'capture3', '.........', 'I need to capture **before** this element and separately after this element', 'captureA', 'captureB', 'captureC'] # all between 1 and 10
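To capture the groups between the indices, you cannot use min() and max(), because they only give you the first and last match. Instead, iterate over each pair of adjacent indices in match_indices. captured_text will then be a list of lists.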
my_list = ['this is a test element 1',
'I need to capture **after** this element',
'capture1',
'capture2',
'capture3',
'.........',
'I need to capture **before** this element and separately after this element',
'captureA',
'captureB',
'captureC',
'last capture ends before this element']
match_indices = [i for i, s in enumerate(my_list) if 'this element' in s]
captured_text = []
for i in range(1, len(match_indices)):
    start = match_indices[i-1] + 1   # first element after the previous match
    end = match_indices[i]           # slice stops just before the next match
    captured_text.append(my_list[start:end])
print(captured_text)
# captured_text = [
# ['capture1', 'capture2', 'capture3', '.........'],
# ['captureA', 'captureB', 'captureC']
# ]
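If each group is wanted as a single joined string, as in the question, the same adjacent-pair idea can be sketched as below. This is only one possible arrangement: the helper name groups_between is made up for illustration, and it uses re.search with the shorter pattern 'this element' instead of the original '.*this element.*' with re.match, which should select the same elements here. Note that the '.........' placeholder element is kept, since it sits between two matches.

```python
import re

def groups_between(items, pattern):
    """Return the slices of items that lie between consecutive matching elements.

    groups_between is a hypothetical helper name, not part of any library.
    """
    regex = re.compile(pattern)
    # Positions of the delimiter elements (those containing the pattern).
    idx = [i for i, s in enumerate(items) if regex.search(s)]
    # Pair each delimiter position with the next one and slice between them.
    return [items[a + 1:b] for a, b in zip(idx, idx[1:])]

my_list = ['this is a test element 1',
           'I need to capture **after** this element',
           'capture1',
           'capture2',
           'capture3',
           '.........',
           'I need to capture **before** this element and separately after this element',
           'captureA',
           'captureB',
           'captureC',
           'last capture ends before this element']

groups = groups_between(my_list, r'this element')
joined = [''.join(g) for g in groups]   # one string per group
print(joined)
# ['capture1capture2capture3.........', 'captureAcaptureBcaptureC']
```

With fewer than two matching elements, zip(idx, idx[1:]) yields nothing, so the function returns an empty list rather than raising an error.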