Lookahead assertion with multiple values
I have the following text:
[red]
aaa [bbb] hello
[blue]
aaa
[green]
ccc
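I want to extract all the text between the section headers. I tried a lookahead assertion that matches from a specific section header up to another header in the list of headers: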
keys = ('red', 'blue', 'green')
for key in keys:
    match = re.search(r'\[' + key + r'\](.*)(?=(?:' + '|'.join(keys) + r'|$))',
                      text, flags=re.DOTALL)
    print(key, match.group(1))
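I'm missing something, because it doesn't match anything. Any ideas?
You can use re.findall for this! You can capture each section header together with the values inside it, for example: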
>>> import re
>>> print(re.findall(r'\[(\w*)\]([\w \n]*)', text))
[('red', '\n\naaa '), ('bbb', ' hello\n\n'), ('blue', '\n\naaa\n\n'), ('green', '')]
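Here \[(\w*)\] matches your section header and ([\w \n]*) matches the content of the section. Note that [bbb] is also picked up as a header. With this result you can strip or replace the extra newlines!
Hope it helps!
Maybe this approach works: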
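For a variant that stays closer to the lookahead in the question, here is a minimal sketch. It assumes the two likely problems in the original pattern are the greedy `(.*)`, which with `re.DOTALL` runs to the end of the text, and the missing brackets around the header names in the lookahead. The `sections` dict and the use of `re.escape` are additions for illustration, not part of the original code.

```python
import re

text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
keys = ('red', 'blue', 'green')

# Alternation of the known headers, escaped in case a key contains
# regex metacharacters.
headers = '|'.join(re.escape(key) for key in keys)

sections = {}
for key in keys:
    # The lazy (.*?) stops at the first position where the lookahead succeeds:
    # either the next known "[header]" or the end of the text (\Z).
    pattern = r'\[' + re.escape(key) + r'\](.*?)(?=\[(?:' + headers + r')\]|\Z)'
    match = re.search(pattern, text, flags=re.DOTALL)
    if match:
        sections[key] = match.group(1).strip()

print(sections)
```

Because only the listed keys appear in the lookahead, `[bbb]` is treated as ordinary content rather than as a header.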
keys = ('red', 'blue', 'green')
res = re.findall(r'\[\w+\].?|([\w\[\] ]+)', text)
res = [x for x in res if x]
for n in range(len(keys)):
    print(keys[n], res[n])
Result:
('red', 'aaa [bbb] hello')
('blue', 'aaa')
('green', 'ccc')
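The snippet above pairs `res[n]` with `keys[n]` by position, so it relies on the headers appearing in the text in the same order as in `keys`. A possible alternative, sketched here with `re.split` (a different technique from the one in this answer), keeps each header name next to its content. The names `header`, `parts` and `by_header` are made up for this example.

```python
import re

text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
keys = ('red', 'blue', 'green')

# Only the listed keys count as headers, and only when alone on a line.
header = r'^\[(' + '|'.join(re.escape(k) for k in keys) + r')\][ \t]*$'

# With a capturing group in the pattern, re.split keeps the captured names:
# ['', 'red', '\naaa [bbb] hello\n', 'blue', '\naaa\n', 'green', '\nccc']
parts = re.split(header, text, flags=re.MULTILINE)

# Odd positions hold header names, even positions (from 2) hold their bodies.
by_header = {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}
print(by_header)
```

The result is keyed by whatever header was actually found, so the order of `keys` no longer matters.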
In the end, I decided not to use a regular expression to match the section contents:
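import re

# Walk through the file line by line and collect text from the specific sections
keys = ('red', 'blue', 'green')
# Assumed initialisation: the original snippet does not show how new_contents is created
new_contents = {key: '' for key in keys}
last_section = ''
for line in text.splitlines():
    # Skip comment lines
    if line.startswith('#'):
        continue
    # A known header at the start of a line opens a new section
    match = re.match(r'^\[(' + '|'.join(keys) + r')\]', line)
    if match:
        last_section = match.group(1)
        continue
    if last_section:
        new_contents[last_section] += '\n' + line
for section in new_contents:
    new_contents[section] = new_contents[section].strip()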
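As a usage sketch, the same line-by-line idea can be wrapped in a function and run on the sample text. The function name `collect_sections`, the `sample` string with its leading comment line, and the `re.escape` call are assumptions made for this example.

```python
import re

def collect_sections(text, keys):
    """Hypothetical helper: map each known header to the stripped text below it."""
    header = re.compile(r'\[(' + '|'.join(re.escape(k) for k in keys) + r')\]')
    contents = {key: '' for key in keys}
    current = ''
    for line in text.splitlines():
        if line.startswith('#'):      # comment lines are ignored
            continue
        match = header.match(line)
        if match:                     # a known header starts a new section
            current = match.group(1)
            continue
        if current:                   # text before the first header is dropped
            contents[current] += '\n' + line
    return {key: value.strip() for key, value in contents.items()}

sample = '# config\n[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
result = collect_sections(sample, ('red', 'blue', 'green'))
print(result)
```

Since the header check only applies at the start of a line, the `[bbb]` in the middle of a line stays part of the `red` section.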
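A string-processing approach that works regardless of the order of your keys in the text. Hope it helps if you don't want to use regular expressions!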
text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
# keys = ('red', 'blue', 'green')
# keys = ('blue', 'red', 'green')
# keys = ('green', 'red', 'blue')
keys = ('green', 'blue', 'red')
# store key and index of key tuple
index_key_tuples = []
for key in keys:
    index = text.find('[' + key + ']')
    if index != -1:
        index_key_tuples.append((index, key))
# sort the index key tuple
index_key_tuples.sort()
i = 0
size = len(index_key_tuples)
while i < size - 1:
    # start index of content of key
    item = index_key_tuples[i]
    key = item[1]
    start_index = item[0] + len(key) + 2  # 2 is for square bracket
    # end index of content of key
    next_item = index_key_tuples[i + 1]
    end_index = next_item[0]
    # content of key
    key_content = text[start_index:end_index].strip()
    print(key, key_content)
    i += 1
# handle the last key
last_item = index_key_tuples[size-1]
key = last_item[1]
start_index = last_item[0] + len(key) + 2
key_content = text[start_index:].strip()
print(key, key_content)
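The same find-and-slice idea can be written more compactly. The sketch below is one possible condensed form, not the answer's own code; the function name `split_sections` is hypothetical. It also returns an empty dict when none of the keys occur, whereas the loop above indexes the last element unconditionally and so assumes at least one key is found.

```python
def split_sections(text, keys):
    """Hypothetical helper: slice text between the positions of known headers."""
    # (position, key) pairs for the keys that occur, sorted by position.
    found = sorted(
        (text.find('[' + key + ']'), key)
        for key in keys
        if '[' + key + ']' in text
    )
    result = {}
    for n, (start, key) in enumerate(found):
        content_start = start + len(key) + 2          # skip "[key]"
        # Content ends where the next header starts, or at the end of the text.
        content_end = found[n + 1][0] if n + 1 < len(found) else len(text)
        result[key] = text[content_start:content_end].strip()
    return result

text = '[red]\naaa [bbb] hello\n[blue]\naaa\n[green]\nccc'
print(split_sections(text, ('green', 'blue', 'red')))
```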