Fuzzy string split in Python 2.x
Input file:
    rep_origin 607..1720
        /label=Rep
    Region 2643..5020
        /label="region"
        extra_info and stuff
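I'm trying to split on the first column-like entry. For example, I want a list that looks like this...
Desired output:
['rep_origin 607..1720 /label=Rep', 'Region 2643..5020 /label="region" extra_info and stuff']
I tried splitting on '    ' (four spaces), but that gave me some crazy results. If I could add a "fuzzy" search term at the end that matches any alphabetic character but not whitespace, that would solve the problem. I guess it could be done with a regular expression, something like '[A-Z]' with findall, but I'm not sure whether there is a simpler way.
Is there a way to add a "fuzzy" search term at the end of the string.split separator? (i.e. original_string.split('    [alphabet_character]'))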
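For reference, str.split only accepts a literal separator, so it cannot take a character class. re.split does accept a pattern. The following is a minimal sketch rather than a definitive solution: it assumes that each new entry starts on a line indented by exactly 4 spaces and that continuation lines are indented further. The names TEXT and split_sections are made up for illustration.

```python
import re

# Sample text. The indentation is an assumption: entry starts are indented
# by 4 spaces, continuation lines by more than 4.
TEXT = (
    '    rep_origin 607..1720\n'
    '        /label=Rep\n'
    '    Region 2643..5020\n'
    '        /label="region"\n'
    '        extra_info and stuff'
)


def split_sections(text):
    """Split text into entries and collapse each entry onto one line."""
    # Split on a newline only when the next line begins with exactly
    # 4 spaces followed by a non-space character. The lookahead (?=...)
    # checks this without consuming the start of the next entry.
    chunks = re.split(r'\n(?=    \S)', text)
    # chunk.split() with no argument splits on any run of whitespace,
    # so joining with ' ' normalizes newlines and indentation.
    return [' '.join(chunk.split()) for chunk in chunks if chunk.strip()]


print(split_sections(TEXT))
```

The pattern consumes the newline instead of matching an empty string, because re.split in Python 2.x does not split on empty matches.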
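I'm not sure exactly what you're looking for, but the parse function below takes the text from your question and returns a list of sections, where each section is a list of that section's lines (with leading and trailing whitespace removed).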
#!/usr/bin/env python
import re

# This is the input from your question
INPUT_TEXT = '''\
    rep_origin 607..1720
        /label=Rep
    Region 2643..5020
        /label="region"
        extra_info and stuff'''

# A regular expression that matches the start of a section. A section
# start is a line that has exactly 4 spaces before the first non-space
# character.
match_section_start = re.compile(r'^    [^ ]').match


def parse(text):
    sections = []
    section_lines = None

    def append_section_if_lines():
        if section_lines:
            sections.append(section_lines)

    for line in text.split('\n'):
        if match_section_start(line):
            # We've found the start of a new section. Unless this is
            # the first section, save the previous section.
            append_section_if_lines()
            section_lines = []
        if section_lines is not None:
            # Lines that appear before the first section start are ignored.
            section_lines.append(line.strip())

    # Save the last section.
    append_section_if_lines()
    return sections


sections = parse(INPUT_TEXT)
print(sections)
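The question asks for one string per section, while parse returns a list of lines per section. A small sketch of the remaining step follows. The helper name join_sections is hypothetical, and the sections literal is the value parse is expected to return for the question's input, assuming the fields on each line are separated by single spaces.

```python
def join_sections(sections):
    """Join each section's lines into one space-separated string."""
    return [' '.join(lines) for lines in sections]


# Assumed result of parse() for the input in the question.
sections = [
    ['rep_origin 607..1720', '/label=Rep'],
    ['Region 2643..5020', '/label="region"', 'extra_info and stuff'],
]

print(join_sections(sections))
```

Keeping the lines separate until this last step leaves room to handle qualifiers such as /label individually before joining.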