Fuzzy string split in Python 2.x

Question

Input file:

    rep_origin      607..1720
                    /label=Rep
    Region          2643..5020
                    /label="region"
                    extra_info and stuff

I'm trying to split on the first-column entries. For example, I'd like a list that looks like this...

Desired output:

['rep_origin      607..1720      /label=Rep', 'Region          2643..5020                       /label="region"                         extra_info and stuff']

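I tried splitting on " ", but that gave me some crazy results. If I could add a "fuzzy" search term at the end that matches any alphabetic character but not a space, that would solve the problem. I imagine you could do it with a regular expression, something like '[A-Z]' with findall, but I'm not sure whether there's a simpler way.

Is there a way to add a "fuzzy" search term at the end of a string.split separator? (i.e. original_string.' [alphabet_character]')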

Answer 1

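I'm not sure exactly what you're looking for, but the parse function below takes the text from your question and returns a list of sections, where each section is a list of that section's lines (with leading and trailing whitespace removed).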

#!/usr/bin/env python

import re


# This is the input from your question
INPUT_TEXT = '''\
    rep_origin      607..1720
                    /label=Rep
    Region          2643..5020
                    /label="region"
                    extra_info and stuff'''


# A regular expression that matches the start of a section. A section
# start is a line that has 4 spaces before the first non-space
# character.
match_section_start = re.compile(r'^    [^ ]').match


def parse(text):
    sections = []
    section_lines = None

    def append_section_if_lines():
        if section_lines:
            sections.append(section_lines)

    for line in text.split('\n'):
        if match_section_start(line):
            # We've found the start of a new section. Unless this is
            # the first section, save the previous section.
            append_section_if_lines()
            section_lines = []
        elif section_lines is None:
            # Skip any lines that appear before the first section start.
            continue
        section_lines.append(line.strip())

    # Save the last section.
    append_section_if_lines()

    return sections


sections = parse(INPUT_TEXT)
print(sections)
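
If you want one string per section, as in your desired output, and something closer to a "fuzzy" separator for split, here is a minimal sketch using re.split with a lookahead. It is not part of the parse function above; split_sections is a hypothetical name, and the pattern assumes that a section start is always indented by exactly four spaces, as in your sample. The pattern consumes the newline itself, so it also works on Python 2.x, where re.split does not split on empty matches.

```python
import re

INPUT_TEXT = '''\
    rep_origin      607..1720
                    /label=Rep
    Region          2643..5020
                    /label="region"
                    extra_info and stuff'''

# Split on a newline only when the next line starts with exactly four
# spaces followed by a non-space character (assumed section-start layout).
SECTION_BREAK = re.compile(r'\n(?= {4}\S)')


def split_sections(text):
    """Return one string per section, with its lines joined by a space."""
    sections = []
    for chunk in SECTION_BREAK.split(text):
        # Strip each line of the chunk and drop lines that are empty.
        lines = [line.strip() for line in chunk.split('\n')]
        sections.append(' '.join(line for line in lines if line))
    return sections


print(split_sections(INPUT_TEXT))
```

For the sample input this gives a list of two strings, one starting with rep_origin and one starting with Region. The continuation lines are joined with a single space rather than keeping the original column padding.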
