Python3: How can I split a list based on condition?

Question

I have found many answers on how to split a list into evenly sized chunks, but my problem is different.

My data is formatted as follows.

> header1
line1
line2
...
> header2
line4
line5
...

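I want to group these lines under their respective headers.

Getting the headers is easy: headers = [x for x in lines if x.startswith('>')]

But this trick does not work for the lines that follow, because there is no way to know which lines fall under which header.

Ideally, I would like a list formatted like [[line1, line2], [line4, line5]...]

I have a working solution using a while loop, but it looks ugly. How can I do this with a list comprehension or an existing library?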

Answer 1

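Use itertools.groupby with a custom key function whose return value changes every time we see a new header. Inside this function we increment ctr, so all lines up to the next header share the same key and land in the same group.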

from itertools import groupby

lis = ['>a', 'b', 'c', '>d', 'e', '>f', '>g']

def group_by_header(lis: list):
    def header_counter(x: str):
        if x.startswith('>'):
            header_counter.ctr += 1
        return header_counter.ctr
    header_counter.ctr = 0

    return groupby(lis, key=header_counter)

print([list(l) for k, l in group_by_header(lis)])
# [['>a', 'b', 'c'], ['>d', 'e'], ['>f'], ['>g']]
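Note that the groups above still contain the header lines, while the question asks for only the lines under each header. A minimal sketch of one way to drop them, keeping the same groupby idea; the function name split_by_header and the marker parameter are my own choices for illustration, not part of the answer above:

```python
from itertools import groupby

def split_by_header(lines, marker=">"):
    """Group lines under their header and leave the header itself out."""
    counter = 0

    def key(line):
        # Same idea as header_counter: the key changes at every header.
        nonlocal counter
        if line.startswith(marker):
            counter += 1
        return counter

    groups = []
    for _, group in groupby(lines, key=key):
        items = list(group)
        # Every group starts with its header, except a possible leading
        # group of lines that appear before the first header.
        if items[0].startswith(marker):
            items = items[1:]
        groups.append(items)
    return groups

lis = ['>a', 'b', 'c', '>d', 'e', '>f', '>g']
print(split_by_header(lis))
# [['b', 'c'], ['e'], [], []]
```

Headers with no lines under them produce empty lists here. If those are unwanted, filtering with [g for g in groups if g] should be enough.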

Answer 2

My solution is most likely not the best one, but here it is:

Sample data:

data = """> header1
line1
line2
> header2
line4
line5
""".split("\n")

The simple for-loop solution the OP mentioned:

def parse(d):
    result = []
    chunk = []
    for line in d:
        if not line:
            continue
        elif line.startswith(">"):
            if not chunk:
                continue
            result.append(chunk)
            chunk = []
            continue
        chunk.append(line)

    if chunk:
        result.append(chunk)
    
    return result
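If you also want to keep track of which header each group belongs to, the same loop can fill a dict instead of a list. This is only a sketch under a few assumptions of mine: header names are unique (a repeated header would overwrite the earlier entry), lines before the first header are ignored, and the name parse_to_dict is made up for illustration.

```python
def parse_to_dict(lines, marker=">"):
    """Map each header (without the marker) to the lines that follow it."""
    groups = {}
    current = None
    for line in lines:
        if not line:
            continue  # skip blank lines, e.g. the trailing '' from split("\n")
        if line.startswith(marker):
            current = line[len(marker):].strip()
            groups[current] = []
        elif current is not None:
            groups[current].append(line)
    return groups

data = """> header1
line1
line2
> header2
line4
line5
""".split("\n")

print(parse_to_dict(data))
# {'header1': ['line1', 'line2'], 'header2': ['line4', 'line5']}
```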

A second approach indexes the headers and then slices the list using those indices (2 lines):

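def _parse(d):
    # Header positions plus an end sentinel; len(d)-1 relies on the trailing '' left by split("\n").
    index = [i for i in range(0, len(d)) if d[i].startswith(">")] + [len(d)-1]
    # Each group is the slice between one header and the next marker.
    return [d[index[i]+1:index[i+1]]  for i in range(0, len(index)-1)]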

Both produce [['line1', 'line2'], ['line4', 'line5']]
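One caveat with the slicing version: using len(d)-1 as the end index only works because the sample data ends with an empty string. On a list without that trailing element, the last line would be cut off. A hedged variant that does not depend on it, assuming blank lines can simply be discarded (parse_by_slicing is a name I made up):

```python
def parse_by_slicing(lines, marker=">"):
    """Slice out the lines between consecutive headers."""
    lines = [line for line in lines if line]  # drop blank lines first
    starts = [i for i, line in enumerate(lines) if line.startswith(marker)]
    # Each group ends where the next header starts; the last one ends at len(lines).
    ends = starts[1:] + [len(lines)]
    return [lines[s + 1:e] for s, e in zip(starts, ends)]

data = ['> header1', 'line1', 'line2', '> header2', 'line4', 'line5']
print(parse_by_slicing(data))
# [['line1', 'line2'], ['line4', 'line5']]
```

Unlike the for-loop version, this keeps an empty list for a header that has no lines under it.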

Tags: python, list-comprehension, python-3.x