How can I group a list of strings into a list of sublists?

I have a list of strings, as in the example below.

list = ['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-',
 '#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+',
 '#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']

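I want to group the strings into sublists, as shown below. If a string starts with "#", it should be grouped with the strings that follow it, until the next string starting with "#" appears.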

[['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-'],

 ['#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+'],

['#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']]

My attempt so far:

new_list = []
sub_list
n = 0
for i in list:
    if i[0].startswith('#'):
        try i[0+1].
        sub_list.append(i)

new_list.append(sub_list)
new_list

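My idea is to start at index 0, check the strings one by one, and break out of the loop when the next string starting with # appears. The search loop would then start again to build the next sublist, but I don't know how to write the code. How can this be done? Thanks.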

lst = ['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-',
 '#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+',
 '#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']

out = []
for val in lst:
    if val.startswith('#'):
        out.append([val])
    else:
        out[-1].append(val)

from pprint import pprint
pprint(out, width=40)

Prints:

[['#4008 (Pending update)',
  'Age 1 Female',
  'Onset date',
  '-'],
 ['#4007 (Pending update)',
  'Onset date',
  'Asymptomatic',
  'Confirmed date',
  '-',
  '+'],
 ['#4006 (Pending update)',
  'Age 65 Female',
  'Onset date',
  '-',
  'Place of residence',
  '-']]
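
One caveat with the loop above: `out[-1]` raises an IndexError when the first item does not start with '#'. Below is a minimal sketch of a guarded variant. It assumes that such leading items should simply form their own group; that choice and the name group_by_marker are illustrative assumptions, not part of the answer.

```python
def group_by_marker(items, marker="#"):
    """Group items into sublists, starting a new sublist at each marker item."""
    out = []
    for val in items:
        # Start a new group at a marker, or when no group exists yet
        # (assumption: leading non-marker items form their own group).
        if val.startswith(marker) or not out:
            out.append([val])
        else:
            out[-1].append(val)
    return out


print(group_by_marker(["orphan", "#1", "a", "#2", "b", "c"]))
# [['orphan'], ['#1', 'a'], ['#2', 'b', 'c']]
```

For input that always begins with a '#' item, this behaves the same as the plain loop.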

Another approach uses zip_longest:

from itertools import zip_longest

list_ = ['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-',
 '#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+',
 '#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']

result, tmp = [], []

for i, j in zip_longest(list_, list_[1:]):
    tmp.append(i)

    # if delimiter is found, push the tmp to result & reset tmp
    if j and j.startswith("#"):
        result.append(tmp)
        tmp = []

# include value from last iteration.
result.append(tmp)

[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'],
 ['#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+'],
 ['#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-']]
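
To see why the look-ahead works, it may help to print the pairs that zip_longest produces. This is only an illustration; `sample` is a shortened, hypothetical input.

```python
from itertools import zip_longest

sample = ["#1", "a", "#2", "b"]  # hypothetical shortened input

# Pair every item with its successor; the last item is paired with None,
# because zip_longest pads the shorter iterable with its default fillvalue.
pairs = list(zip_longest(sample, sample[1:]))
print(pairs)
# [('#1', 'a'), ('a', '#2'), ('#2', 'b'), ('b', None)]
```

A group is closed whenever the second element of a pair starts with "#"; the final pair has None there, which is why the last group has to be appended after the loop.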

If you like, you can handle this easily with list comprehensions:

some_list = [
    '#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-',
    '#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+',
    '#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-'
    ]

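starts = [index for index, string in enumerate(some_list) if string.startswith('#')] + [len(some_list)]
# Appending len(some_list) as the final stop ensures the last group is captured in full
# (a negative stop such as -2 would cut items off the end of the last slice)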

new_list = [some_list[start:stop] for start, stop in zip(starts, starts[1:])]

With your input data, this takes roughly half the time of the for loop:

List comprehension: 1.87 µs ± 157 ns per loop

For loop: 3.95 µs ± 232 ns per loop
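
Timings like these depend on the machine, the Python version, and the input size, so they are worth re-measuring. Here is a rough sketch using the standard timeit module. The function names with_loop and with_slices are made up for this comparison, and the lambda wrapper adds a small constant overhead to both measurements.

```python
import timeit

data = [
    '#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-',
    '#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+',
    '#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-',
]


def with_loop(items):
    # Plain for loop: open a new sublist at each '#' item.
    out = []
    for val in items:
        if val.startswith('#'):
            out.append([val])
        else:
            out[-1].append(val)
    return out


def with_slices(items):
    # Slice between consecutive '#' positions; len(items) is the final stop.
    starts = [i for i, s in enumerate(items) if s.startswith('#')] + [len(items)]
    return [items[a:b] for a, b in zip(starts, starts[1:])]


# Both approaches should agree before their timings are compared.
assert with_loop(data) == with_slices(data)

n = 10_000
for fn in (with_loop, with_slices):
    seconds = timeit.timeit(lambda: fn(data), number=n)
    print(f"{fn.__name__}: {seconds / n * 1e6:.2f} microseconds per call")
```

On a 16-item list the difference is a few microseconds at most, so readability may matter more than speed here.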

You can do this with very simple logic. Try the code below:

list = ['#4008 (Pending update)',
    'Age 1 Female',
    'Onset date',
    '-',
    '#4007 (Pending update)',
    'Onset date',
    'Asymptomatic',
    'Confirmed date',
    '-',
    '+',
    '#4006 (Pending update)',
    'Age 65 Female',
    'Onset date',
    '-',
    'Place of residence',
    '-']

new_list = []  # this will be the output list
temp_list = []  # holds sub lists

for item in list:  # iterate over each item in the list
    if item.startswith('#'):  # if the item starts with a #,
        # the previous temp list is a separate group
        new_list.append(temp_list)  # add the previous list in the output list
        temp_list = [item]  # create new group start with the current item
    else:
        temp_list.append(item)
new_list.append(temp_list)  # add the remaining last group

new_list = new_list[1:]  # remove the first group as it is empty
print(new_list)

This gives the following output, which you can easily pretty-print:

[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'], ['#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+'], ['#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-']]

In the loop, newList gains a new sublist every time a string starts with #. The other strings, which do not start with #, are appended to the current sublist.

list = ['#4008 (Pending update)',
 'Age 1 Female',
 'Onset date',
 '-',
 '#4007 (Pending update)',
 'Onset date',
 'Asymptomatic',
 'Confirmed date',
 '-',
 '+',
 '#4006 (Pending update)',
 'Age 65 Female',
 'Onset date',
 '-',
 'Place of residence',
 '-']

newList = []
i = -1  # index of the current sublist
for item in list:
    if item.startswith('#'):
        i += 1
        newList += [[item]]
    else:
        newList[i] += [item]

print(newList)
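
If the input is large or arrives as a stream, the same idea can be written as a generator that yields each group as soon as it is complete, instead of indexing into a result list. This is a minimal sketch; iter_groups and its marker parameter are hypothetical names, not from the answer above.

```python
def iter_groups(items, marker="#"):
    """Yield one list per group; a group starts at each item beginning with marker."""
    group = []
    for item in items:
        if item.startswith(marker) and group:
            yield group      # the previous group is complete
            group = []
        group.append(item)
    if group:
        yield group          # emit the final group


for g in iter_groups(["#1", "a", "b", "#2", "c"]):
    print(g)
# ['#1', 'a', 'b']
# ['#2', 'c']
```

Calling `list(iter_groups(...))` collects everything into a nested list, provided the built-in `list` has not been shadowed by a variable of the same name.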

You could do something like this, although it looks like a worse version of Andrej's answer :)

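def group(l, key='#'):
    # Build fresh lists on every call; mutable default arguments such as
    # inner=[] are shared between calls and would accumulate results.
    out, inner = [], []
    for ele in l:
        if ele.startswith(key):
            if inner:  # close the previous group, if there is one
                out.append(inner)
            inner = [ele]
        else:
            inner.append(ele)
    if inner:
        out.append(inner)
    return out

print(group(list))  # `list` is the input list from the question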

[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'],
 ['#4007 (Pending update)',
  'Onset date',
  'Asymptomatic',
  'Confirmed date',
  '-',
  '+'],
 ['#4006 (Pending update)',
  'Age 65 Female',
  'Onset date',
  '-',
  'Place of residence',
  '-']]