How can I group a list of strings into a list of sublists?
I have a list of strings, as in the example below.
list = ['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-',
'#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+',
'#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']
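I want to group the strings into sublists, as shown below.
If a string starts with "#", it should be grouped with the strings that follow it, until the next string starting with "#" appears.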
[['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-'],
['#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+'],
['#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']]
new_list = []
sub_list
n = 0
for i in list:
    if i[0].startswith('#'):
        try i[0+1].
    sub_list.append(i)
    new_list.append(sub_list)
new_list
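My idea is to start from the string at index 0, check the strings one by one, and break the loop when the next string starting with # appears. Then the search loop would start again to build the next sublist, but I don't know how to write the code. How can this be done? Thanks.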
lst = ['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-',
'#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+',
'#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']
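from pprint import pprint

out = []
for val in lst:
    if val.startswith('#'):
        # a header starts a new group
        out.append([val])
    else:
        # anything else joins the most recent group
        # (assumes the first item starts with '#')
        out[-1].append(val)

pprint(out, width=40)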
Prints:
[['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-'],
['#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+'],
['#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']]
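One caveat with the loop above: `out[-1]` raises an IndexError if the first item does not start with '#'. Below is a minimal sketch of a variant that tolerates this, assuming you want any leading items kept in a group of their own. The function name `group_by_prefix` and that policy are assumptions for illustration, not part of the question.

```python
def group_by_prefix(items, prefix='#'):
    """Split items into sublists, starting a new sublist at each item
    that begins with prefix."""
    groups = []
    for item in items:
        # Start a new group at a header, or when no group exists yet
        # (i.e. the input does not begin with a header).
        if item.startswith(prefix) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups


print(group_by_prefix(['x', '#1', 'a', '#2']))
# [['x'], ['#1', 'a'], ['#2']]
```

For input that always begins with a '#' item, this should give the same result as the loop above.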
Another approach, using zip_longest:
from itertools import zip_longest
list_ = ['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-',
'#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+',
'#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']
result, tmp = [], []
for i, j in zip_longest(list_, list_[1:]):
    tmp.append(i)
    # if the next item is a delimiter, push tmp to result and reset tmp
    if j and j.startswith("#"):
        result.append(tmp)
        tmp = []
# include the group from the last iteration
result.append(tmp)
[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'],
['#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+'],
['#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-']]
If you like, you can handle this easily with a list comprehension:
some_list = [
'#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-',
'#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+',
'#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-'
]
starts = [index for index, string in enumerate(some_list) if string.startswith('#')] + [len(some_list)]
# Appending len(some_list) closes the final slice, so the last group is captured in full
new_list = [some_list[start:stop] for start, stop in zip(starts, starts[1:])]
With your input data, this takes roughly half the time of the for loop:
List comprehension: 1.87 µs ± 157 ns per loop
For loop: 3.95 µs ± 232 ns per loop
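The figures above depend on the machine and Python version. Here is a rough sketch of how such a comparison could be reproduced with the standard `timeit` module. The helper names `with_loop` and `with_slices` and the repeat count are assumptions for illustration, and the numbers you get will differ.

```python
from timeit import timeit

data = ['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-',
        '#4007 (Pending update)', 'Onset date', 'Asymptomatic',
        'Confirmed date', '-', '+',
        '#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-',
        'Place of residence', '-']


def with_loop(items):
    # plain for loop: a '#' item opens a new group
    out = []
    for val in items:
        if val.startswith('#'):
            out.append([val])
        else:
            out[-1].append(val)
    return out


def with_slices(items):
    # slicing approach: find the group starts, then cut between them
    starts = [i for i, s in enumerate(items) if s.startswith('#')]
    starts.append(len(items))
    return [items[a:b] for a, b in zip(starts, starts[1:])]


# Both approaches should agree before their speed is compared.
assert with_loop(data) == with_slices(data)

n = 10_000  # hypothetical repeat count; raise it for steadier numbers
for func in (with_loop, with_slices):
    seconds = timeit(lambda: func(data), number=n)
    print(f'{func.__name__}: {seconds / n * 1e6:.2f} µs per call')
```

On a list this small the difference is a couple of microseconds, so readability may matter more than speed.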
You can do this with very simple logic. Try the code below:
list = ['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-',
'#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+',
'#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']
new_list = []   # this will be the output list
temp_list = []  # holds the group currently being built
for item in list:  # iterate over each item in the list
    if item[0] == '#':  # if the item starts with a '#',
        # the previous temp list is a complete group
        new_list.append(temp_list)  # add the previous group to the output list
        temp_list = [item]  # start a new group with the current item
    else:
        temp_list.append(item)
new_list.append(temp_list)  # add the remaining last group
new_list = new_list[1:]  # remove the first group, as it is empty
print(new_list)
This gives you the following output, which you can easily pretty-print:
[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'], ['#4007 (Pending update)', 'Onset date', 'Asymptomatic', 'Confirmed date', '-', '+'], ['#4006 (Pending update)', 'Age 65 Female', 'Onset date', '-', 'Place of residence', '-']]
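In the loop, the first-dimension index of newList is incremented every time a string starts with #. The other strings, which do not start with #, are appended to the corresponding sublist.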
list = ['#4008 (Pending update)',
'Age 1 Female',
'Onset date',
'-',
'#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+',
'#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']
newList = []
i = -1
for item in list:
    if item.startswith('#'):
        i += 1
        newList += [[item]]
    else:
        newList[i] += [item]
print(newList)
You could do something like this, although it looks like a worse version of Andrej's answer :)
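def group(l, key='#', inner=None, out=None):
    # use None defaults: mutable defaults ([]) would be shared across calls
    inner = [] if inner is None else inner
    out = [] if out is None else out
    for i, ele in enumerate(l):
        if ele.startswith(key):
            if i > 0:
                out.append(inner)
            inner = [ele]
        else:
            inner.append(ele)
    out.append(inner)
    return out

print(group(list))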
[['#4008 (Pending update)', 'Age 1 Female', 'Onset date', '-'],
['#4007 (Pending update)',
'Onset date',
'Asymptomatic',
'Confirmed date',
'-',
'+'],
['#4006 (Pending update)',
'Age 65 Female',
'Onset date',
'-',
'Place of residence',
'-']]