How to remove text which repeats in multiple items in a python list?
I wonder if anyone can help. I have a Python list containing antibody names:
['anti-human CD86',
'anti-human CD274 (B7-H1, PD-L1)',
'anti-human CD270 (HVEM, TR2)',
...
'anti-human CD155 (PVR)',
'anti-human CD112 (Nectin-2)',
'anti-human CD47']
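I want to remove the 'anti-human ' part so that I have a list of just the actual protein targets, e.g. [CD86, CD274 ... CD47].
I've tried several approaches, including:
for i in parsed_protein_names:
    i.split('anti-human ')
but none of them seem to get me anywhere. Can anyone offer some advice?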
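Why the loop above appears to do nothing: str.split returns a new list and leaves i unchanged (Python strings are immutable), and the loop throws that result away. A minimal sketch that keeps the same loop shape but collects the results, assuming every name starts with 'anti-human ' (the list is a shortened copy of the one above; targets is a name introduced here for illustration):

```python
parsed_protein_names = ['anti-human CD86',
                        'anti-human CD274 (B7-H1, PD-L1)',
                        'anti-human CD47']

targets = []
for i in parsed_protein_names:
    # split() returns a new list; keep the piece after the prefix,
    # because i itself is never modified
    targets.append(i.split('anti-human ', 1)[-1])

print(targets)  # ['CD86', 'CD274 (B7-H1, PD-L1)', 'CD47']
```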
If you know the length of the fragment you want to remove, you can use:
parsed_protein_names=[string[11:] for string in parsed_protein_names]
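The 11 is simply len('anti-human '). A sketch that derives the length from the prefix instead of hard-coding it, and leaves any name without the prefix untouched (prefix is a variable introduced here for illustration, and 'CD47' without a prefix is a made-up entry to show that case):

```python
prefix = 'anti-human '
parsed_protein_names = ['anti-human CD86', 'anti-human CD155 (PVR)', 'CD47']

# len(prefix) == 11, so prefixed names are sliced exactly like string[11:]
parsed_protein_names = [s[len(prefix):] if s.startswith(prefix) else s
                        for s in parsed_protein_names]
print(parsed_protein_names)  # ['CD86', 'CD155 (PVR)', 'CD47']
```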
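Otherwise it gets more complicated. The following algorithm removes the longest prefix shared by all the strings; note that with this list that prefix is 'anti-human CD', so it will also remove the CD part.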
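# find the length of the prefix shared by all names
minlen = min(len(s) for s in parsed_protein_names)
x = minlen  # if no position differs, the whole shortest name is the prefix
for i in range(minlen):
    # stop at the first position where the names disagree
    if len(set(s[i] for s in parsed_protein_names)) != 1:
        x = i
        break
parsed_protein_names = [s[x:] for s in parsed_protein_names]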
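The standard library can also compute a character-wise common prefix: os.path.commonprefix accepts any list of strings. A sketch that cuts that prefix back to its last space so that "CD" is kept, assuming the text to remove always ends with a space (names, common and cut are illustrative names):

```python
import os

names = ['anti-human CD86',
         'anti-human CD274 (B7-H1, PD-L1)',
         'anti-human CD47']

common = os.path.commonprefix(names)  # 'anti-human CD' (character by character)
# keep only whole words: cut just after the last space in the prefix
cut = common.rfind(' ') + 1           # 0 if the prefix contains no space
stripped = [s[cut:] for s in names]
print(stripped)  # ['CD86', 'CD274 (B7-H1, PD-L1)', 'CD47']
```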
A simple list comprehension with replace() will do it:
>>> antibodies
['anti-human CD86', 'anti-human CD274 (B7-H1, PD-L1)', 'anti-human CD270 (HVEM, TR2)']
>>> [e.replace("anti-human ", "") for e in antibodies]
['CD86', 'CD274 (B7-H1, PD-L1)', 'CD270 (HVEM, TR2)']
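replace() removes the text wherever it occurs in a string, not only at the start. If only a leading prefix should go, an anchored regular expression is one alternative; a sketch with re.sub, where the pattern assumes the names are written as the prefix followed by one or more spaces:

```python
import re

antibodies = ['anti-human CD86',
              'anti-human CD274 (B7-H1, PD-L1)',
              'anti-human CD270 (HVEM, TR2)']

# ^ anchors the match at the start of the string; \s+ also absorbs repeated spaces
pattern = re.compile(r'^anti-human\s+')
result = [pattern.sub('', e) for e in antibodies]
print(result)  # ['CD86', 'CD274 (B7-H1, PD-L1)', 'CD270 (HVEM, TR2)']
```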
Assuming your list is defined like this:
parsed_protein_names = ['anti-human CD86',
'anti-human CD274 (B7-H1, PD-L1)',
'anti-human CD270 (HVEM, TR2)',
'...',
'anti-human CD155 (PVR)',
'anti-human CD112 (Nectin-2)',
'anti-human CD47']
You have a few different options using a list comprehension.
str.replace
result_list = [n.replace('anti-human ', '', 1) for n in parsed_protein_names]
print(result_list)
str.split
result_list = [n.split('anti-human', 1)[-1].lstrip() for n in parsed_protein_names]
print(result_list)
Either way, this is the output:
['CD86', 'CD274 (B7-H1, PD-L1)', 'CD270 (HVEM, TR2)', '...', 'CD155 (PVR)', 'CD112 (Nectin-2)', 'CD47']
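On Python 3.9 and later there is also str.removeprefix, which removes the prefix only when the string starts with it and otherwise returns the string unchanged. A sketch on the same list, including the '...' placeholder:

```python
parsed_protein_names = ['anti-human CD86',
                        'anti-human CD274 (B7-H1, PD-L1)',
                        'anti-human CD270 (HVEM, TR2)',
                        '...',
                        'anti-human CD155 (PVR)',
                        'anti-human CD112 (Nectin-2)',
                        'anti-human CD47']

# requires Python 3.9+; strings without the prefix come back unchanged
result_list = [n.removeprefix('anti-human ') for n in parsed_protein_names]
print(result_list)
```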
您要查找的函数是 "lstrip" 而不是 "split"
这是一个应该有效的代码
mylist = ['anti-human CD86','anti-human CD274 (B7-H1, PD-L1)','anti-human CD270 (HVEM, TR2)','anti-human CD155 (PVR)','anti-human CD112 (Nectin-2)','anti-human CD47']
my_output_list = []
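for i in mylist:
    # lstrip strips any leading characters from this set (note the trailing
    # space), not the exact prefix; safe here since each name continues with 'C'
    a = i.lstrip('anti-human ')
    my_output_list.append(a)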
print(my_output_list)
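Because lstrip treats its argument as a set of characters, it can eat into the protein name when that name starts with one of those characters. A sketch comparing it with str.removeprefix (Python 3.9+) on a hypothetical entry 'anti-human mTOR', chosen only because 'm' is in the set:

```python
name = 'anti-human mTOR'  # hypothetical entry for illustration

# lstrip removes any leading run of the characters a, n, t, i, -, h, u, m and space
stripped = name.lstrip('anti-human ')
# removeprefix removes the exact text 'anti-human ' once
exact = name.removeprefix('anti-human ')
print(repr(stripped))  # 'TOR'
print(repr(exact))     # 'mTOR'
```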