Python: remove partial duplicates from a list
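I have a poorly created list of items. Instead of containing one complete copy of each item, it contains several partial copies of the same item. The partial duplicates are mixed in with other duplicates and with some unique items. For example, list a: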
a = ['one two','one two three four','one two three','five six','five six seven','eight nine']
I want to remove the partial duplicates and keep the longest form of each item. For example, I would like to produce list b:
b = ['one two three four', 'five six seven','eight nine']
The items themselves must stay intact; the result must not become something like:
c = ['two one three four', 'vife six seven', 'eight nine']
Try this:
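def group_partials(strings):
    # After sorting, every partial copy sits directly before a longer string that extends it.
    it = iter(sorted(strings))
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for s in it:
        if not s.startswith(prev):
            yield prev  # prev is not a prefix of its successor, so it is the longest form
        prev = s
    yield prev  # the last string is never a prefix of a later one

a = ['one two', 'one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']
b = list(group_partials(a))
# b == ['eight nine', 'five six seven', 'one two three four'] (sorted order, not input order)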
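If the original order of the list matters, or if a character-level prefix match is too loose (for example, 'one two' is a string prefix of 'one twofold'), a word-level comparison may be a better fit. The following is only a sketch under those assumptions; the function name drop_partial_duplicates is made up for illustration, and it compares every pair of items, so it is quadratic in the length of the list.

```python
def drop_partial_duplicates(items):
    """Keep items that are not a word-level prefix of a longer item (hypothetical helper)."""
    # Split once so that comparisons are made on whole words, not characters.
    tokenized = [item.split() for item in items]
    kept = []
    seen = set()
    for item, words in zip(items, tokenized):
        key = tuple(words)
        if key in seen:  # exact duplicate of an item that was already kept
            continue
        # An item is partial if some longer item starts with exactly the same words.
        is_partial = any(
            len(other) > len(words) and other[:len(words)] == words
            for other in tokenized
        )
        if not is_partial:
            seen.add(key)
            kept.append(item)
    return kept


a = ['one two', 'one two three four', 'one two three', 'five six', 'five six seven', 'eight nine']
b = drop_partial_duplicates(a)
print(b)  # ['one two three four', 'five six seven', 'eight nine']
```

Unlike the sorted version, this keeps the survivors in the order they appear in the input.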
You can use sets for this.
Try this code:
a = ['one two', 'one two three', 'one two three four', 'five six', 'five six seven', 'eight nine']
# check for subsets: blank out any string whose words all appear in another string
# (word order is ignored, so 'two one' would also count as part of 'one two three')
for i in range(len(a)):
    for j in range(len(a)):
        if i == j:  # same index
            continue
        if set(a[i].split()) <= set(a[j].split()):  # if subset
            a[i] = ""  # clear string
            break
# remove the emptied strings
a = [x for x in a if x]
print(a)
Output:
['one two three four', 'five six seven', 'eight nine']