Find duplicates in an array and find their values (deep search) with Python
I have an array of rows, where each row is represented as follows:
{
'ms': int,
'e_up': bool,
'e_down': bool,
'f_up': bool,
'f_down': bool,
'l_up': bool,
'l_down': bool,
'r_up': bool,
'r_down': bool,
'b': int,
'a': int,
'c': int,
'd': int
}
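I want to iterate over all the rows (an array of dicts) and find every group of duplicate rows, meaning rows whose fields other than 'ms' are all equal, together with their 'ms' values.
For example, if I have: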
(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)
(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(234, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(0, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
I want the output to be:
[
[
1843,
1968,
234,
0
]
]
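I want to find every such group. Running time is not a concern here, so it doesn't matter to me if this takes extra time. How can I do this in Python? (Please don't use external libraries.)
You can take advantage of the fact that tuples can be used as dictionary keys. The code below uses the tuple of all values other than 'ms' as the dictionary key and stores the 'ms' values under that key in a list. Any list holding two or more values indicates duplicates: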
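Judging from the example, two rows count as duplicates when every field except 'ms' is equal. A minimal sketch of that criterion, assuming each row is a dict with the keys listed above; `FIELDS`, `signature`, and `is_duplicate` are hypothetical names used only for illustration:

```python
# Every field except 'ms', in the order given in the row schema above.
FIELDS = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
          'r_up', 'r_down', 'b', 'a', 'c', 'd')


def signature(row):
    """Return a hashable tuple of every field except 'ms' (hypothetical helper)."""
    return tuple(row[field] for field in FIELDS)


def is_duplicate(row1, row2):
    """Two rows are duplicates when their signatures are equal."""
    return signature(row1) == signature(row2)


# Three of the example rows, converted from tuples to dicts.
row_1843 = dict(zip(('ms',) + FIELDS,
                    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)))
row_1968 = dict(zip(('ms',) + FIELDS,
                    (1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0)))
row_1902 = dict(zip(('ms',) + FIELDS,
                    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)))

print(is_duplicate(row_1843, row_1968))  # True
print(is_duplicate(row_1843, row_1902))  # False
```

Because `signature` returns a tuple, the same value can also serve directly as a dictionary key when grouping rows.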
itemlist = list()
itemlist.append((1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20))
itemlist.append((1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((234, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((0, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemdict = dict()
# Map each signature (every value except 'ms') to the list of 'ms' values that share it
for item in itemlist:
    key = item[1:]
    if key in itemdict:
        itemdict[key].append(item[0])
    else:
        itemdict[key] = [item[0]]
# Iterate over the dictionary and keep every signature that occurs more than once
duplicates = []
for value in itemdict.values():
    if len(value) > 1:
        duplicates.append(value)  # append (not extend) keeps one list per group
print(duplicates)  # [[1843, 1968, 234, 0]]
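The same grouping idea can be applied directly to the dict rows described in the question. A sketch, assuming each row is a dict with the keys listed there; it uses `collections.defaultdict` from the standard library (so no external packages) and returns one list of 'ms' values per group of duplicates. `FIELDS` and `group_duplicates` are hypothetical names:

```python
from collections import defaultdict

# Every field except 'ms'; rows that agree on all of these are duplicates.
FIELDS = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
          'r_up', 'r_down', 'b', 'a', 'c', 'd')


def group_duplicates(rows):
    """Group rows by every field except 'ms'; return the 'ms' lists of size >= 2."""
    groups = defaultdict(list)  # signature tuple -> list of 'ms' values
    for row in rows:
        key = tuple(row[field] for field in FIELDS)
        groups[key].append(row['ms'])
    # Dicts preserve insertion order (Python 3.7+), so groups appear in first-seen order
    return [ms_values for ms_values in groups.values() if len(ms_values) > 1]


# The example rows from the question, converted from tuples to dicts.
raw = [
    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20),
    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (234, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (0, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
]
rows = [dict(zip(('ms',) + FIELDS, values)) for values in raw]

print(group_duplicates(rows))  # [[1843, 1968, 234, 0]]
```

Each row is visited once and dictionary lookups take constant time on average, so this runs in roughly linear time, whereas comparing every row with every other row is quadratic.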
My approach was to compare each row with every other row of the array that has not been checked yet, and collect the duplicates.
def find_duplicate(lines, line, duplicates, checked):
    # Skip rows that already belong to a group
    if line['ms'] in checked:
        return duplicates, checked
    duplicate = [line['ms']]
    checked.append(line['ms'])
    # Fields that must all match for two rows to count as duplicates ('ms' excluded)
    keys = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
            'r_up', 'r_down', 'b', 'a', 'c', 'd')
    for new_line in lines:
        if new_line['ms'] in checked:
            continue
        if all(new_line[k] == line[k] for k in keys):
            duplicate.append(new_line['ms'])
            checked.append(new_line['ms'])
    duplicates.append(duplicate)
    return duplicates, checked
I then called the function above on every row of the array of potential duplicates; rows that have already been checked are skipped.
duplicates = list()
checked = list()
for line in lines:
    duplicates, checked = find_duplicate(lines, line, duplicates, checked)
print(duplicates)
Input (each row shown as a tuple of its values, in the field order listed above):
(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)
(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1932, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1847, False, True, False, False, True, False, False, False, 0, 13, -13, 0)
(1869, False, True, False, False, True, False, False, False, 0, 13, -13, 0)
Output:
[[1902], [1843, 1932], [1847, 1869]]
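The output above includes [1902], a row with no duplicate, whereas the question asked only for groups of duplicates. A minimal sketch of the same pairwise approach that drops singleton groups by default, assuming dict rows with the keys from the question. The name `pairwise_duplicates` and its `keep_singletons` flag are hypothetical, and it tracks checked rows in a set of indices rather than a list of 'ms' values, so two rows that happen to share an 'ms' value are still treated as separate rows:

```python
# Every field except 'ms'; rows that agree on all of these are duplicates.
FIELDS = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
          'r_up', 'r_down', 'b', 'a', 'c', 'd')


def pairwise_duplicates(rows, keep_singletons=False):
    """Group duplicate rows by comparing each unchecked row with the rows after it (O(n^2))."""
    checked = set()  # indices of rows already placed in a group
    groups = []
    for i, row in enumerate(rows):
        if i in checked:
            continue
        checked.add(i)
        group = [row['ms']]
        # Rows before i are all checked by now, so only later rows need comparing
        for j in range(i + 1, len(rows)):
            other = rows[j]
            if j not in checked and all(other[f] == row[f] for f in FIELDS):
                group.append(other['ms'])
                checked.add(j)
        # Drop rows with no duplicate unless the caller asks to keep them
        if keep_singletons or len(group) > 1:
            groups.append(group)
    return groups


# The input rows shown above, converted from tuples to dicts.
raw = [
    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20),
    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1932, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1847, False, True, False, False, True, False, False, False, 0, 13, -13, 0),
    (1869, False, True, False, False, True, False, False, False, 0, 13, -13, 0),
]
rows = [dict(zip(('ms',) + FIELDS, values)) for values in raw]

print(pairwise_duplicates(rows))                        # [[1843, 1932], [1847, 1869]]
print(pairwise_duplicates(rows, keep_singletons=True))  # [[1902], [1843, 1932], [1847, 1869]]
```

With `keep_singletons=True` the result has the same shape as the output shown above.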