Find duplicates in an array and find their values (deep search) with Python

I have an array of lines, where each line is represented as:

{
  'ms': int,
  'e_up': bool,
  'e_down': bool,
  'f_up': bool,
  'f_down': bool,
  'l_up': bool,
  'l_down': bool,
  'r_up': bool,
  'r_down': bool,
  'b': int,
  'a': int,
  'c': int,
  'd': int
}

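I want to go through all the lines (an array of lines, stored as dictionaries) and find all duplicates, meaning lines that match in every field except 'ms', together with their .ms values.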

For example, if I have:

(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)

(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(234, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(0, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

I want the output to be:

[
  [
    1843,
    1968,
    234,
    0
  ]
]

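I want to find all possible combinations. Running time is not a concern here; it does not matter to me if it takes extra time. How can I do this in Python? (Please do not use external libraries.)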

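You can take advantage of the fact that tuples can be used as dictionary keys. The code below uses the tuple of every value except 'ms' as the key, and stores the 'ms' values in a list under that key. Any list containing two or more values indicates a set of duplicates: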

itemlist = list()
itemlist.append((1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20))
itemlist.append((1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((234, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((0, False, False, False, False, True, False, False, False, 0, 13, -13, 0))

itemdict = dict()
# group the 'ms' values by signature (every field except 'ms')
for item in itemlist:
    key = item[1:]
    if key in itemdict:
        itemdict[key].append(item[0])
    else:
        itemdict[key] = [item[0]]

# iterate over the dictionary and keep groups with more than one occurrence
duplicates = []
for value in itemdict.values():
    if len(value) > 1:
        # append (not extend) so each group stays a separate list,
        # matching the nested output requested in the question
        duplicates.append(value)

print(duplicates)  # [[1843, 1968, 234, 0]]
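
Since the question stores each line as a dictionary rather than a tuple, the same grouping idea can be sketched for dictionaries. This is only a minimal sketch: the helper name group_duplicates, the FIELDS tuple, and the id_key parameter are made up for illustration, and it assumes every value in a line is hashable (true for the bool and int fields shown in the question).

```python
FIELDS = ("ms", "e_up", "e_down", "f_up", "f_down", "l_up", "l_down",
          "r_up", "r_down", "b", "a", "c", "d")


def group_duplicates(lines, id_key="ms"):
    """Group the id_key values of lines that agree on every other field."""
    groups = {}
    for line in lines:
        # Signature: all (field, value) pairs except the id, in a stable order.
        signature = tuple(sorted((k, v) for k, v in line.items() if k != id_key))
        groups.setdefault(signature, []).append(line[id_key])
    # Keep only signatures shared by two or more lines.
    return [ids for ids in groups.values() if len(ids) > 1]


# Build dictionary lines from the sample tuples in the question.
rows = [dict(zip(FIELDS, t)) for t in [
    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20),
    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (234, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (0, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
]]

print(group_duplicates(rows))  # [[1843, 1968, 234, 0]]
```

Because dictionaries preserve insertion order (Python 3.7+), the 'ms' values inside each group appear in the order the lines were given.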

I solved the problem by comparing each index against every other not-yet-checked index of the array and collecting the duplicates.

def find_duplicate(lines, line, duplicates, checked):
    # skip lines that already belong to a group
    if line['ms'] in checked:
        return duplicates, checked

    # fields that must match for two lines to count as duplicates (all except 'ms')
    keys = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
            'r_up', 'r_down', 'b', 'a', 'c', 'd')

    duplicate = [line['ms']]
    checked.append(line['ms'])
    for new_line in lines:
        if new_line['ms'] in checked:
            continue
        if all(new_line[k] == line[k] for k in keys):
            duplicate.append(new_line['ms'])
            checked.append(new_line['ms'])

    duplicates.append(duplicate)

    return duplicates, checked

Then I called the function above on every not-yet-checked index of the array of potentially duplicated lines.

duplicates = list()
checked = list()

for i in range(len(lines)):
    duplicates, checked = find_duplicate(lines, lines[i], duplicates, checked)

print(duplicates)

Input:

(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)
(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1932, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1847, False, True, False, False, True, False, False, False, 0, 13, -13, 0)
(1869, False, True, False, False, True, False, False, False, 0, 13, -13, 0)

Output: [[1902], [1843, 1932], [1847, 1869]]
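
Note that the output above also includes single-element groups such as [1902], which are lines with no duplicate. If only real duplicates are wanted, as in the question's expected output, one possible follow-up step is to filter those groups out. A minimal sketch, using the output above as input (the names groups and duplicates_only are illustrative):

```python
# Result of the approach above, including single-element groups.
groups = [[1902], [1843, 1932], [1847, 1869]]

# Keep only groups that contain at least two 'ms' values.
duplicates_only = [group for group in groups if len(group) > 1]

print(duplicates_only)  # [[1843, 1932], [1847, 1869]]
```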