如果键字符串中的任何字符匹配，则对字典的值求和

Question

假设我有一个包含字符串键和整数值的字典，例如：

{'255': 8,
 '323': 1,
 '438': 1,
 '938': 2,
 '166': 1,
 '117': 10,
 '777': 2
}

我想创建一个包含以下信息的新词典：

keys = 一些新的唯一字符串（可以是任何东西）
values = 上述字典中所有键的 sum 值，这些键与任何 any 字符匹配匹配的键。

在上面的示例中，结果类似于：

{
 'group1' : 12, # Sum of key values for '255' (8), '323' (1), '438' (1), '938' (2)
 'group2' : 13  # Sum of key values for '166' (1), '117' (10), '777' (2)
}

澄清一下，group1 是通过以下方式创建的：255 有一个 2 与 323 中的 2 相匹配。然后 323 中的 3 也匹配 438 中的 3。最后，438 中的 8（或 3）匹配 938 中的 8（或 3）。因此，以相同的顺序添加这些键的值，我们得到 8 + 1 + 1 + 2 = 12.

None 的上述键值 (group1) 与其余键值合并（对于键 166、117、777；组成 group2) 因为 group1 键中的 none 个字符匹配 group2 键中的任何个字符。

Group2 的创建方式相同，将 166 中的 1 匹配到 117 中的 1，并将 117 中的 7 匹配到 777 中的 7 .

最后的笔记：

只有一个字符需要与要包含在“组”中的键之间的至少一个其他单个字符相匹配
我刚刚解释的上述值总和的顺序应该无关紧要，结果应该从该组中的任何一对开始工作
关键字符总是数字
提示输出字典的键可以是任何东西。为了方便起见，我使用了 group1 和 group2

完成此任务的有效方法是什么？

我考虑过将键值转换为 numpy 矩阵，其中行代表单个键，列代表每个字符的值。然而，沿着这条路走下去很快就会变得相当复杂，如果有更好的方法，我宁愿避免它。

Answer 1

尝试：

d = {"255": 8, "323": 1, "438": 1, "938": 2, "166": 1, "117": 10, "777": 2}

keys, groups = list(d), {}
while keys:
    current = keys.pop()
    s_current = set(current)
    for k in groups:
        if s_current.intersection(k):
            groups[k].append(d[current])
            groups[current] = groups[k]
            break
    else:
        groups[current] = [d[current]]

out = {}
for k, v in groups.items():
    if id(v) not in out:
        out[id(v)] = sum(v)


print(out)

打印：

{140318922050368: 13, 140318911221184: 12}

一点解释：

我们创建了一个临时字典 groups，其中键来自原始字典 d，值是原始字典中与键共享一个字符的值列表。注意：列表在键之间共享 - 这些列表的总数等于不同组的总数。

Answer 2

您可以使用 union-find data structure to solve this problem. The networkx package provides an implementation, but there's nothing stopping you from writing your own.

本质上，我们维护了一组不相交的集合。最初，每个字符串都属于它自己的不相交集。对于每对字符串，如果它们有共同的字母，我们将它们所属的不相交集合合并。这最终为我们提供了我们正在寻找的组。

从这里开始，我们使用 .to_sets() 方法进行分组，并计算所需的总和：

from networkx.utils.union_find import UnionFind

data = # dictionary in question, omitted for brevity
keys = list(data.keys())

uf = UnionFind(data.keys())

for outer_idx in range(len(keys)):
    for inner_idx in range(outer_idx + 1, len(keys)):
        if set(keys[outer_idx]) & set(keys[inner_idx]):
            uf.union(keys[outer_idx], keys[inner_idx])

result = {}
for idx, group in enumerate(uf.to_sets()):
    result[idx] = sum(data[key] for key in group)

print(result)

这输出：

{0: 12, 1: 13}

Answer 3

这与@BrokenBenchmark 的答案类似，但它是 self-contained，并且它还首先按数字分组，因此没有 O(n^2) doubly-nested 遍历所有字符串.

from collections import defaultdict

def get_edges(strings):
    # group together common digits.
    by_digit = [[] for _ in range(10)]
    for i, s in enumerate(strings):
        for digit in map(int, s):
            by_digit[digit].append(i)

    # attach the first of each group to the rest.
    for digit_locations in by_digit:
        if digit_locations:
            x = digit_locations[0]
            for y in digit_locations[1:]:
                yield (x, y)

def union_find_groups(edges, n):
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # find (path-splitting)
        while parent[x] != x:
            x, parent[x] = parent[x], parent[parent[x]]
        return x

    def union(x, y):
        x = find(x)
        y = find(y)
        if x == y:
            return
        # Union (by size)
        if size[x] < size[y]:
            x, y = y, x
        parent[y] = x
        size[x] += size[y]

    for x, y in edges:
        union(x, y)
    
    result = defaultdict(list)
    for x in range(n):
        result[find(x)].append(x)

    return result.values()

def main(input_data):
    strings = list(input_data.keys())
    index_groups = union_find_groups(get_edges(strings), len(strings))
    string_groups = [[strings[i] for i in group] for group in index_groups]
    print(string_groups)
    results = [sum(input_data[s] for s in group) for group in string_groups]
    return {f"group{i}": total for i, total in enumerate(results)}

if __name__ == "__main__":
    result = main({
        '255': 8,
        '323': 1,
        '438': 1,
        '938': 2,
        '166': 1,
        '117': 10,
        '777': 2
    })
    print(result)

结果：

[['255', '323', '438', '938'], ['166', '117', '777']]
{'group0': 12, 'group1': 13}

如果键字符串中的任何字符匹配，则对字典的值求和

Sum values of dictionary if any character within key strings match

python

string

dictionary

substring