如何在 python 的列表中合并相似元素?

How do I combine similar elements in a list in python?

作为练习,我尝试从图像生成调色板,并使用 Python 将原始提取的 RGB 通道转换为调色板。使用列表和字典的组合非常简单,但我希望能够通过组合相似的颜色通道来限制调色板中的颜色数量,除非它们在图像中都有大量存在。

假设我有一本包含我的 RGB 值及其计数的字典:

countsR = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}

此调色板需要 3 位索引。理想情况下,我会 运行 一些函数,combine_channels(dd, max_distance),它会做一些可怕的 O(N^2) 事情,并输出如下内容:

print(combine_channels(countsR, 10))
>>> {"255": 317, "10": 845}

现在只能用一位索引!

如果它能跟踪它用什么替换了哪些东西,也许在另一本字典中也很好:

countsR, replacements = combine_channels(countsR, 10)
print(replacements)
>>> {"255": ["250"], "10": ["8", "14"]}

有什么想法吗?

在许多情况下,如评论所示,有多个不同的分组选项。一个简单的选择是按数字顺序遍历通道形成组,如果通道不能适合现有组,则创建一个新通道。它不会导致最大距离,但会保证生成最少的组数:

def combine_channels(channels, dist):
    result = {}
    replacements = {}
    groups = []
    group = []
    key = None

    # Iterate through channels in ascending numerical order
    for channel, count in sorted((int(k), v) for k, v in channels.items()):
        # Add new group in case that channel doesn't fit to current group
        if group and channel - key > dist:
            groups.append((key, group))
            group = []
            key = None

        # Add channel to group
        group.append((channel, count))

        # Pick a new key in case there's none or current channel is within
        # dist from first channel in the group
        if key is None or channel - group[0][0] <= dist:
            key = channel

    # Add last group in case it exists
    if group:
        groups.append((key, group))

    for key, group in groups:
        result[key] = sum(x[1] for x in group)
        replacements[key] = [x[0] for x in group if x[0] != key]

    return result, replacements

countsR1 = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
countsR2 = {"0": 10, "11": 20, "7": 30, "19": 40, "25": 50}
countsR3 = {"0": 5, "11": 10}
print(combine_channels(countsR1, 10))
print(combine_channels(countsR2, 10))
print(combine_channels(countsR3, 10))

输出:

({14: 845, 255: 317}, {14: [8, 10], 255: [250]})
({25: 90, 7: 60}, {25: [19], 7: [0, 11]})
({0: 5, 11: 10}, {0: [], 11: []})

上面的时间复杂度是 O(n log n) 因为使用了排序。

这是我想出的:

def combine_channels(lst, max_dist):
    colors = list(set(lst)) # unique colors
    counts = dict()
    replacements = dict()
    all_repl = []

    for el in lst:
        counts[el] = counts.get(el, 0) + 1

    N = len(colors)
    dists = np.zeros((N, N))
    for i in range(N - 1):
        for j in range(i + 1, N):
            dist = abs(colors[i] - colors[j])
            dists[i, j] = dist
            if(dist < max_dist):
                if(colors[i] in all_repl or colors[j] in all_repl):
                    continue
                else:
                    if(counts.get(colors[i], 0) > counts.get(colors[j], 0)):
                        winner = colors[i]
                        loser = colors[j]
                    else:
                        winner = colors[j]
                        loser = colors[i]
                counts[winner] += counts.get(loser, 0)
                counts[loser] = 0
                if winner not in replacements:
                    replacements[winner] = list()
                replacements[winner].append(loser)
                all_repl.append(loser)
                if loser not in replacements:
                    continue
                else:
                    replacements[winner] = replacements[winner].extend(replacements[loser])
                    replacements.pop(loser, None)              
    print(replacements)

可能有很多更有效的方法,它不满足最大距离要求,但它有效!