如何在 python 的列表中合并相似元素?
How do I combine similar elements in a list in python?
作为练习,我尝试从图像生成调色板,并使用 Python 将原始提取的 RGB 通道转换为调色板。使用列表和字典的组合非常简单,但我希望能够通过组合相似的颜色通道来限制调色板中的颜色数量,除非它们在图像中都有大量存在。
假设我有一本包含我的 RGB 值及其计数的字典:
countsR = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
此调色板需要 3 位索引。理想情况下,我会 运行 一些函数,combine_channels(dd, max_distance)
,它会做一些可怕的 O(N^2)
事情,并输出如下内容:
print(combine_channels(countsR, 10))
>>> {"255": 317, "10": 845}
现在只能用一位索引!
如果它能跟踪它用什么替换了哪些东西,也许在另一本字典中也很好:
countsR, replacements = combine_channels(countsR, 10)
print(replacements)
>>> {"255": ["250"], "10": ["8", "14"]}
有什么想法吗?
在许多情况下,如评论所示,有多个不同的分组选项。一个简单的选择是按数字顺序遍历通道形成组,如果通道不能适合现有组,则创建一个新通道。它不会导致最大距离,但会保证生成最少的组数:
def combine_channels(channels, dist):
result = {}
replacements = {}
groups = []
group = []
key = None
# Iterate through channels in ascending numerical order
for channel, count in sorted((int(k), v) for k, v in channels.items()):
# Add new group in case that channel doesn't fit to current group
if group and channel - key > dist:
groups.append((key, group))
group = []
key = None
# Add channel to group
group.append((channel, count))
# Pick a new key in case there's none or current channel is within
# dist from first channel in the group
if key is None or channel - group[0][0] <= dist:
key = channel
# Add last group in case it exists
if group:
groups.append((key, group))
for key, group in groups:
result[key] = sum(x[1] for x in group)
replacements[key] = [x[0] for x in group if x[0] != key]
return result, replacements
countsR1 = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
countsR2 = {"0": 10, "11": 20, "7": 30, "19": 40, "25": 50}
countsR3 = {"0": 5, "11": 10}
print(combine_channels(countsR1, 10))
print(combine_channels(countsR2, 10))
print(combine_channels(countsR3, 10))
输出:
({14: 845, 255: 317}, {14: [8, 10], 255: [250]})
({25: 90, 7: 60}, {25: [19], 7: [0, 11]})
({0: 5, 11: 10}, {0: [], 11: []})
上面的时间复杂度是 O(n log n) 因为使用了排序。
这是我想出的:
def combine_channels(lst, max_dist):
colors = list(set(lst)) # unique colors
counts = dict()
replacements = dict()
all_repl = []
for el in lst:
counts[el] = counts.get(el, 0) + 1
N = len(colors)
dists = np.zeros((N, N))
for i in range(N - 1):
for j in range(i + 1, N):
dist = abs(colors[i] - colors[j])
dists[i, j] = dist
if(dist < max_dist):
if(colors[i] in all_repl or colors[j] in all_repl):
continue
else:
if(counts.get(colors[i], 0) > counts.get(colors[j], 0)):
winner = colors[i]
loser = colors[j]
else:
winner = colors[j]
loser = colors[i]
counts[winner] += counts.get(loser, 0)
counts[loser] = 0
if winner not in replacements:
replacements[winner] = list()
replacements[winner].append(loser)
all_repl.append(loser)
if loser not in replacements:
continue
else:
replacements[winner] = replacements[winner].extend(replacements[loser])
replacements.pop(loser, None)
print(replacements)
可能有很多更有效的方法,它不满足最大距离要求,但它有效!
作为练习,我尝试从图像生成调色板,并使用 Python 将原始提取的 RGB 通道转换为调色板。使用列表和字典的组合非常简单,但我希望能够通过组合相似的颜色通道来限制调色板中的颜色数量,除非它们在图像中都有大量存在。
假设我有一本包含我的 RGB 值及其计数的字典:
countsR = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
此调色板需要 3 位索引。理想情况下,我会 运行 一些函数,combine_channels(dd, max_distance)
,它会做一些可怕的 O(N^2)
事情,并输出如下内容:
print(combine_channels(countsR, 10))
>>> {"255": 317, "10": 845}
现在只能用一位索引!
如果它能跟踪它用什么替换了哪些东西,也许在另一本字典中也很好:
countsR, replacements = combine_channels(countsR, 10)
print(replacements)
>>> {"255": ["250"], "10": ["8", "14"]}
有什么想法吗?
在许多情况下,如评论所示,有多个不同的分组选项。一个简单的选择是按数字顺序遍历通道形成组,如果通道不能适合现有组,则创建一个新通道。它不会导致最大距离,但会保证生成最少的组数:
def combine_channels(channels, dist):
result = {}
replacements = {}
groups = []
group = []
key = None
# Iterate through channels in ascending numerical order
for channel, count in sorted((int(k), v) for k, v in channels.items()):
# Add new group in case that channel doesn't fit to current group
if group and channel - key > dist:
groups.append((key, group))
group = []
key = None
# Add channel to group
group.append((channel, count))
# Pick a new key in case there's none or current channel is within
# dist from first channel in the group
if key is None or channel - group[0][0] <= dist:
key = channel
# Add last group in case it exists
if group:
groups.append((key, group))
for key, group in groups:
result[key] = sum(x[1] for x in group)
replacements[key] = [x[0] for x in group if x[0] != key]
return result, replacements
countsR1 = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
countsR2 = {"0": 10, "11": 20, "7": 30, "19": 40, "25": 50}
countsR3 = {"0": 5, "11": 10}
print(combine_channels(countsR1, 10))
print(combine_channels(countsR2, 10))
print(combine_channels(countsR3, 10))
输出:
({14: 845, 255: 317}, {14: [8, 10], 255: [250]})
({25: 90, 7: 60}, {25: [19], 7: [0, 11]})
({0: 5, 11: 10}, {0: [], 11: []})
上面的时间复杂度是 O(n log n) 因为使用了排序。
这是我想出的:
def combine_channels(lst, max_dist):
colors = list(set(lst)) # unique colors
counts = dict()
replacements = dict()
all_repl = []
for el in lst:
counts[el] = counts.get(el, 0) + 1
N = len(colors)
dists = np.zeros((N, N))
for i in range(N - 1):
for j in range(i + 1, N):
dist = abs(colors[i] - colors[j])
dists[i, j] = dist
if(dist < max_dist):
if(colors[i] in all_repl or colors[j] in all_repl):
continue
else:
if(counts.get(colors[i], 0) > counts.get(colors[j], 0)):
winner = colors[i]
loser = colors[j]
else:
winner = colors[j]
loser = colors[i]
counts[winner] += counts.get(loser, 0)
counts[loser] = 0
if winner not in replacements:
replacements[winner] = list()
replacements[winner].append(loser)
all_repl.append(loser)
if loser not in replacements:
continue
else:
replacements[winner] = replacements[winner].extend(replacements[loser])
replacements.pop(loser, None)
print(replacements)
可能有很多更有效的方法,它不满足最大距离要求,但它有效!