Python sort and merge with dense list of sublisted lists

The dense list matrix consists of a list of sublists of tuple lists.

For simplicity, consider only the first 'sublist of tuple lists':

[[[(390,200),(380,206)],[(381,203),(368,209)],[(359,204),(343,211)],
[(308,197),(284,203)],[(331,196),(303,201)],[(359,196),(351,198)],
[(381,197),(377,203)],[(380,206),(365,213)],[(368,209),(352,215)],
[(343,211),(325,217)],[(284,203),(264,209)],[(303,201),(281,207)],
[(351,198),(322,201)], [(377,203),(364,210)]],...

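What is needed is a (preferably) sort and merge based on the first and last index of each tuple list, compared by (x,y) coordinates; essentially, creating continuous line segments from pairs of line segments.

Ideally, the output from the above would be: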

[[[(390,200),(380,206),(380,206),(365,213)],[(381,203),(368,209),(368,209),
(352,215)],[(359,204),(343,211),(343,211),(325,217)],[(308,197),(284,203),
(284,203),(264,209)],[(331,196),(303,201),(303,201),(281,207)],[(359,196),
(351,198),(351,198),(322,201)],[(381,197),(377,203),(377,203),
(364,210)]],...

... and after sorting and merging, all duplicates would be reduced to:

[[[(390,200),(380,206),(365,213)],[(381,203),(368,209),(352,215)],
[(359,204),(343,211),(325,217)],[(308,197),(284,203),(264,209)],[(331,196),
(303,201),(281,207)],[(359,196),(351,198),(322,201)],[(381,197),(377,203),
(364,210)]],...

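I have tried creating lists of duplicates and using loops, but have not come close to what seems like a relatively simple sort and merge (for someone more advanced than I am).

UPDATE:

The oversimplified input data (above) may make the solution look simpler than what is actually needed, so here is the complete representation of the matrix: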

[[(443390,4.362e+06), (443390,4.362e+06), (443388,4.36202e+06), 
(443385,4.36204e+06), (443381,4.36206e+06), (443380,4.36206e+06)], 
[(443381,4.36203e+06), (443379,4.36204e+06), (443377,4.36205e+06), 
(443375,4.36206e+06), (443370,4.36208e+06), (443368,4.36209e+06)], 
[(443359,4.36204e+06), (443357,4.36205e+06), (443354,4.36206e+06), 
(443349,4.36208e+06), (443344,4.3621e+06), (443343,4.36211e+06)], 
[(443308,4.36197e+06), (443305,4.36198e+06), (443297,4.362e+06), 
(443295,4.362e+06), (443287,4.36202e+06), (443284,4.36203e+06)], 
[(443331,4.36196e+06), (443329,4.36196e+06), (443317,4.36198e+06), 
(443317,4.36198e+06), (443305,4.362e+06), (443303,4.36201e+06)], 
[(443359,4.36196e+06), (443357,4.36196e+06), (443357,4.36196e+06), 
(443351,4.36198e+06)], [(443381,4.36197e+06), (443380,4.36198e+06), 
(443380,4.362e+06), (443379,4.36202e+06), (443377,4.36203e+06)], 
[(443380,4.36206e+06), (443377,4.36208e+06), (443376,4.36208e+06), 
(443371,4.3621e+06), (443366,4.36212e+06), (443365,4.36213e+06)], 
[(443368,4.36209e+06), (443365,4.3621e+06), (443360,4.36212e+06), 
(443357,4.36213e+06), (443354,4.36214e+06), (443352,4.36215e+06)], 
[(443343,4.36211e+06), (443339,4.36212e+06), (443337,4.36213e+06), 
(443333,4.36214e+06), (443327,4.36216e+06), (443325,4.36217e+06)], 
[(443284,4.36203e+06), (443281,4.36204e+06), (443277,4.36205e+06), 
(443274,4.36206e+06), (443267,4.36208e+06), (443264,4.36209e+06)], 
[(443303,4.36201e+06), (443297,4.36202e+06), (443297,4.36202e+06), 
(443291,4.36204e+06), (443284,4.36206e+06), (443281,4.36207e+06)], 
[(443351,4.36198e+06), (443348,4.36198e+06), (443337,4.36199e+06), 
(443327,4.362e+06), (443322,4.36201e+06)], [(443377,4.36203e+06), 
(443377,4.36203e+06), (443376,4.36204e+06), (443372,4.36206e+06), 
(443367,4.36208e+06), (443364,4.3621e+06)]],...

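FURTHER UPDATE: Martijn Peters pointed toward what may be the answer, but with actual values assigned to the coordinates, rather than just the condensed display values, the comparison has to be more involved (I think?).

The following: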

pathGroup = [[(443390,4.362e+06), (443390,4.362e+06), (443388,4.36202e+06), 
(443385,4.36204e+06), (443381,4.36206e+06), (443380,4.36206e+06)], 
[(443381,4.36203e+06), (443379,4.36204e+06), (443377,4.36205e+06), 
(443375,4.36206e+06), (443370,4.36208e+06), (443368,4.36209e+06)], 
[(443359,4.36204e+06), (443357,4.36205e+06), (443354,4.36206e+06), 
(443349,4.36208e+06), (443344,4.3621e+06), (443343,4.36211e+06)], 
[(443308,4.36197e+06), (443305,4.36198e+06), (443297,4.362e+06), 
(443295,4.362e+06), (443287,4.36202e+06), (443284,4.36203e+06)], 
[(443331,4.36196e+06), (443329,4.36196e+06), (443317,4.36198e+06), 
(443317,4.36198e+06), (443305,4.362e+06), (443303,4.36201e+06)], 
[(443359,4.36196e+06), (443357,4.36196e+06), (443357,4.36196e+06), 
(443351,4.36198e+06)], [(443381,4.36197e+06), (443380,4.36198e+06), 
(443380,4.362e+06), (443379,4.36202e+06), (443377,4.36203e+06)], 
[(443380,4.36206e+06), (443377,4.36208e+06), (443376,4.36208e+06), 
(443371,4.3621e+06), (443366,4.36212e+06), (443365,4.36213e+06)], 
[(443368,4.36209e+06), (443365,4.3621e+06), (443360,4.36212e+06), 
(443357,4.36213e+06), (443354,4.36214e+06), (443352,4.36215e+06)], 
[(443343,4.36211e+06), (443339,4.36212e+06), (443337,4.36213e+06), 
(443333,4.36214e+06), (443327,4.36216e+06), (443325,4.36217e+06)], 
[(443284,4.36203e+06), (443281,4.36204e+06), (443277,4.36205e+06), 
(443274,4.36206e+06), (443267,4.36208e+06), (443264,4.36209e+06)], 
[(443303,4.36201e+06), (443297,4.36202e+06), (443297,4.36202e+06), 
(443291,4.36204e+06), (443284,4.36206e+06), (443281,4.36207e+06)], 
[(443351,4.36198e+06), (443348,4.36198e+06), (443337,4.36199e+06), 
(443327,4.362e+06), (443322,4.36201e+06)], [(443377,4.36203e+06), 
(443377,4.36203e+06), (443376,4.36204e+06), (443372,4.36206e+06), 
(443367,4.36208e+06), (443364,4.3621e+06)]]
new = [self.sorted_and_merged(tupList) for tupList in pathGroup]
print new

#...from outside method sorted_and_merged()...
return [t for t in sorted(tuplelist) if not (t[0], format(t[1], '.3e'))
        in seen or seen_add((t[0], format(t[1], '.3e')))]

Result:

[[(443380,4.36206e+06), (443381,4.36206e+06), (443385,4.36204e+06), 
(443388,4.36202e+06), (443390,4.362e+06), (443390,4.362e+06)], 
[(443368,4.36209e+06), (443370,4.36208e+06), (443375,4.36206e+06), 
(443377,4.36205e+06), (443379,4.36204e+06), (443381,4.36203e+06)], 
[(443343,4.36211e+06), (443344,4.3621e+06), (443349,4.36208e+06), 
(443354,4.36206e+06), (443357,4.36205e+06), (443359,4.36204e+06)], 
[(443284,4.36203e+06), (443287,4.36202e+06), (443295,4.362e+06), 
(443297,4.362e+06), (443305,4.36198e+06), (443308,4.36197e+06)], 
[(443303,4.36201e+06), (443305,4.362e+06), (443317,4.36198e+06), 
(443317,4.36198e+06), (443329,4.36196e+06), (443331,4.36196e+06)], 
[(443351,4.36198e+06), (443357,4.36196e+06), (443357,4.36196e+06), 
(443359,4.36196e+06)], [(443377,4.36203e+06), (443379,4.36202e+06), 
(443380,4.362e+06), (443380,4.36198e+06), (443381,4.36197e+06)], 
[(443365,4.36213e+06), (443366,4.36212e+06), (443371,4.3621e+06), 
(443376,4.36208e+06), (443377,4.36208e+06), (443380,4.36206e+06)], 
[(443352,4.36215e+06), (443354,4.36214e+06), (443357,4.36213e+06), 
(443360,4.36212e+06), (443365,4.3621e+06), (443368,4.36209e+06)], 
[(443325,4.36217e+06), (443327,4.36216e+06), (443333,4.36214e+06), 
(443337,4.36213e+06), (443339,4.36212e+06), (443343,4.36211e+06)], 
[(443264,4.36209e+06), (443267,4.36208e+06), (443274,4.36206e+06), 
(443277,4.36205e+06), (443281,4.36204e+06), (443284,4.36203e+06)], 
[(443281,4.36207e+06), (443284,4.36206e+06), (443291,4.36204e+06), 
(443297,4.36202e+06), (443297,4.36202e+06), (443303,4.36201e+06)], 
[(443322,4.36201e+06), (443327,4.362e+06), (443337,4.36199e+06), 
(443348,4.36198e+06), (443351,4.36198e+06)], [(443364,4.3621e+06), 
(443367,4.36208e+06), (443372,4.36206e+06), (443376,4.36204e+06), 
(443377,4.36203e+06), (443377,4.36203e+06)]]

It seems that something has gone awry.
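One possible reading of that result, sketched under the assumption that `seen` starts as an empty set and `seen_add` is `seen.add`: in the snippet above, `not` binds only to the `in` test, so the condition is evaluated as `(not (key in seen)) or seen_add(key)`. For a key that has not been seen, the left side is already true and `seen_add` is never called, so nothing is ever recorded and nothing is filtered out. Calling `sorted()` without `reverse=True` would also account for the ascending order. The names `dedupe_buggy` and `dedupe_fixed` below are hypothetical, used only to contrast the two forms.

```python
def dedupe_buggy(items):
    seen = set()
    seen_add = seen.add
    # Parsed as: (not (t in seen)) or seen_add(t)
    # For a new item the left side is True, so seen_add is never reached.
    return [t for t in items if not t in seen or seen_add(t)]


def dedupe_fixed(items):
    seen = set()
    seen_add = seen.add
    # The parentheses make `not` apply to the whole `in ... or add(...)` test.
    # seen_add returns None, so a new item is recorded and kept.
    return [t for t in items if not (t in seen or seen_add(t))]


points = [(443390, 4.362e+06), (443390, 4.362e+06), (443388, 4.36202e+06)]
print(dedupe_buggy(points))  # all three points survive
print(dedupe_fixed(points))  # the repeated point is dropped
```

If the real coordinates differ beyond the displayed digits, the same parenthesized form should still apply to a formatted key such as `(t[0], format(t[1], '.3e'))`, although whether three digits is the right tolerance is a judgment about the data.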

This is indeed a simple sort and merge; using sorted() and removing duplicates while keeping the list ordered, you get:

def sorted_and_merged(tuplelist):
    seen = set()
    seen_add = seen.add
    return [t for t in sorted(tuplelist, reverse=True) if not (t in seen or seen_add(t))]

Apply this to each sublist:

[sorted_and_merged(sublist) for sublist in completelist]
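For readers unfamiliar with the `seen` / `seen_add` idiom, the same logic can be written as an explicit loop. This is only an illustrative rewrite; `sorted_and_merged_verbose` is a hypothetical name and not part of the original code.

```python
def sorted_and_merged_verbose(tuplelist):
    seen = set()
    result = []
    # Sort in descending order, then keep only the first occurrence of each tuple.
    for t in sorted(tuplelist, reverse=True):
        if t in seen:
            continue
        seen.add(t)
        result.append(t)
    return result


print(sorted_and_merged_verbose([(390, 200), (380, 206), (380, 206), (365, 213)]))
# prints [(390, 200), (380, 206), (365, 213)]
```

The comprehension form works because `set.add` returns `None`, so `t in seen or seen_add(t)` is falsy exactly when `t` is new.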

Demo:

>>> sample = [[(390,200),(380,206),(380,206),(365,213)],[(381,203),(368,209),(368,209),
... (352,215)],[(359,204),(343,211),(343,211),(325,217)],[(308,197),(284,203),
... (284,203),(264,209)],[(331,196),(303,201),(303,201),(281,207)],[(359,196),
... (351,198),(351,198),(322,201)],[(381,197),(377,203),(377,203),
... (364,210)]]
>>> def sorted_and_merged(tuplelist):
...     seen = set()
...     seen_add = seen.add
...     return [t for t in sorted(tuplelist, reverse=True) if not (t in seen or seen_add(t))]
... 
>>> from pprint import pprint
>>> pprint([sorted_and_merged(sublist) for sublist in sample])
[[(390, 200), (380, 206), (365, 213)],
 [(381, 203), (368, 209), (352, 215)],
 [(359, 204), (343, 211), (325, 217)],
 [(308, 197), (284, 203), (264, 209)],
 [(331, 196), (303, 201), (281, 207)],
 [(359, 196), (351, 198), (322, 201)],
 [(381, 197), (377, 203), (364, 210)]]
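
The demo starts from sublists that have already been joined. If the input is still the list of separate segments from the question, where one segment's last point equals another segment's first point, the joining step might be sketched as below. `chain_segments` is a hypothetical helper. It assumes start points are unique, that matching points compare exactly equal (which may not hold for the full floating-point coordinates), and that a segment whose start point is some segment's end point is a continuation rather than a new line.

```python
def chain_segments(segments):
    # Look up a segment by its first point (assumes start points are unique).
    by_start = {seg[0]: seg for seg in segments}
    # Every point at which some segment ends.
    ends = {seg[-1] for seg in segments}

    chains = []
    for seg in segments:
        if seg[0] in ends:
            continue  # continuation of another segment; appended via the loop below
        chain = list(seg)
        visited = {seg[0]}  # guards against looping forever on cyclic data
        while chain[-1] in by_start and chain[-1] not in visited:
            visited.add(chain[-1])
            # Skip the shared point so it appears only once in the chain.
            chain.extend(by_start[chain[-1]][1:])
        chains.append(chain)
    return chains


segments = [[(390, 200), (380, 206)], [(381, 203), (368, 209)],
            [(380, 206), (365, 213)], [(368, 209), (352, 215)]]
print(chain_segments(segments))
# prints [[(390, 200), (380, 206), (365, 213)], [(381, 203), (368, 209), (352, 215)]]
```

Because the shared endpoint is skipped while chaining, no separate deduplication pass is needed for the join itself; repeated points inside a single segment, as in the full data, would still need handling.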