Filtering a list of integer ranges in Python to exclude subsets
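I'm trying to find a faster way to filter my list of ranges so that any range that is completely covered by a larger range is excluded. For example,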
# all ranges have width > 1, so there is no case like xx=[1,1] in my list
# each range is already sorted, e.g. never [1,3,2], always [1,2,3]
# each range contains only consecutive integers, e.g. never [3,5,7], only [3,4,5,6,7]. In fact, the first and last integer are enough to know the whole range.
aa=[1,2,3]
bb=[2,3,4]
cc=[1,2]
dd=[0,1,2]
RangeList=[aa,bb,cc,dd]
#FinalList=[aa,bb,dd]
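cc is covered by aa or dd (I think of it as a subset), so I want to exclude it. I could certainly write a loop doing n^2 comparisons, but I'm hoping there is a faster way, because I have a lot of these ranges.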
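For reference, the n^2 comparison loop from the question can use only the first and last element of each range, relying on the stated guarantee that every range is a sorted run of consecutive integers. A minimal sketch, with the hypothetical name `remove_covered_naive`; a range counts as covered only when a strictly longer range contains it, so exact duplicates would both be kept.

```python
def remove_covered_naive(ranges):
    """O(n^2) baseline: keep ranges not contained in a strictly longer range.

    Assumes each range is a sorted list of consecutive integers, so
    containment reduces to comparing first and last elements.
    """
    kept = []
    for a in ranges:
        covered = any(
            b[0] <= a[0] and a[-1] <= b[-1] and len(b) > len(a)
            for b in ranges
        )
        if not covered:
            kept.append(a)
    return kept


aa = [1, 2, 3]
bb = [2, 3, 4]
cc = [1, 2]
dd = [0, 1, 2]
RangeList = [aa, bb, cc, dd]
print(remove_covered_naive(RangeList))  # [[1, 2, 3], [2, 3, 4], [0, 1, 2]]
```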
This can be solved by sorting first:
import operator

ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
# Two stable sorts: last element descending, then first element ascending
sorted_ranges = sorted(ranges, key=operator.itemgetter(-1), reverse=True)
sorted_ranges = sorted(sorted_ranges, key=operator.itemgetter(0))
filtered = []
i = 0
while i < len(sorted_ranges):
    filtered.append(sorted_ranges[i])
    j = i + 1
    # Following ranges that end no later than range i are covered by it
    while j < len(sorted_ranges) and sorted_ranges[i][-1] >= sorted_ranges[j][-1]:
        print("Remove", sorted_ranges.pop(j), "dominated by", sorted_ranges[i])
    i += 1
print("RESULT", filtered)
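You need to sort ascending by the first element and descending by the last element.
I used two explicit calls to sorted, but a single call can do both. In Python 2 this could be a cmp function:
sorted_ranges = sorted(ranges, cmp=lambda x, y: (x[0] - y[0]) if (x[0] - y[0]) != 0 else (y[-1] - x[-1]))
In Python 3, sorted() no longer accepts cmp; a key tuple does the same:
sorted_ranges = sorted(ranges, key=lambda r: (r[0], -r[-1]))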
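This way, the dominating range comes first.
Note that after sorting, the nested while loop examines each element only once, either removing it or adding it to the final list, so the whole algorithm is O(n log n), dominated by the sort.
(Strictly, list.pop(j) from the middle of a list is itself O(n), so the code as written can be quadratic in the worst case; appending the survivors to a new list avoids that.)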
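A variant of the same idea that never pops from the middle of a list: after the same sort (start ascending, end descending), survivors are appended to a new list in one pass. Because the ends of the kept ranges strictly increase, a single running maximum decides whether the next range is covered. This is a sketch assuming, as in the question, that each range is a sorted run of consecutive integers; `remove_covered_sorted` is a hypothetical name.

```python
def remove_covered_sorted(ranges):
    """O(n log n): sort once, then a single linear pass."""
    # Start ascending; among equal starts, the largest end comes first,
    # so a dominating range always precedes the ranges it covers.
    ordered = sorted(ranges, key=lambda x: (x[0], -x[-1]))
    kept = []
    max_end = None  # end of the last kept range, i.e. the largest end so far
    for r in ordered:
        # Every earlier range starts at or before r[0]; if one of them
        # also reaches r[-1], r is covered and is skipped.
        if max_end is not None and r[-1] <= max_end:
            continue
        kept.append(r)
        max_end = r[-1]
    return kept


ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
print(remove_covered_sorted(ranges))  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [6, 7]]
```

Exact duplicate ranges collapse to a single copy here, because the second copy's end is not larger than the running maximum.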
Using sets, issubset() and filter():
ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
# Use 'frozenset' as it is hashable to put in a big 'set'
sets = set(frozenset(a) for a in ranges)

def f(x):
    # Reject x if it is contained in any other set
    for y in sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True

result = [list(a) for a in filter(f, sets)]
print('Result=', result)
The function f filters out any set that is a subset of another set in the input.
Result= [[3, 4, 5, 6], [0, 1, 2, 3, 4], [6, 7]]
I haven't run any performance tests yet, though.
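A rough way to run such a performance test, sketched under stated assumptions: random ranges come from a hypothetical `make_ranges` helper (the sizes and widths are arbitrary choices), `subset_filter` wraps the frozenset approach above, and `sort_filter` is a sort-based one-pass filter for comparison. The script first checks that both keep the same ranges, then prints `timeit` timings; actual numbers will depend on the machine and the data.

```python
import random
import timeit


def make_ranges(n, max_start=1000, max_width=20, seed=0):
    """Random ranges of consecutive integers, each of width >= 2."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        start = rng.randint(0, max_start)
        width = rng.randint(2, max_width)
        out.append(list(range(start, start + width)))
    return out


def subset_filter(ranges):
    """The frozenset/issubset approach: drop sets contained in another set."""
    sets = set(frozenset(a) for a in ranges)

    def f(x):
        for y in sets:
            if x != y and x.issubset(y):
                return False
        return True

    return [sorted(a) for a in filter(f, sets)]


def sort_filter(ranges):
    """Sort by (start asc, end desc), then one pass with a running max end."""
    kept, max_end = [], None
    for r in sorted(ranges, key=lambda x: (x[0], -x[-1])):
        if max_end is None or r[-1] > max_end:
            kept.append(r)
            max_end = r[-1]
    return kept


data = make_ranges(500)
# Both approaches should keep the same ranges (duplicates collapse in both).
assert sorted(subset_filter(data)) == sorted(sort_filter(data))
for fn in (subset_filter, sort_filter):
    seconds = timeit.timeit(lambda: fn(data), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```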
My first thought was:
compressed = dict()
for lst in sorted(RangeList, reverse=True, key=lambda x: (x[0], x[1])):
    key = lst[0]
    if key not in compressed:
        compressed[key] = lst
print(list(compressed.values()))
But as igon pointed out, it misses inner subsets. I think the following fixes that:
# Reverse of (-start, end): start ascending, end descending
RangeList = sorted(RangeList, reverse=True, key=lambda x: (-x[0], x[-1]))
lst = RangeList[0]
oldstart = lst[0]
oldend = lst[-1]
compressed = {oldstart: lst}
for lst in RangeList[1:]:
    start = lst[0]
    end = lst[-1]
    # Keep a range only if it reaches past the end of the last kept range
    if start not in compressed and oldend < end:
        compressed[start] = lst
        oldstart, oldend = start, end
print(list(compressed.values()))
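This also uses set and issubset, but first sorts the list by size, and f scans the sets in reverse order (largest first) to try to optimize the search. This may help in practice, although the worst case is still O(n²) subset tests.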
ranges = [[0,1,2,3,4], [1,2], [0,1], [2,3,4], [3,4,5], [3,4,5,6], [4,5], [6,7], [5,6]]
sets = [set(a) for a in ranges]
sets.sort(key=len)
reverse_sets = sets[:]
reverse_sets.reverse()

def f(x):
    # Check the largest sets first, where a superset is most likely
    for y in reverse_sets:
        if x == y:
            continue
        if x.issubset(y):
            return False
    return True

print('Result=', [list(a) for a in filter(f, sets)])
Result:
Result= [[6, 7], [3, 4, 5, 6], [0, 1, 2, 3, 4]]
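The sort-by-size idea can be pushed a little further: a set can only be a proper subset of a strictly larger set, so after sorting by length each set only needs to be compared with the longer sets after it. The sketch below (the name `filter_by_size` is hypothetical) uses `bisect_right` on the list of lengths to skip equal-size sets, and Python's `<` operator on sets, which tests for a proper subset.

```python
from bisect import bisect_right


def filter_by_size(ranges):
    """Keep sets that are not a proper subset of any other set."""
    sets = sorted((set(a) for a in ranges), key=len)
    lengths = [len(s) for s in sets]
    result = []
    for x in sets:
        # Only strictly larger sets can properly contain x; they all sit
        # at or after this index because the list is sorted by length.
        first_larger = bisect_right(lengths, len(x))
        # Scan the largest candidates first, as in the answer above.
        if not any(x < y for y in reversed(sets[first_larger:])):
            result.append(sorted(x))
    return result


ranges = [[0, 1, 2, 3, 4], [1, 2], [0, 1], [2, 3, 4], [3, 4, 5],
          [3, 4, 5, 6], [4, 5], [6, 7], [5, 6]]
print('Result=', filter_by_size(ranges))
# Result= [[6, 7], [3, 4, 5, 6], [0, 1, 2, 3, 4]]
```

Like the answer's version, two identical input ranges are both kept, since neither is a proper subset of the other.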