如何获得一组不重叠的区间的笛卡尔积？

Question

给定一个包含一组区间的字典：

intervals = {'561801/03/08': [[1081, 1156], [1141, 1216], [1201, 1276], [1741, 1816], [1801, 1876], [1861, 1936], [1921, 1996], [1981, 2056], [2041, 2116]], '563301/03/08': [[1170, 1250], [1230, 1310], [1770, 1850], [1830, 1910], [1890, 1970], [1950, 2030], [2010, 2090], [2070, 2150], [2130, 2210]], '688002/03/08': [[1790, 1850], [1850, 1910], [1910, 1970], [1970, 2030], [2090, 2150], [2150, 2210], [2210, 2270], [2270, 2330], [2330, 2390], [2390, 2450], [2450, 2510], [2510, 2570], [2570, 2630], [2630, 2690], [2690, 2750]], '690102/03/08': [[1900, 1960], [1960, 2020], [2020, 2080], [2080, 2140], [2200, 2260], [2260, 2320], [2320, 2380], [2380, 2440], [2440, 2500], [2500, 2560], [2560, 2620], [2620, 2680], [2680, 2740]], '559402/03/08': [[2015, 2090], [2075, 2150], [2135, 2210], [2195, 2270], [2255, 2330], [2315, 2390], [2375, 2450], [2435, 2510], [2495, 2570], [2555, 2630], [2615, 2690], [2675, 2750]], '561302/03/08': [[2310, 2390], [2370, 2450], [2430, 2510], [2490, 2570], [2550, 2630], [2610, 2690], [2670, 2750]], '572602/03/08': [[2435, 2505], [2495, 2565], [2555, 2625], [2615, 2685], [2675, 2745]], '572502/03/08': [[2560, 2640], [2620, 2700]]}

可以使用以下方法获得笛卡尔积：

prod = itertools.product(*intervals)

这个笛卡尔积的大小是9915131275* 2 = 13,267,800

我希望通过不允许两个或多个域重叠的组合来减少它。这个组合可以：

[1081, 1156], [1170, 1250], [1790, 1850], [1900, 1960], [2015, 2090], [2310, 2390], [2435, 2505], [2560, 2640] OK

这个组合不行

[1141, 1216], [1170, 1250], [1790, 1850], [1900, 1960], [2015, 2090], [2310, 2390], [2435, 2505], [2560, 2640] not OK

以及任何以以下开头的组合：

[1141, 1216], [1170, 1250]

不应该考虑。这不包括 15131275*2 = 163,800 个组合目的是显着减少笛卡尔积的大小，只有不重叠的区间。

提前致谢

Answer 1

我对这个问题的解释如下：从每个区间列表中恰好选择一个区间，并且只接受那些没有任何重叠的组合。

下面是 CPMPy (https://github.com/CPMpy/cpmpy) 中的约束规划模型。它使用 Element 约束从第 i 个间隔列表中选择选定的第 x[i] 个间隔。

基本约束是：

~( (starts[i] >= starts[j]) & (starts[i] <= ends[j]))
~( (starts[j] >= starts[i]) & (starts[j] <= ends[i]))

这确保第 i 个选定的间隔不会与第 j 个间隔重叠（反之亦然）。（注：~表示not。）

from cpmpy import *
from cpmpy.solvers import *
from cpmpy_hakank import * # See http://hakank.org/cpmpy/cpmpy_hakank.py

def print_solution(a):
    """
    Print the solution.
    """
    # The selected intervals, as indices in each interval list
    xval = a[0].value()
    n = len(xval)
    print(xval)
    # The selected intervals, as intervals
    sols = [intervals[i][xval[i]] for i in range(n)]
    print(sols)
    print(flush=True)
    

#
# Note: intervals is a list of list of intervals (not a dictionary)
#
def reduce_overlaps(intervals):

    # Convert the list of intervals to a list of flattened lists
    # for use with Element below.
    intervals_flatten = []
    for interval in intervals:
        intervals_flatten.append(cpm_array(flatten_lists(interval)))
    intervals_flatten = cpm_array(intervals_flatten)
    
    # We need all values to create the domains of the selected interval 
    # values
    all_values = flatten_lists(intervals_flatten)
    max_val = max(all_values)
    min_val = min(all_values)
    
    n = len(intervals)
    lens = [len(interval) for interval in intervals]

    #
    # Decision variables
    #
    model = Model()

    # x[i] is the selected interval for the i'th interval list
    x = intvar(0,max(lens),shape=n,name="x")
    
    # Reduce the domain (the possible values) of each interval list
    # (since they have different lengths)
    for i in range(n):
        model += [x[i] < lens[i]]

    # starts[i] is the start value of the i'th selected interval
    starts = intvar(min_val,max_val,shape=n,name="starts")
    # ends[i] is the end value of the i'th selected interval    
    ends   = intvar(min_val,max_val,shape=n,name="ends")

    #
    # Main constraints:
    #  - Pick exactly one of the intervals from each intervals list
    #  - Ensure that there are no overlaps between any of selected intervals.
    #

    # get the values of the selected intervals
    for i in range(n):
        # Use Element to obtain the start and end values of the selected 
        # interval. We have to use the following construct with Element 
        # since CPMPy does not (yet) support this syntax:
        #    starts[i] = intervals[x[i],0]
        #    ends[i]   = intervals[x[i],1]
        model += [starts[i] == Element(intervals_flatten[i],x[i]*2+0), # corresponds to: starts[i] = intervals[x[i],0]
                  ends[i]   == Element(intervals_flatten[i],x[i]*2+1), # corresponds to: ends[i]   = intervals[x[i],1]
                  ]

    # Ensure that the i'th selected interval don't overlap with
    # the rest of the intervals (the j'th interval)
    for i in range(n):
        for j in range(i+1,n):           
            # Ensure that the start value of one interval is not inside the other interval
            model += [~( (starts[i] >= starts[j]) & (starts[i] <= ends[j])),
                      ~( (starts[j] >= starts[i]) & (starts[j] <= ends[i])) ]


    # Print all solutions.
    # This method is defined in http://hakank.org/cpmpy/cpmpy_hakank.py
    # ortools_wrapper(model,[x],print_solution)
    # Collect the solutions in an array
    solutions = []
    def get_solution(a):
        xval = a[0].value()
        # print(xval)
        sol = [intervals[i][xval[i]] for i in range(n)]
        # print("sol:",sol)        
        solutions.append(sol)
    ortools_wrapper2(model,[x],get_solution)
        
    return np.array(solutions)


intervals_dict = {
    '561801/03/08': [[1081, 1156], [1141, 1216], [1201, 1276], [1741, 1816], [1801, 1876], [1861, 1936], [1921, 1996], [1981, 2056], [2041, 2116]],
    '563301/03/08': [[1170, 1250], [1230, 1310], [1770, 1850], [1830, 1910], [1890, 1970], [1950, 2030], [2010, 2090], [2070, 2150], [2130, 2210]],
    '688002/03/08': [[1790, 1850], [1850, 1910], [1910, 1970], [1970, 2030], [2090, 2150], [2150, 2210], [2210, 2270], [2270, 2330], [2330, 2390], [2390, 2450], [2450, 2510], [2510, 2570], [2570, 2630], [2630, 2690], [2690, 2750]],
    '690102/03/08': [[1900, 1960], [1960, 2020], [2020, 2080], [2080, 2140], [2200, 2260], [2260, 2320], [2320, 2380], [2380, 2440], [2440, 2500], [2500, 2560], [2560, 2620], [2620, 2680], [2680, 2740]],
    '559402/03/08': [[2015, 2090], [2075, 2150], [2135, 2210], [2195, 2270], [2255, 2330], [2315, 2390], [2375, 2450], [2435, 2510], [2495, 2570], [2555, 2630], [2615, 2690], [2675, 2750]],
    '561302/03/08': [[2310, 2390], [2370, 2450], [2430, 2510], [2490, 2570], [2550, 2630], [2610, 2690], [2670, 2750]],
    '572602/03/08': [[2435, 2505], [2495, 2565], [2555, 2625], [2615, 2685], [2675, 2745]],
    '572502/03/08': [[2560, 2640], [2620, 2700]]
    }

# Convert to a list of lists since this is needed for the output
intervals = [intervals_dict[a] for a in intervals_dict]
solutions = reduce_overlaps(intervals)
# print("Solutions:",solutions)
print("Num solutions:",len(solutions))

注意：该程序使用我的实用程序包 http://hakank.org/cpmpy/cpmpy_hakank.py .

该模型给出了 12201 个解决方案，显示了所选区间的索引以及区间。以下是其中一些解决方案：

sol #1
[7 7 2 4 5 2 4 0]
[[1981, 2056], [2070, 2150], [1910, 1970], [2200, 2260], [2315, 2390], [2430, 2510], [2675, 2745], [2560, 2640]]


sol #2
[7 7 0 4 5 2 4 0]
[[1981, 2056], [2070, 2150], [1790, 1850], [2200, 2260], [2315, 2390], [2430, 2510], [2675, 2745], [2560, 2640]]


sol #3
[7 7 1 4 5 2 4 0]
[[1981, 2056], [2070, 2150], [1850, 1910], [2200, 2260], [2315, 2390], [2430, 2510], [2675, 2745], [2560, 2640]]

....

sol #12200
[4 8 2 5 0 1 1 1]
[[1801, 1876], [2130, 2210], [1910, 1970], [2260, 2320], [2015, 2090], [2370, 2450], [2495, 2565], [2620, 2700]]


sol #12201
[6 8 1 5 0 1 1 1]
[[1921, 1996], [2130, 2210], [1850, 1910], [2260, 2320], [2015, 2090], [2370, 2450], [2495, 2565], [2620, 2700]]

ExitStatus.OPTIMAL (3.59288788 seconds)
Nr solutions: 12201
Num conflicts: 302
NumBranches: 135035
WallTime: 3.59288788

更新

这里有两个CP模型：

CPMpy模型（与上面略有不同）：http://hakank.org/cpmpy/reduce_overlapping_intervals_cp.py
Picat 模型（我用它来制作问题原型）：http://hakank.org/picat/reduce_overlapping_intervals.pi

更新 2

问题还有另一种解释：从每个区间列表中删除区间，使这些剩余区间列表的任意组合（从每个区间列表中取出一个区间）不重叠。并且我们要求每个区间列表至少保留一个区间。

这样说，那么一共有（根据我的Picat模型，见下）608599种不同的配置。

也许更有趣的是只使用最优解，即具有最大保持间隔数的配置。那么保持间隔的最佳数量是 15（同样根据我的 Picat 模型），并且有 170 个这样的配置。（令我惊讶的是，保持间隔的最佳数量仅为 15 ，它是可能的 72 个间隔中的一小部分）。

以下是其中一些最佳解决方案（保留 15 个间隔）：

interval = 1 = [[1081,1156],[1741,1816],[1801,1876],[1861,1936],[1921,1996],[1981,2056]]
interval = 2 = [[1170,1250],[1230,1310]]
interval = 3 = [[2150,2210],[2210,2270]]
interval = 4 = [[2080,2140]]
interval = 5 = [[2675,2750]]
interval = 6 = [[2310,2390]]
interval = 7 = [[2435,2505]]
interval = 8 = [[2560,2640]]


interval = 1 = [[1081,1156],[1141,1216],[1741,1816],[1801,1876]]
interval = 2 = [[1230,1310]]
interval = 3 = [[2150,2210],[2210,2270]]
interval = 4 = [[1900,1960],[1960,2020],[2020,2080],[2080,2140]]
interval = 5 = [[2435,2510]]
interval = 6 = [[2310,2390]]
interval = 7 = [[2675,2745]]
interval = 8 = [[2560,2640]]


interval = 1 = [[1081,1156],[1741,1816],[1801,1876]]
interval = 2 = [[1170,1250],[1230,1310]]
interval = 3 = [[2090,2150],[2150,2210],[2210,2270]]
interval = 4 = [[1900,1960],[1960,2020],[2020,2080]]
interval = 5 = [[2435,2510]]
interval = 6 = [[2310,2390]]
interval = 7 = [[2675,2745]]
interval = 8 = [[2560,2640]]

此方法的 Picat 模型在这里：http://hakank.org/picat/reduce_overlapping_interval2.pi。

如何获得一组不重叠的区间的笛卡尔积？

How can I get the cartesian product of a set of intervals with no overlapping?

python

cartesian-product

constraint-programming