局部序列略微乱序的最长公共序列

Question

我需要解决两个序列之间的最长公共序列问题，当它们稍微乱序并且乱序的节点彼此靠近时。换句话说，假设其中一个序列是 s1 s2 s3 ... s_n。它可能与 s1 和 s2 交换，或 s1 和 s3 交换，或 s_{n-1} 和 s_n 交换，但 s1 和 s_n 不太可能交换顺序。

当观察到的序列是许多同时生成的序列的混合时，就会出现这种情况。例如，如果我有两个并发源生成 (a1, a2, ..., a100) 和 (b1, b2, ..., b100)。许多节点几乎在同一时间生成，因此我们用于匹配的组合序列最终可能变成 (a1, a2, b1, b2, ... ), 或 (a1, b1, a2, b2 );因为节点 a1、a2、a3、b1、b2、b3 可能在同一纳秒内生成，并且任何顺序都是可能的。但是a1和b100不可能交换订单，因为它们是相隔几秒生成的。

假设有两个这样的序列，它们在全局上是有序的，但在局部可能会有小的重新排列。这种情况下如何应用最长公共序列算法？

Answer 1

很抱歉没有回答。

想法是从每个序列中读入一个暂存池，然后当池中的内容即将到期时，尝试将它们与另一个序列配对。池越大，允许的重排就越大。

class CurrentPool:
    def __init__ (self, items):
        self.items = items
        self.index = -1
        self.pool = {}
        self.empty = False

    def read_into_pool(self):
        self.index += 1
        if self.index < len(self.items):
            self.pool[self.index] = self.items[self.index]
        elif 0 == len(self.pool):
            self.empty = True
        return self.pool

def compare_lists (items1, items2, max_offset=1):
    pool1 = CurrentPool(items1)
    pool2 = CurrentPool(items2)
    trailing_index = -max_offset
    lcs = []
    while not pool1.empty and not pool2.empty:
        temp_pool1 = pool1.read_into_pool()
        temp_pool2 = pool2.read_into_pool()
        # We are forced to take keys if they are at the trailing_index.
        if trailing_index in temp_pool1:
            # Look for a match.
            min_index = None
            for i, val in temp_pool2.items():
                if val == temp_pool1[trailing_index]:
                    if min_index is None or i < min_index:
                        min_index = i
            del temp_pool1[trailing_index]
            if min_index is not None:
                lcs.append(temp_pool2[min_index])
                del temp_pool2[min_index]

        if trailing_index in temp_pool2:
            # Look for a match the other way
            min_index = None
            for i, val in temp_pool1.items():
                if val == temp_pool2[trailing_index]:
                    if min_index is None or i < min_index:
                        min_index = i
            del temp_pool2[trailing_index]
            if min_index is not None:
                lcs.append(temp_pool1[min_index])
                del temp_pool1[min_index]
        else:
            # We only drain if we find a good match.
            pass

        trailing_index += 1
    return lcs


print(compare_lists([1, 2, 3, 4, 5, 6], [2, 1, 4, 5, 7], 2))

局部序列略微乱序的最长公共序列

Longest Common Sequence with Sequences Slightly Out of order Locally

algorithm

sequence

lcs