Dynamic programming for the shortest sequence that is not a subsequence of two strings

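Problem: Given two sequences s1 and s2 of '0' and '1', return the shortest sequence that is a subsequence of neither of the two sequences.

E.g. s1 = '011', s2 = '1101'. Return s_out = '00' as one possible result.

Note that substring and subsequence are different: the characters of a substring are contiguous, whereas the characters of a subsequence need not be.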
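To make the distinction concrete, here is a minimal sketch of a subsequence test together with a brute-force search over binary strings of increasing length. The names `is_subsequence` and `brute_force_shortest` are illustrative helpers, not part of either solution in this thread, and the search is exponential, so it is only meant for checking small inputs.

```python
from itertools import product


def is_subsequence(candidate, seq):
    """Return True if candidate can be obtained from seq by deleting characters."""
    it = iter(seq)
    # `ch in it` advances the iterator until it finds ch, so order is respected.
    return all(ch in it for ch in candidate)


def brute_force_shortest(s1, s2):
    """Try every binary string by increasing length (exponential time)."""
    length = 1
    while True:
        for bits in product('01', repeat=length):
            cand = ''.join(bits)
            if not is_subsequence(cand, s1) and not is_subsequence(cand, s2):
                return cand
        length += 1


print(is_subsequence('01', '011'))          # True: characters need not be adjacent
print(is_subsequence('00', '011'))          # False: only one '0' is available
print(brute_force_shortest('011', '1101'))  # 00
```

The loop always terminates, because any string longer than both inputs cannot be a subsequence of either.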

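My question: how is dynamic programming applied in the "Solution Provided" below, and what is its time complexity?

My attempt involves computing all the subsequences of each string, giving sub1 and sub2. Append a '1' or a '0' to each element of sub1 and determine whether the new sequence is absent from sub2. Find the one of minimum length. Here is my code:

My Solution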

def get_subsequences(seq, index, subs, result): 
    if index == len(seq): 
        if subs: 
            result.add(''.join(subs))
    else:
        get_subsequences(seq, index + 1, subs, result)
        get_subsequences(seq, index + 1, subs + [seq[index]], result)

def get_bad_subseq(subseq):
    min_sub = ''
    length = float('inf')
    for sub in subseq:
        for char in ['0', '1']:
            if len(sub) + 1 < length and sub + char not in subseq:
                length = len(sub) + 1
                min_sub = sub + char
    return min_sub

Solution Provided (not mine)

How does it work, and what is its time complexity?

It looks like the solution below is similar to: http://kyopro.hateblo.jp/entry/2018/12/11/100507

def set_nxt(s, nxt):
    n = len(s)
    idx_0 = n + 1
    idx_1 = n + 1
    for i in range(n, 0, -1):
        nxt[i][0] = idx_0
        nxt[i][1] = idx_1
        if s[i-1] == '0':
            idx_0 = i
        else:
            idx_1 = i
    nxt[0][0] = idx_0
    nxt[0][1] = idx_1

def get_shortest(seq1, seq2):
    len_seq1 = len(seq1)
    len_seq2 = len(seq2)
    nxt_seq1 = [[len_seq1 + 1 for _ in range(2)] for _ in range(len_seq1 + 2)] 
    nxt_seq2 = [[len_seq2 + 1 for _ in range(2)] for _ in range(len_seq2 + 2)] 

    set_nxt(seq1, nxt_seq1)
    set_nxt(seq2, nxt_seq2)

    INF = 2 * max(len_seq1, len_seq2)
    dp = [[INF for _ in range(len_seq2 + 2)] for _ in range(len_seq1 + 2)]
    dp[len_seq1 + 1][len_seq2 + 1] = 0
    for i in range( len_seq1 + 1, -1, -1):
        for j in range(len_seq2 + 1, -1, -1):
            for k in range(2):
                if dp[nxt_seq1[i][k]][nxt_seq2[j][k]] < INF:
                    dp[i][j] = min(dp[i][j], dp[nxt_seq1[i][k]][nxt_seq2[j][k]] + 1)

    res = ""
    i = 0
    j = 0
    while i <= len_seq1 or j <= len_seq2:
        for k in range(2):
            if (dp[i][j] == dp[nxt_seq1[i][k]][nxt_seq2[j][k]] + 1):
                i = nxt_seq1[i][k]
                j = nxt_seq2[j][k]
                res += str(k)
                break
    return res

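I am not going to work through it in detail, but the idea of this solution is to create a 2-D array with an entry for every combination of a position in one string and a position in the other. It then fills this array with information about the shortest sequences that it finds.

Just constructing that array takes space (and therefore time) O(len(seq1) * len(seq2)). Filling it in takes a similar amount of time.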
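As a rough illustration of that idea, the sketch below is a top-down restatement rather than the provided code. It assumes a state (i, j) that means "the output so far has been matched up to position i of seq1 and position j of seq2", with len + 1 standing for "no longer a subsequence". The names `next_table` and `shortest_non_subsequence` are made up for this example. There are (len(seq1) + 2) * (len(seq2) + 2) states with two transitions each, which is where the O(len(seq1) * len(seq2)) bound comes from.

```python
from functools import lru_cache


def next_table(s):
    """nxt[i][c] = 1-based position of the first character c after position i.

    Position len(s) + 1 means "c does not occur any more".
    """
    n = len(s)
    nxt = [[n + 1, n + 1] for _ in range(n + 2)]
    for i in range(n - 1, -1, -1):
        nxt[i] = list(nxt[i + 1])      # inherit what is known further right
        nxt[i][int(s[i])] = i + 1      # s[i] sits at 1-based position i + 1
    return nxt


def shortest_non_subsequence(seq1, seq2):
    n, m = len(seq1), len(seq2)
    nxt1, nxt2 = next_table(seq1), next_table(seq2)

    # Recursive for readability; very long inputs would need the bottom-up form.
    @lru_cache(maxsize=None)
    def dist(i, j):
        """Fewest characters still needed from (i, j) to fall off both strings."""
        if i == n + 1 and j == m + 1:
            return 0
        return 1 + min(dist(nxt1[i][k], nxt2[j][k]) for k in range(2))

    # Walk from the start state, always taking a character that stays optimal.
    out, i, j = [], 0, 0
    while (i, j) != (n + 1, m + 1):
        for k in range(2):
            ni, nj = nxt1[i][k], nxt2[j][k]
            if dist(i, j) == dist(ni, nj) + 1:
                out.append(str(k))
                i, j = ni, nj
                break
    return ''.join(out)


print(shortest_non_subsequence('011', '1101'))  # 00
```

Every transition moves at least one of the two positions strictly forward until both are off the end, so the recursion is well founded and each state is computed once.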

This is done with a lot of fiddly index manipulation that I don't want to trace through.

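I have another approach that is clearer to me and that usually takes less space and less time, although in the worst case it could be just as bad. But I have not coded it up.

UPDATE:

Here it is all coded up. The variable names are poorly chosen. Sorry about that.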

# A trivial data class to hold a linked list for the candidate subsequences
# along with information about how far they match in the two sequences.
import collections
SubSeqLinkedList = collections.namedtuple('SubSeqLinkedList', 'value pos1 pos2 tail')

# This finds the position after the first match.  No match is treated as off the end of seq.
def find_position_after_first_match (seq, start, value):
    while start < len(seq) and seq[start] != value:
        start += 1
    # Clamp so that "off the end" is always len(seq) + 1, even when we start there.
    return min(start, len(seq)) + 1

def make_longer_subsequence (subseq, value, seq1, seq2):
    pos1 = find_position_after_first_match(seq1, subseq.pos1, value)
    pos2 = find_position_after_first_match(seq2, subseq.pos2, value)
    gotcha = SubSeqLinkedList(value=value, pos1=pos1, pos2=pos2, tail=subseq)
    return gotcha

def minimal_nonsubseq (seq1, seq2):
    # We start with one candidate for how to start the subsequence
    # Namely an empty subsequence.  Length 0, matches before the first character.
    candidates = [SubSeqLinkedList(value=None, pos1=0, pos2=0, tail=None)]

    # Now we try to replace candidates with longer maximal ones - nothing of
    # the same length is better at going farther in both sequences.
    # We keep this list ordered by descending how far it goes in sequence1.
    while candidates[0].pos1 <= len(seq1) or candidates[0].pos2 <= len(seq2):
        children = [make_longer_subsequence(candidate, value, seq1, seq2)
                    for candidate in candidates for value in ('0', '1')]
        # Farthest in sequence1 first; ties broken by farthest in sequence2.
        children.sort(key=lambda c: (c.pos1, c.pos2), reverse=True)
        new_candidates = []
        for c in children:
            # Everything kept so far goes at least as far in sequence1, so c is
            # only worth keeping if it goes strictly farther in sequence2.
            if 0 == len(new_candidates) or new_candidates[-1].pos2 < c.pos2:
                new_candidates.append(c)
        # And now we throw away the ones we don't want.
        # Those that are on their way to a solution will be captured in the linked list.
        candidates = new_candidates

    answer = candidates[0]
    r_seq = [] # This winds up reversed.
    while answer.value is not None:
        r_seq.append(answer.value)
        answer = answer.tail

    return ''.join(reversed(r_seq))


print(minimal_nonsubseq('011', '1101'))