Python: Finding Unique Subsequences of Unique Strings

Question

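Edit: To those downvoting: I have been very clear that I do not need code, and I have already tried this myself. What I am looking for is an explanation of the mathematical process that produces the example results.

First question here. I did a lot of research before resorting to asking, so I apologize if I missed the answer somewhere. I've run into a problem that I have been struggling with: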

Write a Python 3 script that takes three command line arguments:

1. The name of a text file that contains n strings separated by white spaces.
2. A positive integer k.
3. The name of a text file that the script will create in order to store all possible subsequences of k unique strings out of the n strings from the input file, one subsequence per line.

For example, assume the command line is gen.py input.txt 3 output.txt and the file input.txt contains the following line:

Python Java C++ Java Java Python

Then the program should create the file output.txt containing the following lines (in any order):

Python Java C++
Python C++ Java
Java C++ Python
C++ Java Python

The combinations should be generated with your implementation of a generator function (i.e. using the keyword yield).

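As I understand it, based on the example output, these do not exactly match the definition of a subsequence; they are not full permutations either, so I don't know how to approach it. I know how to handle the file I/O and command-line argument parts; I just can't get the right subsequences. I don't need a direct answer, since I'm supposed to solve this myself, but I would appreciate any useful insight.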

Answer 1

If you are allowed to use itertools:

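import itertools
import sys

def unique_substrings(txt_lst: list, k: int) -> set:
    # Keep only the k-element combinations whose strings are all different;
    # building a set drops lines that would otherwise appear more than once.
    return {' '.join(combo) for combo in itertools.combinations(txt_lst, k)
            if len(set(combo)) == k}

if __name__ == "__main__":
    infile, k, outfile = sys.argv[1:]
    with open(infile) as inf:
        txt_lst = inf.read().split()
    with open(outfile, "w") as outf:  # open for writing, not reading
        # k arrives from the command line as a string, so convert it
        for line in unique_substrings(txt_lst, int(k)):
            outf.write(line + "\n")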
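To see why the example input yields exactly four lines, it may help to look at which positions each kept combination comes from. The following is only an illustrative sketch (the names words, kept and positions are mine, not part of the assignment): it takes combinations of indices, which are always in increasing order, keeps those whose strings are all different, and skips any line that has already been seen.

```python
import itertools

words = "Python Java C++ Java Java Python".split()
k = 3

kept = []  # distinct results, in the order they are first found
for positions in itertools.combinations(range(len(words)), k):
    combo = tuple(words[i] for i in positions)
    # Keep the combination only if its k strings are all different
    # and the same line has not been produced before.
    if len(set(combo)) == k and combo not in kept:
        kept.append(combo)
        print(positions, " ".join(combo))
```

Each result is a subsequence in the usual sense (the order of the input is preserved); the extra conditions are that no string repeats within a line and that no line is written twice.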

But based on your instructor's comment:

The combinations should be generated with your implementation of a generator function (i.e. using the keyword yield).

it looks like this doesn't really qualify.

itertools.combinations can be reimplemented with something approximating the following (from the docs):

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
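
To tie this back to the assignment, here is a minimal sketch of a hand-written generator that produces the example output directly, without itertools. It is one possible approach, not necessarily what the instructor has in mind; the names unique_subsequences and extend are hypothetical. It walks the input left to right, skips any string already chosen for the current line, and uses a set to avoid yielding the same line twice.

```python
def unique_subsequences(words, k):
    """Yield each distinct order-preserving selection of k different strings."""
    seen = set()  # results already yielded, used to suppress duplicates

    def extend(start, chosen):
        if len(chosen) == k:
            result = tuple(chosen)
            if result not in seen:
                seen.add(result)
                yield result
            return
        for i in range(start, len(words)):
            # Skip strings that already appear in the partial selection.
            if words[i] not in chosen:
                chosen.append(words[i])
                yield from extend(i + 1, chosen)
                chosen.pop()

    yield from extend(0, [])


words = "Python Java C++ Java Java Python".split()
for subseq in unique_subsequences(words, 3):
    print(" ".join(subseq))
```

For the sample input this prints the four lines from the question. Writing them to the output file is then the same loop as in the itertools version above.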
