Python - get all in-order splits of a string

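That is, break a sentence into all possible in-order groupings of words, without leaving any word out.

For example, for the input "The cat sat on the mat"

the output would be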

[("The", "cat sat on the mat"),
("The cat", "sat on the mat"),  
("The cat", "sat", "on the mat")] #etc

but not

("The mat", "cat sat on the") # out of order
("The cat"), ("mat") # words missing

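I looked at the functions in itertools but can't see how they would do the job: combinations would leave items out ("the cat", "mat") and permutations would change the order.

Am I missing something in these tools, or are they simply the wrong tools for this?

(To be clear, this is not a question about how to split the string, but about how to get the combinations.)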

Here is a modification of Raymond Hettinger's partition recipe, adapted for Python 3 and inspired by this blog post from WordAligned. It produces every partition case on your list, using chain and combinations from itertools.

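from itertools import chain, combinations

def partition(iterable):
    # Materialize the input so it can be measured and sliced.
    input_list = list(iterable)
    n = len(input_list)
    # Slice boundaries: fixed start, candidate interior cut points, fixed end.
    b, mid, e = [0], list(range(1, n)), [n]
    # Each subset of the interior cut points gives one in-order partition.
    splits = (d for i in range(n) for d in combinations(mid, i))
    return [[input_list[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]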

Demo:

>>> input_list = "The cat sat on the mat".split()
>>> print(partition(input_list))
[[['The', 'cat', 'sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the', 'mat']], [['The', 'cat', 'sat', 'on'], ['the', 'mat']], [['The', 'cat', 'sat', 'on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on', 'the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the', 'mat']], [['The'], ['cat', 'sat', 'on'], ['the', 'mat']], [['The'], ['cat', 'sat', 'on', 'the'], ['mat']], [['The', 'cat'], ['sat'], ['on', 'the', 'mat']], [['The', 'cat'], ['sat', 'on'], ['the', 'mat']], [['The', 'cat'], ['sat', 'on', 'the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the', 'mat']], [['The', 'cat', 'sat'], ['on', 'the'], ['mat']], [['The', 'cat', 'sat', 'on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on', 'the', 'mat']], [['The'], ['cat'], ['sat', 'on'], ['the', 'mat']], [['The'], ['cat'], ['sat', 'on', 'the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the', 'mat']], [['The'], ['cat', 'sat'], ['on', 'the'], ['mat']], [['The'], ['cat', 'sat', 'on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the', 'mat']], [['The', 'cat'], ['sat'], ['on', 'the'], ['mat']], [['The', 'cat'], ['sat', 'on'], ['the'], ['mat']], [['The', 'cat', 'sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the', 'mat']], [['The'], ['cat'], ['sat'], ['on', 'the'], ['mat']], [['The'], ['cat'], ['sat', 'on'], ['the'], ['mat']], [['The'], ['cat', 'sat'], ['on'], ['the'], ['mat']], [['The', 'cat'], ['sat'], ['on'], ['the'], ['mat']], [['The'], ['cat'], ['sat'], ['on'], ['the'], ['mat']]]
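
The result above is a list of lists of word lists, and it includes the trivial partition that keeps the whole sentence in one piece. If you want tuples of joined strings as in the question, something like the following might work. This is only a minimal sketch of the same cut-point idea: the name sentence_splits is made up for illustration, and it assumes that words are separated by whitespace and that the unsplit sentence should be excluded.

```python
from itertools import combinations

def sentence_splits(sentence):
    """Return every in-order split of a sentence as a tuple of strings.

    Hypothetical helper: assumes whitespace-separated words and skips
    the trivial case where the sentence is not split at all.
    """
    words = sentence.split()
    n = len(words)
    results = []
    # k is the number of cut points; starting at 1 skips the unsplit sentence.
    for k in range(1, n):
        for cuts in combinations(range(1, n), k):
            # Add the fixed start and end to the chosen interior cut points.
            bounds = (0, *cuts, n)
            # Join the words between each pair of consecutive boundaries.
            pieces = tuple(" ".join(words[i:j])
                           for i, j in zip(bounds, bounds[1:]))
            results.append(pieces)
    return results

print(sentence_splits("The cat sat"))
# [('The', 'cat sat'), ('The cat', 'sat'), ('The', 'cat', 'sat')]
```

For a sentence of n words there are n - 1 possible cut points, so this returns 2**(n - 1) - 1 splits: 31 for "The cat sat on the mat".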