How to generate all subsequent combinations of 1, 2 and 3 words from a given string in Python?

Question

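I have a string in Python. I want to get all of its one-word substrings, all of its two-word substrings and all of its three-word substrings. What is the most efficient way to do this?

My current solution looks like this: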

>>> s = "This is the example string of which I want to generate subsequent combinations"
>>> words = s.split()
>>> lengths = [1, 2, 3]
>>> ans = []
>>> for ln in lengths:
...     for i in range(len(words)-ln+1):
...         ans.append(" ".join(words[i:i+ln]))
... 
>>> print(ans)
['This', 'is', 'the', 'example', 'string', 'of', 'which', 'I', 'want', 'to', 'generate', 'subsequent', 'combinations', 'This is', 'is the', 'the example', 'example string', 'string of', 'of which', 'which I', 'I want', 'want to', 'to generate', 'generate subsequent', 'subsequent combinations', 'This is the', 'is the example', 'the example string', 'example string of', 'string of which', 'of which I', 'which I want', 'I want to', 'want to generate', 'to generate subsequent', 'generate subsequent combinations']

Answer 1

You can do it like this:

from itertools import chain, combinations

def powerset(iterable):
    "Space-joined combinations of 1, 2 and 3 items; the items need not be adjacent."
    s = list(iterable)
    # r runs over 1, 2, 3, so larger combinations (and the empty one) are skipped
    return list(map(" ".join, chain.from_iterable(combinations(s, r) for r in range(1, 4))))

s = "This is the example string of which I want to generate subsequent combinations"
print(powerset(s.split()))
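
One caveat worth checking: itertools.combinations picks items at any positions, so the output above also contains pairs of words that are not next to each other in the sentence. The sketch below uses a shortened sample sentence (an assumption, chosen only to keep the output small) to compare that with the adjacent runs the question asks for:

```python
from itertools import combinations

words = "This is the example".split()

# Every 2-word combination, adjacent or not (original word order is preserved).
combo_pairs = [" ".join(c) for c in combinations(words, 2)]

# Only runs of 2 adjacent words, as in the question.
adjacent_pairs = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

print(combo_pairs)     # includes non-adjacent pairs such as 'This the'
print(adjacent_pairs)
```

If only adjacent runs are wanted, the slicing approach from the question is the closer fit.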

For more details, read:

Answer 2

FWIW, you can write what you have as a list comprehension:

[' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]

Is it faster? A little:

%%timeit
ans = []
for ln in [1,2,3]:
    for i in range(len(words)-ln+1):
        ans.append(" ".join(words[i:i+ln]))
        
# 8.46 µs ± 89.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
[' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]


# 7.03 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Is it more readable? Probably not. I would probably stick with what you have.
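
If readability is the main concern, one option is to move the windowing into a small helper. This is only a sketch: the name word_ngrams is made up for illustration, and it swaps the index arithmetic for zip over shifted slices, which has not been timed here.

```python
def word_ngrams(words, n):
    """Return every run of n adjacent words, joined by spaces."""
    # zip stops at the shortest slice, so no explicit bounds arithmetic is needed.
    return [" ".join(run) for run in zip(*(words[i:] for i in range(n)))]


words = "This is the example string".split()
ans = [gram for n in (1, 2, 3) for gram in word_ngrams(words, n)]
print(ans)
```

A side effect of using zip is that an input shorter than n simply yields an empty list.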

Answer 3

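I think the approach that is easiest to understand (for me, anyway), and possibly the fastest, is to special-case the first two words and then iterate over the rest while keeping track of the preceding words.

It has the side benefit of being the fastest (so far).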

words = "This is the example string of which I want to generate subsequent combinations".split()
prior_prior_word = words[0]
prior_word = words[1]
ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
for word in words[2:]:
    ans.append(f"{word}")
    ans.append(f"{prior_word} {word}")
    ans.append(f"{prior_prior_word} {prior_word} {word}")
    prior_prior_word = prior_word
    prior_word = word
print(ans)
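
Two things to be aware of with the snippet above: it indexes words[0] and words[1], so it raises IndexError when the input has fewer than two words, and it emits the results grouped by the word each run ends on rather than grouped by length as in the question. A guarded variant might look like the sketch below (rolling_ngrams is a hypothetical name, and the extra checks may cost some of the speed advantage):

```python
def rolling_ngrams(words):
    """Emit 1-, 2- and 3-word runs in one pass, tracking the two prior words."""
    ans = []
    prior_prior_word = prior_word = None  # None means "not seen yet"
    for word in words:
        ans.append(word)
        if prior_word is not None:
            ans.append(f"{prior_word} {word}")
        if prior_prior_word is not None:
            ans.append(f"{prior_prior_word} {prior_word} {word}")
        prior_prior_word, prior_word = prior_word, word
    return ans


print(rolling_ngrams("This is the example".split()))
print(rolling_ngrams(["single"]))
```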

If you want to time it with timeit, you can try:

import timeit

ruchit = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    lengths = [1, 2, 3]
    ans = []
    for ln in lengths:
        for i in range(len(words)-ln+1):
            ans.append(" ".join(words[i:i+ln]))
    return ans
'''

tom = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    return [' '.join(words[i:i+l]) for l in [1,2,3] for i in range(len(words)-l+1)]
'''
        
jonsg = '''
words = "This is the example string of which I want to generate subsequent combinations".split()
def test(words):
    prior_prior_word = words[0]
    prior_word = words[1]
    ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
    for word in words[2:]:
        ans.append(f"{word}")
        ans.append(f"{prior_word} {word}")
        ans.append(f"{prior_prior_word} {prior_word} {word}")
        prior_prior_word = prior_word
        prior_word = word
    return ans
'''

runs = 1_000_000
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: ruchit Time: {timeit.timeit('test(words)', setup=ruchit, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: tom Time: {timeit.timeit('test(words)', setup=tom, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
print(f"Test: jonsg Time: {timeit.timeit('test(words)', setup=jonsg, number=runs)}")
print("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

This gives me:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: ruchit Time: 8.692457999999998
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: tom Time: 7.512314900000002
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Test: jonsg Time: 3.7232652
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Your mileage may vary.
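
Before reading too much into the timings, it may be worth confirming that the variants return the same items. A minimal check, with by_length and by_position as illustrative names for the list-comprehension and prior-word versions:

```python
from collections import Counter

words = "This is the example string of which I want to generate subsequent combinations".split()


def by_length(words):
    # List-comprehension version: all 1-word runs, then 2-word, then 3-word.
    return [" ".join(words[i:i + l]) for l in (1, 2, 3) for i in range(len(words) - l + 1)]


def by_position(words):
    # Prior-word version: assumes at least two words in the input.
    prior_prior_word, prior_word = words[0], words[1]
    ans = [prior_prior_word, prior_word, f"{prior_prior_word} {prior_word}"]
    for word in words[2:]:
        ans.append(word)
        ans.append(f"{prior_word} {word}")
        ans.append(f"{prior_prior_word} {prior_word} {word}")
        prior_prior_word, prior_word = prior_word, word
    return ans


a, b = by_length(words), by_position(words)
print(Counter(a) == Counter(b))  # True: same items with the same multiplicities
print(a == b)                    # False: the ordering differs
```

So the faster version is a drop-in replacement only when the order of the results does not matter.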
