python 字符串子集的所有组合
python all combinations of subsets of a string
我需要字符串子集的所有组合。此外,长度为 1 的子集后面只能跟着长度 > 1 的子集。例如对于字符串 4824
,结果应该是:
[ [4, 824], [4, 82, 4], [48, 24], [482, 4], [4824] ]
到目前为止,我设法检索了所有可能的子集:
length = len(number)
ss = []
for i in xrange(length):
for j in xrange(i,length):
ss.append(number[i:j + 1])
这给了我:
['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4']
但我现在不知道如何将它们结合起来。
首先,编写一个生成所有字符串分区的函数:
def partitions(s):
if s:
for i in range(1, len(s) + 1):
for p in partitions(s[i:]):
yield [s[:i]] + p
else:
yield []
这将迭代所有可能的第一段(一个字符、两个字符等),并将它们与字符串的相应剩余部分的所有分区组合。
>>> list(partitions("4824"))
[['4', '8', '2', '4'], ['4', '8', '24'], ['4', '82', '4'], ['4', '824'], ['48', '2', '4'], ['48', '24'], ['482', '4'], ['4824']]
现在,您可以只过滤符合您条件的那些,即没有两个长度为 1 的连续子串的那些。
>>> [p for p in partitions("4824") if not any(len(x) == len(y) == 1 for x, y in zip(p, p[1:]))]
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]
此处,zip(p, p[1:])
是迭代所有连续项目对的常用方法。
更新:实际上,将您的约束直接合并到 partition
函数中也不难。只需跟踪最后一段并相应地设置最小长度。
def partitions(s, minLength=1):
if len(s) >= minLength:
for i in range(minLength, len(s) + 1):
for p in partitions(s[i:], 1 if i > 1 else 2):
yield [s[:i]] + p
elif not s:
yield []
演示:
>>> print list(partitions("4824"))
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]
看到更多测试用例会很有趣,下面的算法按照你说的做:
s="4824"
def partitions(s):
yield [s]
if(len(s)>2):
for i in range(len(s)-1, 0, -1):
for g in partitions(s[i:]):
out = [s[:i]] + g
if not any([len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)]):
yield out
list(partitions(s))
你得到:
[['4824'], ['482', '4'], ['48', '24'], ['4', '824'], ['4', '82', '4']]
说明
我基于以下算法:
s="4824"
def partitions_original(s):
#yield original string
yield [s]
if(len(s)>2):
for i in range(len(s)-1, 0, -1):
#divide string in two parts
#iteration 1: a="482", b="4"
#iteration 2: a="48", b="24"
#iteration 3: a="4", b="824"
a = s[:i]
b = s[i:]
#recursive call of b
for g in partitions_original(b):
#iteration 1: b="4", g=[['4']]
#iteration 2: b="24", g=[['24']]
#iteration 3: b="824", g=[['824'], ['82', '4'], ['8', '24']]
yield [a] + g
list(partitions_original(s))
你得到:
[['4824'], ['482', '4'], ['48', '24'], ['4', '824'],
['4', '82', '4'], ['4', '8', '24']]
问题是['4', '8', '24']
.....那么我必须在代码中添加if
,因为"a subset of length 1 can only be followed by a subset with length > 1"
[len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)]
return for ['4', '8', '24']
-> [True, False]
.... any
Return 如果 iterable 的任何元素是真
注意
也可以用:
if all([len(out[i]) != len(out[i+1]) or len(out[i])!=1 for i in range(len(out)-1)]):
我这里做的是获取字符串所有可能的分割位置,去掉最后一个。
例如,在一些包含 5 个数字“12345”的字符串中,有 4 个可能的位置来拆分字符串,将其命名为 possibility = (0,0,0,0),(1,0,1,0)
... 用 (0,0,1,0) mean (don't separate 1 and 2345,don't separate 12 and 345,separate 123 and 45,don't separate 1234 and 5)
这样你就可以由于我们消除了 (1,1,1,1) 情况,因此在验证您的条件时获得所有可能性。
import itertools
from math import factorial
from itertools import product
def get_comb(string):
L = len(string_)
combinisation = []
for possibility in product([0,1], repeat=len(string_)-1):
s = []
indexes = [i for i in range(len(string_)-1) if list(possibility)[i]!=0]
if sum(indexes) != 0:
if sum(indexes) != len(string_)-1:
for index in indexes:
s.append(string_[:index+1])
s.append(string_[indexes[-1:][0]+1:])
combinisation.append(s)
else:
combinisation.append(string_)
return combinisation
string_ = '4824'
print "%s combinations:"%string_
print get_comb(string_)
string_ = '478952'
print "%s combinations:"%string_
print get_comb(string_)
string_ = '1234'
print "%s combinations:"%string_
print get_comb(string_)
>>
4824 combinations:
[['482', '4'], ['48', '24'], '4824', ['4', '482', '4'], ['4', '48', '24'], '4824
']
478952 combinations:
[['47895', '2'], ['4789', '52'], ['4789', '47895', '2'], ['478', '952'], ['478',
'47895', '2'], '478952', ['478', '4789', '47895', '2'], ['47', '8952'], '478952
', ['47', '4789', '52'], ['47', '4789', '47895', '2'], ['47', '478', '952'], ['4
7', '478', '47895', '2'], ['47', '478', '4789', '52'], ['47', '478', '4789', '47
895', '2'], ['4', '47895', '2'], ['4', '4789', '52'], ['4', '4789', '47895', '2'
], ['4', '478', '952'], ['4', '478', '47895', '2'], '478952', ['4', '478', '4789
', '47895', '2'], ['4', '47', '8952'], '478952', ['4', '47', '4789', '52'], ['4'
, '47', '4789', '47895', '2'], ['4', '47', '478', '952'], ['4', '47', '478', '47
895', '2'], ['4', '47', '478', '4789', '52'], ['4', '47', '478', '4789', '47895'
, '2']]
1234 combinations:
[['123', '4'], ['12', '34'], '1234', ['1', '123', '4'], ['1', '12', '34'], '1234
']
一个普通的代码可以这样写:
s=raw_input('enter the string:')
word=[]
for i in range(len(s)):
for j in range(i,len(s)):
word.append(s[i:j+1])
print word
print 'no of possible combinations:',len(word)
并输出:
输入字符串:
4824
['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4']
没有可能 combinations:10
我需要字符串子集的所有组合。此外,长度为 1 的子集后面只能跟着长度 > 1 的子集。例如对于字符串 4824
,结果应该是:
[ [4, 824], [4, 82, 4], [48, 24], [482, 4], [4824] ]
到目前为止,我设法检索了所有可能的子集:
length = len(number)
ss = []
for i in xrange(length):
for j in xrange(i,length):
ss.append(number[i:j + 1])
这给了我:
['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4']
但我现在不知道如何将它们结合起来。
首先,编写一个生成所有字符串分区的函数:
def partitions(s):
if s:
for i in range(1, len(s) + 1):
for p in partitions(s[i:]):
yield [s[:i]] + p
else:
yield []
这将迭代所有可能的第一段(一个字符、两个字符等),并将它们与字符串的相应剩余部分的所有分区组合。
>>> list(partitions("4824"))
[['4', '8', '2', '4'], ['4', '8', '24'], ['4', '82', '4'], ['4', '824'], ['48', '2', '4'], ['48', '24'], ['482', '4'], ['4824']]
现在,您可以只过滤符合您条件的那些,即没有两个长度为 1 的连续子串的那些。
>>> [p for p in partitions("4824") if not any(len(x) == len(y) == 1 for x, y in zip(p, p[1:]))]
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]
此处,zip(p, p[1:])
是迭代所有连续项目对的常用方法。
更新:实际上,将您的约束直接合并到 partition
函数中也不难。只需跟踪最后一段并相应地设置最小长度。
def partitions(s, minLength=1):
if len(s) >= minLength:
for i in range(minLength, len(s) + 1):
for p in partitions(s[i:], 1 if i > 1 else 2):
yield [s[:i]] + p
elif not s:
yield []
演示:
>>> print list(partitions("4824"))
[['4', '82', '4'], ['4', '824'], ['48', '24'], ['482', '4'], ['4824']]
看到更多测试用例会很有趣,下面的算法按照你说的做:
s="4824"
def partitions(s):
yield [s]
if(len(s)>2):
for i in range(len(s)-1, 0, -1):
for g in partitions(s[i:]):
out = [s[:i]] + g
if not any([len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)]):
yield out
list(partitions(s))
你得到:
[['4824'], ['482', '4'], ['48', '24'], ['4', '824'], ['4', '82', '4']]
说明
我基于以下算法:
s="4824"
def partitions_original(s):
#yield original string
yield [s]
if(len(s)>2):
for i in range(len(s)-1, 0, -1):
#divide string in two parts
#iteration 1: a="482", b="4"
#iteration 2: a="48", b="24"
#iteration 3: a="4", b="824"
a = s[:i]
b = s[i:]
#recursive call of b
for g in partitions_original(b):
#iteration 1: b="4", g=[['4']]
#iteration 2: b="24", g=[['24']]
#iteration 3: b="824", g=[['824'], ['82', '4'], ['8', '24']]
yield [a] + g
list(partitions_original(s))
你得到:
[['4824'], ['482', '4'], ['48', '24'], ['4', '824'],
['4', '82', '4'], ['4', '8', '24']]
问题是['4', '8', '24']
.....那么我必须在代码中添加if
,因为"a subset of length 1 can only be followed by a subset with length > 1"
[len(out[i]) == len(out[i+1]) and len(out[i])==1 for i in range(len(out)-1)]
return for ['4', '8', '24']
-> [True, False]
.... any
Return 如果 iterable 的任何元素是真
注意
也可以用:
if all([len(out[i]) != len(out[i+1]) or len(out[i])!=1 for i in range(len(out)-1)]):
我这里做的是获取字符串所有可能的分割位置,去掉最后一个。
例如,在一些包含 5 个数字“12345”的字符串中,有 4 个可能的位置来拆分字符串,将其命名为 possibility = (0,0,0,0),(1,0,1,0)
... 用 (0,0,1,0) mean (don't separate 1 and 2345,don't separate 12 and 345,separate 123 and 45,don't separate 1234 and 5)
这样你就可以由于我们消除了 (1,1,1,1) 情况,因此在验证您的条件时获得所有可能性。
import itertools
from math import factorial
from itertools import product
def get_comb(string):
L = len(string_)
combinisation = []
for possibility in product([0,1], repeat=len(string_)-1):
s = []
indexes = [i for i in range(len(string_)-1) if list(possibility)[i]!=0]
if sum(indexes) != 0:
if sum(indexes) != len(string_)-1:
for index in indexes:
s.append(string_[:index+1])
s.append(string_[indexes[-1:][0]+1:])
combinisation.append(s)
else:
combinisation.append(string_)
return combinisation
string_ = '4824'
print "%s combinations:"%string_
print get_comb(string_)
string_ = '478952'
print "%s combinations:"%string_
print get_comb(string_)
string_ = '1234'
print "%s combinations:"%string_
print get_comb(string_)
>>
4824 combinations:
[['482', '4'], ['48', '24'], '4824', ['4', '482', '4'], ['4', '48', '24'], '4824
']
478952 combinations:
[['47895', '2'], ['4789', '52'], ['4789', '47895', '2'], ['478', '952'], ['478',
'47895', '2'], '478952', ['478', '4789', '47895', '2'], ['47', '8952'], '478952
', ['47', '4789', '52'], ['47', '4789', '47895', '2'], ['47', '478', '952'], ['4
7', '478', '47895', '2'], ['47', '478', '4789', '52'], ['47', '478', '4789', '47
895', '2'], ['4', '47895', '2'], ['4', '4789', '52'], ['4', '4789', '47895', '2'
], ['4', '478', '952'], ['4', '478', '47895', '2'], '478952', ['4', '478', '4789
', '47895', '2'], ['4', '47', '8952'], '478952', ['4', '47', '4789', '52'], ['4'
, '47', '4789', '47895', '2'], ['4', '47', '478', '952'], ['4', '47', '478', '47
895', '2'], ['4', '47', '478', '4789', '52'], ['4', '47', '478', '4789', '47895'
, '2']]
1234 combinations:
[['123', '4'], ['12', '34'], '1234', ['1', '123', '4'], ['1', '12', '34'], '1234
']
一个普通的代码可以这样写:
s=raw_input('enter the string:')
word=[]
for i in range(len(s)):
for j in range(i,len(s)):
word.append(s[i:j+1])
print word
print 'no of possible combinations:',len(word)
并输出: 输入字符串: 4824 ['4', '48', '482', '4824', '8', '82', '824', '2', '24', '4'] 没有可能 combinations:10