交错两个字符串的所有可能方法
All possible ways to interleave two strings
我正在尝试生成所有可能的方法来交错 Python 中的任意两个字符串。
例如:如果两个字符串是'ab'
和'cd'
,我希望得到的输出是:
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
看到 a
总是在 b
之前(而 c
在 d
之前)。我正在努力寻找解决方案。我试过如下所示的 itertools:
import itertools
def shuffle(s,t):
string = s+t
for i in itertools.permutations(string):
print(''.join(i))
shuffle('ab','cd')
但正如预期的那样,这个 returns 所有可能的排列都忽略了 a
和 b
(以及 c
和 d
)的顺序。
想法
让你想要交错的两个字符串为s
和t
。我们将使用递归来生成所有可能的方式来交错这两个字符串。
如果在任何时候我们交错 s
的前 i
个字符和 t
的前 j
个字符来创建一些字符串 res
,那么下一步我们有两种方法将它们交错-
- 将
s
的第 i+1
个字符附加到 res
- 将
t
的第 j+1
个字符附加到 res
我们继续这个递归,直到两个字符串的所有字符都被使用,然后我们将这个结果存储在一个字符串列表中 lis
,如下面的代码所示。
代码
def interleave(s, t, res, i, j, lis):
if i == len(s) and j == len(t):
lis.append(res)
return
if i < len(s):
interleave(s, t, res + s[i], i + 1, j, lis)
if j < len(t):
interleave(s, t, res + t[j], i, j + 1, lis)
l = []
s = "ab"
t = "cd"
interleave(s, t, "", 0, 0, l)
print l
输出
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
这个实现是我们所能达到的最高效的(至少是渐进的),因为我们从来没有生成相同的字符串两次。
效率极低但有效:
def shuffle(s,t):
if s=="":
return [t]
elif t=="":
return [s]
else:
leftShuffle=[s[0]+val for val in shuffle(s[1:],t)]
rightShuffle=[t[0]+val for val in shuffle(s,t[1:])]
leftShuffle.extend(rightShuffle)
return leftShuffle
print(shuffle("ab","cd"))
只需要比较a
和b
,c
和d
的索引,然后过滤掉a
索引的元素大于 b
的索引并且 c
的索引大于 d
.
的索引
def interleave(s, t):
mystring = s + t
return [el for el in [''.join(item) for item in permutations(mystring) if item.index('a') < item.index('b') and item.index('c') < item.index('d')]]
演示:
>>> from itertools import permutations
>>> s = 'ab'
>>> t = 'cd'
>>> [el for el in [''.join(item) for item in permutations(s+t) if item.index('a') < item.index('b') and item.index('c') < item.index('d')]]
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
只为运动
没有显式条件或谓词的解决方案
(即没有任何 if
关键字):
from itertools import chain, repeat, permutations
from copy import deepcopy
def shuffle(*strings):
# Treat the strings as pools from which to draw elements in order.
# Convert the strings to lists, so that drawn items can be removed:
pools = (list(string) for string in strings)
# From each pool, we have to draw as many times as it has items:
pools_to_draw_from = chain.from_iterable(
repeat(pool, len(pool)) for pool in pools
)
# Because itertools.permutations treats elements as unique based on their
# position, not on their value and because pools_to_draw_from has repeated
# repeated items, we would get repeated permutations, if we would not
# filter them out with `unique`.
possible_drawing_orders = unique(permutations(pools_to_draw_from))
# For each drawing order, we want to draw (and thus remove) items from our
# pools. Subsequent draws within the same drawing order should get the
# respective next item in the pool, i.e., see the modified pool. But we don't
# want the pools to be exhausted after processing the first drawing ordering.
#
# Deepcopy preserves internal repetition and thus does exactly what we need.
possible_drawing_orders = (deepcopy(pdo) for pdo in possible_drawing_orders)
# Draw from the pools for each possible order,
# build strings and return them in a list:
return [''.join(_draw(p)) for p in possible_drawing_orders]
def _draw(drawing_order):
return (pool_to_draw_from.pop(0) for pool_to_draw_from in drawing_order)
为此我们需要一个辅助函数:
from operator import itemgetter
from itertools import groupby
def unique(iterable, key=None):
# Other than unique_everseen from
# https://docs.python.org/3/library/itertools.html#itertools-recipes, this
# works for iterables of non-hashable elements, too.
return unique_justseen(sorted(iterable, key=key), key)
def unique_justseen(iterable, key=None):
"""
List unique elements, preserving order. Remember only the element just seen.
"""
# from https://docs.python.org/3/library/itertools.html#itertools-recipes
return map(next, map(itemgetter(1), groupby(iterable, key)))
如果非唯一排列的数量很大,由于调用 sorted
,这可能效率很低。有关获取非唯一值的唯一排列的替代方法,请参阅 permutations with unique values.
TL;DR?
没问题。我们可以将这种方法归结为这种可憎的行为:
from itertools import chain, repeat, permutations
from copy import deepcopy
def shuffle(*strings):
return list({''.join(l.pop(0) for l in deepcopy(p)) for p in permutations(chain.from_iterable(repeat(list(s), len(s)) for s in strings))})
(对结果使用集合理解而不是更早地确保唯一性。)
已经发布了其他几个解决方案,但其中大多数会在内存中生成完整的交错字符串列表(或与其等效的内容),从而使它们的内存使用量随输入长度呈指数增长。一定有更好的方法。
枚举两个序列的所有交错方式,长度分别为a和b,与枚举所有[=27基本相同=]a+b 位整数,设置了 b 位。每个这样的整数对应于一种不同的序列交错方式,通过用第一个序列的元素替换每个 0 位,用第二个序列的元素替换每个 1 位来获得。
方便的是,有一种聪明有效的方法 calculate the next integer with the same number of bits set,我们可以使用它来生成所有此类整数。那么让我们先这样做:
def bit_patterns(m, n):
"""Generate all m-bit numbers with exactly n bits set, in ascending order.
See http://www.geeksforgeeks.org/next-higher-number-with-same-number-of-set-bits/
"""
patt = (1 << int(n)) - 1
if patt == 0: yield 0; return # loop below assumes patt has at least one bit set!
while (patt >> m) == 0:
yield patt
lowb = patt & -patt # extract the lowest bit of the pattern
incr = patt + lowb # increment the lowest bit
diff = patt ^ incr # extract the bits flipped by the increment
patt = incr + ((diff // lowb) >> 2) # restore bit count after increment
现在我们可以使用这个生成器生成交错任意两个序列的所有方式:
def interleave(a, b):
"""Generate all possible ways to interleave two sequences."""
m = len(a) + len(b)
n = len(a)
for pattern in bit_patterns(m, n):
seq = []
i = j = 0
for k in range(m):
bit = pattern & 1
pattern >>= 1
seq.append(a[i] if bit else b[j])
i += bit
j += 1-bit
yield seq
请注意,为了尽可能通用,此代码采用任意序列类型和 returns 列表。字符串是 Python 中的序列,所以你可以将它们传入就好了;要将生成的列表转换回字符串,您可以连接它们的元素,例如"".join()
,像这样:
foo = "ABCD"
bar = "1234"
for seq in interleave(foo, bar):
print("".join(seq))
我们开始了:一个完全非递归的高效的基于生成器的解决方案,即使对于长输入也只使用很少的内存,并且每个输出只生成一次(因此不需要低效的重复消除步骤)。它甚至适用于 Python 2 和 3。
我正在尝试生成所有可能的方法来交错 Python 中的任意两个字符串。
例如:如果两个字符串是'ab'
和'cd'
,我希望得到的输出是:
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
看到 a
总是在 b
之前(而 c
在 d
之前)。我正在努力寻找解决方案。我试过如下所示的 itertools:
import itertools
def shuffle(s,t):
string = s+t
for i in itertools.permutations(string):
print(''.join(i))
shuffle('ab','cd')
但正如预期的那样,这个 returns 所有可能的排列都忽略了 a
和 b
(以及 c
和 d
)的顺序。
想法
让你想要交错的两个字符串为s
和t
。我们将使用递归来生成所有可能的方式来交错这两个字符串。
如果在任何时候我们交错 s
的前 i
个字符和 t
的前 j
个字符来创建一些字符串 res
,那么下一步我们有两种方法将它们交错-
- 将
s
的第i+1
个字符附加到res
- 将
t
的第j+1
个字符附加到res
我们继续这个递归,直到两个字符串的所有字符都被使用,然后我们将这个结果存储在一个字符串列表中 lis
,如下面的代码所示。
代码
def interleave(s, t, res, i, j, lis):
if i == len(s) and j == len(t):
lis.append(res)
return
if i < len(s):
interleave(s, t, res + s[i], i + 1, j, lis)
if j < len(t):
interleave(s, t, res + t[j], i, j + 1, lis)
l = []
s = "ab"
t = "cd"
interleave(s, t, "", 0, 0, l)
print l
输出
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
这个实现是我们所能达到的最高效的(至少是渐进的),因为我们从来没有生成相同的字符串两次。
效率极低但有效:
def shuffle(s,t):
if s=="":
return [t]
elif t=="":
return [s]
else:
leftShuffle=[s[0]+val for val in shuffle(s[1:],t)]
rightShuffle=[t[0]+val for val in shuffle(s,t[1:])]
leftShuffle.extend(rightShuffle)
return leftShuffle
print(shuffle("ab","cd"))
只需要比较a
和b
,c
和d
的索引,然后过滤掉a
索引的元素大于 b
的索引并且 c
的索引大于 d
.
def interleave(s, t):
mystring = s + t
return [el for el in [''.join(item) for item in permutations(mystring) if item.index('a') < item.index('b') and item.index('c') < item.index('d')]]
演示:
>>> from itertools import permutations
>>> s = 'ab'
>>> t = 'cd'
>>> [el for el in [''.join(item) for item in permutations(s+t) if item.index('a') < item.index('b') and item.index('c') < item.index('d')]]
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
只为运动
没有显式条件或谓词的解决方案
(即没有任何 if
关键字):
from itertools import chain, repeat, permutations
from copy import deepcopy
def shuffle(*strings):
# Treat the strings as pools from which to draw elements in order.
# Convert the strings to lists, so that drawn items can be removed:
pools = (list(string) for string in strings)
# From each pool, we have to draw as many times as it has items:
pools_to_draw_from = chain.from_iterable(
repeat(pool, len(pool)) for pool in pools
)
# Because itertools.permutations treats elements as unique based on their
# position, not on their value and because pools_to_draw_from has repeated
# repeated items, we would get repeated permutations, if we would not
# filter them out with `unique`.
possible_drawing_orders = unique(permutations(pools_to_draw_from))
# For each drawing order, we want to draw (and thus remove) items from our
# pools. Subsequent draws within the same drawing order should get the
# respective next item in the pool, i.e., see the modified pool. But we don't
# want the pools to be exhausted after processing the first drawing ordering.
#
# Deepcopy preserves internal repetition and thus does exactly what we need.
possible_drawing_orders = (deepcopy(pdo) for pdo in possible_drawing_orders)
# Draw from the pools for each possible order,
# build strings and return them in a list:
return [''.join(_draw(p)) for p in possible_drawing_orders]
def _draw(drawing_order):
return (pool_to_draw_from.pop(0) for pool_to_draw_from in drawing_order)
为此我们需要一个辅助函数:
from operator import itemgetter
from itertools import groupby
def unique(iterable, key=None):
# Other than unique_everseen from
# https://docs.python.org/3/library/itertools.html#itertools-recipes, this
# works for iterables of non-hashable elements, too.
return unique_justseen(sorted(iterable, key=key), key)
def unique_justseen(iterable, key=None):
"""
List unique elements, preserving order. Remember only the element just seen.
"""
# from https://docs.python.org/3/library/itertools.html#itertools-recipes
return map(next, map(itemgetter(1), groupby(iterable, key)))
如果非唯一排列的数量很大,由于调用 sorted
,这可能效率很低。有关获取非唯一值的唯一排列的替代方法,请参阅 permutations with unique values.
TL;DR?
没问题。我们可以将这种方法归结为这种可憎的行为:
from itertools import chain, repeat, permutations
from copy import deepcopy
def shuffle(*strings):
return list({''.join(l.pop(0) for l in deepcopy(p)) for p in permutations(chain.from_iterable(repeat(list(s), len(s)) for s in strings))})
(对结果使用集合理解而不是更早地确保唯一性。)
已经发布了其他几个解决方案,但其中大多数会在内存中生成完整的交错字符串列表(或与其等效的内容),从而使它们的内存使用量随输入长度呈指数增长。一定有更好的方法。
枚举两个序列的所有交错方式,长度分别为a和b,与枚举所有[=27基本相同=]a+b 位整数,设置了 b 位。每个这样的整数对应于一种不同的序列交错方式,通过用第一个序列的元素替换每个 0 位,用第二个序列的元素替换每个 1 位来获得。
方便的是,有一种聪明有效的方法 calculate the next integer with the same number of bits set,我们可以使用它来生成所有此类整数。那么让我们先这样做:
def bit_patterns(m, n):
"""Generate all m-bit numbers with exactly n bits set, in ascending order.
See http://www.geeksforgeeks.org/next-higher-number-with-same-number-of-set-bits/
"""
patt = (1 << int(n)) - 1
if patt == 0: yield 0; return # loop below assumes patt has at least one bit set!
while (patt >> m) == 0:
yield patt
lowb = patt & -patt # extract the lowest bit of the pattern
incr = patt + lowb # increment the lowest bit
diff = patt ^ incr # extract the bits flipped by the increment
patt = incr + ((diff // lowb) >> 2) # restore bit count after increment
现在我们可以使用这个生成器生成交错任意两个序列的所有方式:
def interleave(a, b):
"""Generate all possible ways to interleave two sequences."""
m = len(a) + len(b)
n = len(a)
for pattern in bit_patterns(m, n):
seq = []
i = j = 0
for k in range(m):
bit = pattern & 1
pattern >>= 1
seq.append(a[i] if bit else b[j])
i += bit
j += 1-bit
yield seq
请注意,为了尽可能通用,此代码采用任意序列类型和 returns 列表。字符串是 Python 中的序列,所以你可以将它们传入就好了;要将生成的列表转换回字符串,您可以连接它们的元素,例如"".join()
,像这样:
foo = "ABCD"
bar = "1234"
for seq in interleave(foo, bar):
print("".join(seq))
我们开始了:一个完全非递归的高效的基于生成器的解决方案,即使对于长输入也只使用很少的内存,并且每个输出只生成一次(因此不需要低效的重复消除步骤)。它甚至适用于 Python 2 和 3。