Single swap on a string
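I need a faster way to find swaps in strings of 8-11 characters, as follows:
Given a string 'STDILGNLYE', find all single-letter swaps using the letters in:
list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']
That is, for each position in the string, replace the letter at that position with each letter in list_AA. The output would be: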
ATDILGNLYE
RTDILGNLYE
NTDILGNLYE
...
SADILGNLYE
SRDILGNLYE
SNDILGNLYE
...
...
STDILGNLYV
That gives 200 new strings in total (20 swaps for each position in the string).
What I have so far:
def _create_swaps(original_str):
    list_peps = []
    for i in range(len(original_str)):
        for k in range(len(list_AA)):
            list_peps.append(_insert_aa(original_str, i, list_AA[k]))
    # remove original string
    return [i for i in list_peps if i != original_str]

def _insert_aa(string, index, aa):
    list_string_elements = list(string)
    del list_string_elements[index]
    list_string_elements.insert(index, aa)
    return "".join(list_string_elements)
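Since this needs to be repeated ~10**6 times, it is the slowest step in a larger project. Is there a faster way to find such swaps (by eliminating the "".join and insert steps, or by finding the swaps on the fly)?
For reference: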
ncalls tottime percall cumtime percall filename:lineno(function)
185275200 330.286 0.000 429.295 0.000 models.py:233(_insert_aa)
975240 147.322 0.000 616.979 0.001 models.py:225(_create_swaps)
185280201/185280197 59.137 0.000 59.138 0.000 {method 'join' of 'str' objects}
185275208 39.875 0.000 39.875 0.000 {method 'insert' of 'list' objects}
975240 21.027 0.000 21.027 0.000 models.py:231(<listcomp>)
186746064 18.516 0.000 18.516 0.000 {method 'append' of 'list' objects}
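A table like the one above is the kind of report Python's profiler prints. As a minimal sketch, assuming the standard-library `cProfile` and `pstats` modules are used, a comparable report can be produced as follows. The `work` function is a hypothetical stand-in workload, not the code from `models.py`.

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in workload: build single-letter variants of a string.
    s = 'STDILGNLYE'
    return [s[:i] + c + s[i + 1:]
            for i in range(len(s))
            for c in 'ARNDCQEGHILKMFPSTWYV']


profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    work()
profiler.disable()

# Collect the report in memory, sorted by total time spent in each function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('tottime').print_stats(5)  # show the top 5 entries
report = stream.getvalue()
print(report)
```

Sorting by `tottime` puts the functions that consume the most time themselves at the top, which is how `_insert_aa` stands out in the table above.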
This should be faster:
def _insert_aa(string, index, aa):
    return string[0:index] + aa + string[index+1:]
Edit: You can slice the head and tail just once per position and reuse them, like this:
def generate_all_variants(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

for variant in generate_all_variants("abcd", ['1', '2', '3']):
    print(variant)
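The question's `_create_swaps` also removes the original string afterwards. As a minimal sketch, assuming the replacement letters are unique, the same head/tail generator can skip the letter already present at each position, so the original string is never produced and no filtering pass is needed. The name `generate_swaps` is hypothetical and does not come from the question.

```python
def generate_swaps(string, replacements):
    """Yield every single-letter substitution of `string`, skipping no-op swaps."""
    for i, current in enumerate(string):
        head = string[:i]
        tail = string[i + 1:]
        for letter in replacements:
            if letter != current:  # same letter would reproduce the original string
                yield head + letter + tail


list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']
variants = list(generate_swaps('STDILGNLYE', list_AA))
print(len(variants))  # 190: 10 positions x 19 letters that differ from the original
print(variants[0])    # ATDILGNLYE
```

Every letter of 'STDILGNLYE' appears in `list_AA`, so exactly one candidate is skipped per position, which is why the count is 190 rather than 200.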
Even though you have already accepted an answer (which is not the most Pythonic), here is a cleaner version of what you are looking for.
You shouldn't use range to get the indices of an iterable; if you want to be Pythonic, use enumerate.
>>> def swaps(s, lst):
...     for index, _ in enumerate(s):
...         for letter in lst:
...             temp = list(s)
...             temp[index] = letter
...             yield ''.join(temp)
...
>>> list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
>>> s = 'STDILGNLYE'
>>>
>>> for _ in swaps(s, list_AA):
...     print(_)
...
ATDILGNLYE
RTDILGNLYE
NTDILGNLYE
..........
GTDILGNLYE
HTDILGNLYE
ITDILGNLYE
Also, a simple approach in Python 3:
>>> def swaps(s, lst):
...     for i, _ in enumerate(s):
...         yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]
...
>>> swaps(s,list_AA)
<generator object swaps at 0x10c9205c8>
>>> a=_
>>> next(a)
'ATDILGNLYE'
>>> next(a)
'RTDILGNLYE'
>>> next(a)
'NTDILGNLYE'
>>> next(a)
'DTDILGNLYE'
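`yield from` requires Python 3.3 or later. For a plain iterable it behaves like a loop that yields each item in turn. A small sketch of that equivalence, where `swaps_loop` is a hypothetical name for the explicit-loop form:

```python
def swaps(s, lst):
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]


def swaps_loop(s, lst):
    # Same output, written without `yield from`.
    for i, _ in enumerate(s):
        for x in lst:
            yield '%s%s%s' % (s[:i], x, s[i+1:])


result = list(swaps('abc', ['1', '2']))
print(result)  # ['1bc', '2bc', 'a1c', 'a2c', 'ab1', 'ab2']
assert result == list(swaps_loop('abc', ['1', '2']))
```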
Edit: a compromise between speed and readability:
def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s'%(head,c,tail) for c in lst]
Here is a benchmark of all three:
s='STDILGNLYE'
list_AA=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F',
         'P', 'S', 'T', 'W', 'Y', 'V']
# the correct sample size
list_new = list_AA * (10**6 // len(list_AA))

def swaps0(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

def swaps1(s, lst):
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]

def swaps2(s, lst):
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)

timeit [_ for _ in swaps0(s, list_new)]
timeit [_ for _ in swaps1(s, list_new)]
timeit [_ for _ in swaps2(s, list_new)]
In [9]: timeit [_ for _ in swaps0(s, list_new)]
1 loop, best of 3: 2.61 s per loop
In [10]: timeit [_ for _ in swaps1(s, list_new)]
1 loop, best of 3: 6.57 s per loop
In [11]: timeit [_ for _ in swaps2(s, list_new)]
1 loop, best of 3: 8.61 s per loop
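The `timeit` lines above are IPython commands. As a rough sketch for running a similar comparison outside IPython, the standard-library `timeit` module can be used. The `number` and `repeat` values and the smaller replacement list are arbitrary choices here, and timings vary by machine and Python version, so no results are shown.

```python
import timeit

s = 'STDILGNLYE'
list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']


def swaps0(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail


def swaps2(s, lst):
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)


# Confirm both generators produce the same sequence before timing them.
assert list(swaps0(s, list_AA)) == list(swaps2(s, list_AA))

timings = {}
for func in (swaps0, swaps2):
    # repeat=3 mirrors "best of 3"; number=200 is an arbitrary choice.
    runs = timeit.repeat(lambda: list(func(s, list_AA)), number=200, repeat=3)
    timings[func.__name__] = min(runs)
    print(func.__name__, timings[func.__name__])
```

Taking the minimum of the repeated runs follows the same "best of 3" convention as the IPython output.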
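Is it worth it? I'd say it depends on how much you expect the sample size to grow and how often you will run the code.
If the code won't run frequently (say, hundreds of times per hour) and the sample size won't grow exponentially (to the order of 10**50 or 10**100), then I'd say go for readability.
If this is going to be computed extremely often with a growing sample size, go for performance.
Finally, we arrive at a compromise that combines enumerate with the head/tail split:
def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s'%(head,c,tail) for c in lst]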
In [16]: timeit [_ for _ in swap3(s, list_new)]
1 loop, best of 3: 3.99 s per loop