检测 Python 序列中的所有特定子串
Detect all certain substrings in Python sequence
下面的代码演示了 Penney 的游戏 - 一个正面和反面序列先于另一个出现的概率。特别是,我想知道 while not all(i in sequence for i in [pattern1, pattern2]):
的效率,更全面地想知道 Python 中的最佳编码。这是 Python 中的合理尝试,还是更好更有效的方法。我认为我对 Python 了解得越多,我就越相信总有更好的方法!
import random
pattern1 = 'TTT'
pattern2 = 'HTT'
pattern1Wins = 0
pattern2Wins = 0
trials = 1000
for _ in range(trials):
sequence = ''
while not all(i in sequence for i in [pattern1, pattern2]):
sequence += random.choice(['H','T'])
if sequence.endswith(pattern1):
pattern2Wins += 1
else:
pattern1Wins += 1
print pattern1, 'wins =', pattern1Wins
print pattern2, 'wins =', pattern2Wins
print str((max([pattern1Wins, pattern2Wins]) / float(trials) * 100)) + '%'
鉴于您只关心最后三个字符是两个子字符串之一,我会选择类似的内容:
sequence = ''
while True:
sequence += random.choice('HT')
if sequence.endswith(pattern1):
pattern2Wins += 1
elif sequence.endswith(pattern2):
pattern1Wins += 1
else:
continue
break
endswith
比 in
更有效,因为它不会检查整个字符串是否匹配(您 已经知道不会有任何 ).
或者,您可以将 pattern1
和 pattern2
提取到字典 {pattern: wins}
:
patterns = {'TTT': 0, 'HTT': 0}
...
sequence = ''
while True:
sequence += random.choice('HT')
for pattern in patterns:
if sequence.endswith(pattern):
patterns[pattern] += 1
break
else:
continue
break
最后,使用 +
的字符串连接效率不是很高;字符串是不可变的,因此每次都会创建一个新字符串。相反,考虑将结果放入列表中,并检查它的最后三项:
sequence = []
while True:
sequence.append(random.choice('HT'))
if sequence[-3:] == ['H', 'T', 'T']:
...
最初使用三个调用 choice 创建您的序列,然后只需添加最后两个和一个新的选择循环,直到出现其中一种模式:
pattern1 = 'TTT'
pattern2 = 'HTT'
trials = 1000
d = {pattern1: 0, pattern2: 0}
for _ in range(trials):
sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
while sequence not in {pattern1, pattern2}:
sequence = sequence[-2:] + random.choice("HT")
d[sequence] += 1
print pattern1, 'wins =', d[pattern1]
print pattern2, 'wins =', d[pattern2]
print str((max([d[pattern1], d[pattern2]]) / float(trials) * 100)) + '%'
random.seed的一些时间:
In [19]: import random
In [20]: random.seed(0)
In [21]: %%timeit
....: pattern1 = 'TTT'
....: pattern2 = 'HTT'
....: trials = 1000
....: patterns = {'TTT': 0, 'HTT': 0}
....: for _ in range(trials):
....: sequence = ''
....: while True:
....: sequence += random.choice('HT')
....: for pattern in patterns:
....: if sequence.endswith(pattern):
....: patterns[pattern] += 1
....: break
....: else:
....: continue
....: break
....:
100 loops, best of 3: 7.28 ms per loop
In [22]: %%timeit
....: pattern1 = 'TTT'
....: pattern2 = 'HTT'
....: trials = 1000
....: d = {pattern1: 0, pattern2: 0}
....: for _ in range(trials):
....: sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
....: while sequence not in {pattern1, pattern2}:
....: sequence = sequence[-2:] + random.choice("HT")
....: d[sequence] += 1
....:
100 loops, best of 3: 4.95 ms per loop
In [23]: %%timeit
pattern1 = 'TTT'
pattern2 = 'HTT'
pattern1Wins = 0
pattern2Wins = 0
trials = 1000
for _ in range(trials):
sequence = ''
while True:
sequence += random.choice('HT')
if sequence.endswith(pattern1):
pattern2Wins += 1
elif sequence.endswith(pattern2):
pattern1Wins += 1
else:
continue
break
....:
100 loops, best of 3: 6.65 ms per loop
下面的代码演示了 Penney 的游戏 - 一个正面和反面序列先于另一个出现的概率。特别是,我想知道 while not all(i in sequence for i in [pattern1, pattern2]):
的效率,更全面地想知道 Python 中的最佳编码。这是 Python 中的合理尝试,还是更好更有效的方法。我认为我对 Python 了解得越多,我就越相信总有更好的方法!
import random
pattern1 = 'TTT'
pattern2 = 'HTT'
pattern1Wins = 0
pattern2Wins = 0
trials = 1000
for _ in range(trials):
sequence = ''
while not all(i in sequence for i in [pattern1, pattern2]):
sequence += random.choice(['H','T'])
if sequence.endswith(pattern1):
pattern2Wins += 1
else:
pattern1Wins += 1
print pattern1, 'wins =', pattern1Wins
print pattern2, 'wins =', pattern2Wins
print str((max([pattern1Wins, pattern2Wins]) / float(trials) * 100)) + '%'
鉴于您只关心最后三个字符是两个子字符串之一,我会选择类似的内容:
sequence = ''
while True:
sequence += random.choice('HT')
if sequence.endswith(pattern1):
pattern2Wins += 1
elif sequence.endswith(pattern2):
pattern1Wins += 1
else:
continue
break
endswith
比 in
更有效,因为它不会检查整个字符串是否匹配(您 已经知道不会有任何 ).
或者,您可以将 pattern1
和 pattern2
提取到字典 {pattern: wins}
:
patterns = {'TTT': 0, 'HTT': 0}
...
sequence = ''
while True:
sequence += random.choice('HT')
for pattern in patterns:
if sequence.endswith(pattern):
patterns[pattern] += 1
break
else:
continue
break
最后,使用 +
的字符串连接效率不是很高;字符串是不可变的,因此每次都会创建一个新字符串。相反,考虑将结果放入列表中,并检查它的最后三项:
sequence = []
while True:
sequence.append(random.choice('HT'))
if sequence[-3:] == ['H', 'T', 'T']:
...
最初使用三个调用 choice 创建您的序列,然后只需添加最后两个和一个新的选择循环,直到出现其中一种模式:
pattern1 = 'TTT'
pattern2 = 'HTT'
trials = 1000
d = {pattern1: 0, pattern2: 0}
for _ in range(trials):
sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
while sequence not in {pattern1, pattern2}:
sequence = sequence[-2:] + random.choice("HT")
d[sequence] += 1
print pattern1, 'wins =', d[pattern1]
print pattern2, 'wins =', d[pattern2]
print str((max([d[pattern1], d[pattern2]]) / float(trials) * 100)) + '%'
random.seed的一些时间:
In [19]: import random
In [20]: random.seed(0)
In [21]: %%timeit
....: pattern1 = 'TTT'
....: pattern2 = 'HTT'
....: trials = 1000
....: patterns = {'TTT': 0, 'HTT': 0}
....: for _ in range(trials):
....: sequence = ''
....: while True:
....: sequence += random.choice('HT')
....: for pattern in patterns:
....: if sequence.endswith(pattern):
....: patterns[pattern] += 1
....: break
....: else:
....: continue
....: break
....:
100 loops, best of 3: 7.28 ms per loop
In [22]: %%timeit
....: pattern1 = 'TTT'
....: pattern2 = 'HTT'
....: trials = 1000
....: d = {pattern1: 0, pattern2: 0}
....: for _ in range(trials):
....: sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
....: while sequence not in {pattern1, pattern2}:
....: sequence = sequence[-2:] + random.choice("HT")
....: d[sequence] += 1
....:
100 loops, best of 3: 4.95 ms per loop
In [23]: %%timeit
pattern1 = 'TTT'
pattern2 = 'HTT'
pattern1Wins = 0
pattern2Wins = 0
trials = 1000
for _ in range(trials):
sequence = ''
while True:
sequence += random.choice('HT')
if sequence.endswith(pattern1):
pattern2Wins += 1
elif sequence.endswith(pattern2):
pattern1Wins += 1
else:
continue
break
....:
100 loops, best of 3: 6.65 ms per loop