检测 Python 序列中的所有特定子串

Detect all certain substrings in Python sequence

下面的代码演示了 Penney 的游戏 - 一个正面和反面序列先于另一个出现的概率。特别是,我想知道 while not all(i in sequence for i in [pattern1, pattern2]): 的效率,更全面地想知道 Python 中的最佳编码。这是 Python 中的合理尝试,还是更好更有效的方法。我认为我对 Python 了解得越多,我就越相信总有更好的方法!

import random

pattern1 = 'TTT'
pattern2 = 'HTT'

pattern1Wins = 0
pattern2Wins = 0

trials = 1000

for _ in range(trials):

    sequence = ''

    while not all(i in sequence for i in [pattern1, pattern2]):
        sequence += random.choice(['H','T'])

    if sequence.endswith(pattern1):
        pattern2Wins += 1
    else:
        pattern1Wins += 1

print pattern1, 'wins =', pattern1Wins
print pattern2, 'wins =', pattern2Wins

print str((max([pattern1Wins, pattern2Wins]) / float(trials) * 100)) + '%'

鉴于您只关心最后三个字符是两个子字符串之一,我会选择类似的内容:

sequence = ''
while True:
    sequence += random.choice('HT')
    if sequence.endswith(pattern1):
        pattern2Wins += 1
    elif sequence.endswith(pattern2):
        pattern1Wins += 1
    else:
        continue
    break

endswithin 更有效,因为它不会检查整个字符串是否匹配(您 已经知道不会有任何 ).


或者,您可以将 pattern1pattern2 提取到字典 {pattern: wins}:

patterns = {'TTT': 0, 'HTT': 0}

...

sequence = ''
while True:
    sequence += random.choice('HT')
    for pattern in patterns:
        if sequence.endswith(pattern):
            patterns[pattern] += 1
            break
    else:
        continue
    break

最后,使用 + 的字符串连接效率不是很高;字符串是不可变的,因此每次都会创建一个新字符串。相反,考虑将结果放入列表中,并检查它的最后三项:

sequence = []
while True:
    sequence.append(random.choice('HT'))
    if sequence[-3:] == ['H', 'T', 'T']:
        ...

最初使用三个调用 choice 创建您的序列,然后只需添加最后两个和一个新的选择循环,直到出现其中一种模式:

pattern1 = 'TTT'
pattern2 = 'HTT'
trials = 1000
d = {pattern1: 0, pattern2: 0}

for _ in range(trials):
    sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
    while sequence not in {pattern1, pattern2}:
        sequence = sequence[-2:] + random.choice("HT")
    d[sequence] += 1

print pattern1, 'wins =', d[pattern1]
print pattern2, 'wins =', d[pattern2]
print str((max([d[pattern1], d[pattern2]]) / float(trials) * 100)) + '%'

random.seed的一些时间:

In [19]: import random
In [20]: random.seed(0)
In [21]: %%timeit
   ....: pattern1 = 'TTT'
   ....: pattern2 = 'HTT'
   ....: trials = 1000
   ....: patterns = {'TTT': 0, 'HTT': 0}
   ....: for _ in range(trials):
   ....:     sequence = ''
   ....:     while True:
   ....:         sequence += random.choice('HT')
   ....:         for pattern in patterns:
   ....:             if sequence.endswith(pattern):
   ....:                 patterns[pattern] += 1
   ....:                 break
   ....:         else:
   ....:             continue
   ....:         break
   ....: 
100 loops, best of 3: 7.28 ms per loop

In [22]: %%timeit
   ....: pattern1 = 'TTT'
   ....: pattern2 = 'HTT'
   ....: trials = 1000
   ....: d = {pattern1: 0, pattern2: 0}
   ....: for _ in range(trials):
   ....:     sequence = random.choice("HT") + random.choice("HT") + random.choice("HT")
   ....:     while sequence not in {pattern1, pattern2}:
   ....:         sequence = sequence[-2:] + random.choice("HT")
   ....:     d[sequence] += 1
   ....: 
100 loops, best of 3: 4.95 ms per loop

In [23]: %%timeit
pattern1 = 'TTT'
pattern2 = 'HTT'
pattern1Wins = 0
pattern2Wins = 0
trials = 1000
for _ in range(trials):
    sequence = ''                              
    while True:                                       
        sequence += random.choice('HT')
        if sequence.endswith(pattern1):
            pattern2Wins += 1         
        elif sequence.endswith(pattern2):
            pattern1Wins += 1
        else:
            continue
        break
   ....: 
100 loops, best of 3: 6.65 ms per loop