Function that finds the zero-based index of the longest run in a string in Python

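I'm trying to write a function that finds the zero-based index of the longest run in a string. If several runs have the same length, the code should return the index of the first one.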

a=["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

def longestrun(myList):
    result = None
    prev = None
    size = 0
    max_size = 0


    for i in myList:
        if i == prev:
            print (i)
            size += 1
            if size > max_size:
                print ('*******  '+ str(max_size))
                max_size = size 
        else:
            size = 0
        prev = i
    print (max_size+1)    
    return max_size+1


longestrun(a)

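I did some research and found this code, which I think can be used to find the longest run in my list, but I can't work out how to use it to find the index of the first letter of the longest run. Can anyone help or give me some advice on how to do this? Overall, the program should output the number 6 when run, because the first 'd' is at index 6 and starts the longest run.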

Please note that I'm a beginner, so I'd appreciate it if answers were as simple as possible and came with an explanation.

If you want the start index of the longest run:

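from collections import defaultdict
from operator import itemgetter

def longest(l):
    od = defaultdict(int)
    prev = None
    out = []
    for ind, ele in enumerate(l):
        if ele != prev and prev in od:
            # a run just ended: record (end index, item, run length)
            out.append((ind, prev, od[prev]))
            od[prev] = 0
        od[ele] += 1
        prev = ele
    if prev in od:
        # record the final run, which the loop itself never closes
        out.append((len(l), prev, od[prev]))
    best = max(out, key=itemgetter(2)) # max by run length; ties keep the first
    return best[0] - best[2] # end index minus length gives the start
print(longest(a))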

I stored all the keys and lengths in case you actually want all of that information.
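If all of that information is wanted, one option is to return every run instead of only the best start index. The sketch below is just one way this could look; `all_runs` is a hypothetical helper written for illustration and is not part of the answer above.

```python
def all_runs(items):
    """Return a list of (start_index, item, length) tuples, one per run."""
    runs = []
    start = 0
    for ind in range(1, len(items) + 1):
        # a run ends at the end of the list or where the item changes
        if ind == len(items) or items[ind] != items[start]:
            runs.append((start, items[start], ind - start))
            start = ind
    return runs

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]
runs = all_runs(a)
print(runs)
# the longest run; max() keeps the first one when lengths tie
print(max(runs, key=lambda r: r[2]))  # (6, 'd', 4)
```

Keeping the start index in each tuple means no subtraction is needed afterwards to recover where a run began.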

Without imports:

def longest1(l):
    prev = None
    seq = 0
    best = 0
    indx = None
    for ind, ele in enumerate(l):
        if ele != prev: # a new item means a new run has started
            # if the run that just ended is longer than our current best
            if seq > best:
                # update best and set index to the start of that run
                best = seq
                indx = ind - seq
            seq = 0 # reset run count
        seq += 1
        prev = ele
    # the last run is never closed inside the loop, so check it here
    if seq > best:
        indx = len(l) - seq
    return indx
print(longest1(a))

Some timings show that the simple loop is actually the most efficient:

In [23]: timeit longestrun_index(a)
100000 loops, best of 3: 9.07 µs per loop

In [24]: timeit longestrun(a)
100000 loops, best of 3: 2.54 µs per loop

In [25]: timeit longest(a)
100000 loops, best of 3: 6.79 µs per loop

In [26]: timeit longest1(a)
100000 loops, best of 3: 3.06 µs per loop
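The timings above come from IPython's timeit magic. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. Note that `loop_version` and `groupby_version` are stand-ins written for this sketch, not the functions timed above, and the numbers will differ from machine to machine.

```python
import timeit
from itertools import groupby

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

def loop_version(seq):
    # plain loop: remember where the current run started and the best run so far
    best_start, best_len, start = 0, 0, 0
    for i in range(1, len(seq) + 1):
        if i == len(seq) or seq[i] != seq[start]:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return best_start

def groupby_version(seq):
    # groupby: walk the runs in order, tracking the position of each one
    pos, best_start, best_len = 0, 0, 0
    for _, group in groupby(seq):
        length = sum(1 for _ in group)
        if length > best_len:
            best_start, best_len = pos, length
        pos += length
    return best_start

NUMBER = 10000
timings = {}
for func in (loop_version, groupby_version):
    # repeat() returns one total per repetition; the minimum is the least noisy
    best = min(timeit.repeat(lambda: func(a), number=NUMBER, repeat=3))
    timings[func.__name__] = best / NUMBER * 1e6
    print(f"{func.__name__}: {timings[func.__name__]:.2f} µs per loop")
```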

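Use a defaultdict to build a dictionary with the count of each item, then find the key with the highest count, and then find the first occurrence of that item. Note that this counts total occurrences rather than consecutive runs, so for the sample list it picks "c" (six occurrences) and prints 3 rather than 6.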

from collections import defaultdict
import operator

letters=["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

d = defaultdict(int)
for letter in letters:
    d[letter] += 1

# Python 3: dict.items() replaces iteritems(), and print is a function
highest_run = max(d.items(), key=operator.itemgetter(1))[0]

z_index = ''.join(letters).find(highest_run)
print(z_index)
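Counting occurrences and measuring runs answer different questions, and the two can disagree. The sketch below compares them on the sample list using `collections.Counter` and `itertools.groupby` from the standard library; the variable names are my own.

```python
from collections import Counter
from itertools import groupby

letters = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# total occurrences of each item, regardless of position
totals = Counter(letters)

# length of the longest consecutive run of each item
longest_runs = {}
for item, group in groupby(letters):
    length = sum(1 for _ in group)
    longest_runs[item] = max(longest_runs.get(item, 0), length)

print(totals.most_common(1)[0])                          # ('c', 6)
print(max(longest_runs.items(), key=lambda kv: kv[1]))   # ('d', 4)
```

Here "c" is the most frequent item overall, but its occurrences are split across two runs of three, so "d" still has the longest run.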

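The benefit of using modules is that development is simple and efficient, plus the "standing on the shoulders of giants" effect that comes from reusing well-maintained, well-tested code. That is not to say you shouldn't take care to check that the modules you use are well maintained and unit tested.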

You can use itertools.groupby to get a list of runs; then you just need to find the longest run and add up the lengths of all the runs before it:

from itertools import groupby

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# Get list of runs, each in the form (character, length)
runs = [(x, len(list(y))) for x,y in groupby(a)]

# Identify longest run
maxrun = max(runs, key=lambda x: x[1])

# Sum length of all runs before the max
index = 0
for run in runs:
    if run == maxrun: break
    index += run[1]

print(index)

You can use itertools.groupby() with max() and enumerate() for this:

from itertools import groupby
from operator import itemgetter

def longestrun_index(seq):
    groups = ((next(g), sum(1 for _ in g)+1) for k, g in groupby(enumerate(seq),
                                                             key=itemgetter(1)))
    (index, item), length = max(groups, key=itemgetter(1))
    return index

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]    
print (longestrun_index(a))
# 6

How does this work?

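  • We first group similar items by calling itertools.groupby on enumerate(a). Because enumerate(a) returns the index along with each item of the list a (as (index, item) tuples), we need to tell groupby to group by the item, and for that I used operator.itemgetter(1) as the key in groupby().

  • groupby() now yields two things: the key that was used for grouping, and the group itself as an iterator. We can use this iterator (the group) to get the first (index, item) pair by calling next on it, and then count the rest of the group with sum() and a generator expression: sum(1 for _ in g)+1. The +1 compensates for the item we already took from the group with next().

  • With the index, key and count, we now have a generator that yields ((index, key), length) when iterated.

  • Now we can simply use the built-in max() with itemgetter again to specify which element to compare on (here, the length) and find the required index.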
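The steps above can be made visible by materialising each intermediate value. This is only an illustrative unrolling of the same idea, using lists where the original uses a lazy generator; the names `pairs` and `groups` are mine.

```python
from itertools import groupby
from operator import itemgetter

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]

# Step 1: enumerate pairs each item with its index
pairs = list(enumerate(a))
print(pairs[:3])  # [(0, 'a'), (1, 'b'), (2, 'b')]

# Step 2: group the pairs by the item (position 1 of each pair)
groups = []
for key, g in groupby(pairs, key=itemgetter(1)):
    first = next(g)                 # (index, item) of the run's first element
    length = sum(1 for _ in g) + 1  # remaining items plus the one next() consumed
    groups.append((first, length))
print(groups[:4])
# [((0, 'a'), 1), ((1, 'b'), 2), ((3, 'c'), 3), ((6, 'd'), 4)]

# Step 3: pick the group with the greatest length and unpack its start index
(index, item), length = max(groups, key=itemgetter(1))
print(index, item, length)  # 6 d 4
```

Because max() returns the first maximal element it encounters, ties between runs of equal length resolve to the earliest run, which is what the question asks for.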

This should work:

def longestrun(myList):
    prev = None
    size = 0
    max_size = 0
    curr_pos = 0
    max_pos = 0

    for (index, i) in enumerate(myList):
        if i == prev:
            size += 1
            if size > max_size:
                max_size = size 
                max_pos = curr_pos
        else:
            size = 0
            curr_pos = index
        prev = i
    return max_pos
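A few quick checks can confirm the behaviour on the cases the question cares about: the sample list, a tie between runs, a longest run at the very end, and a plain string. The function is repeated here only so the snippet runs on its own; the test inputs are my own examples.

```python
def longestrun(myList):
    prev = None
    size = 0
    max_size = 0
    curr_pos = 0
    max_pos = 0

    for (index, i) in enumerate(myList):
        if i == prev:
            size += 1
            if size > max_size:
                max_size = size
                max_pos = curr_pos
        else:
            size = 0
            curr_pos = index  # a new run starts here
        prev = i
    return max_pos

a = ["a","b","b","c","c","c","d","d","d","d","c","c","c","b","b","a"]
print(longestrun(a))                     # 6
print(longestrun(["a", "a", "b", "b"]))  # 0, the first of two equal runs
print(longestrun(["a", "b", "b"]))       # 1, the longest run is the last one
print(longestrun("abbcccdddd"))          # 6, strings can be iterated directly
```

Note that an empty input returns 0 here, since max_pos starts at 0; a caller that needs to distinguish that case would have to check for it separately.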