Avoiding Python Off-by-One Error in RLE Algorithm

Question

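EDIT: There seems to be more wrong with this than just an off-by-one error.

I'm getting an off-by-one error in the simple algorithm below, which is meant to display the count of each letter in a string, following run-length encoding.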


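I understand why the last character isn't added to the result string, but if I increase the range of i, I get index out of range, for obvious reasons.

I'd like to understand the conceptual issue here from an algorithm-design point of view, as well as get my code working.

Do I need special-case code to handle the last item in the original string? Or would it make more sense to compare the current character with the previous character, even though that causes a problem at the start of the algorithm?

Is there a general approach for this kind of algorithm, where the current element is compared with the previous/next element, that avoids index out of range problems?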

def encode(text):
    # stores output string
    encoding = ""
    i = 0

    while i < len(text) - 1:
        # count occurrences of character at index i
        count = 1
        while text[i] == text[i + 1]:
            count += 1
            i += 1

        # append current character and its count to the result
        encoding += text[i] + str(count) 
        i += 1

    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1
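The "compare with the previous character" idea from the question can work if the loop starts at index 1, so that text[i - 1] is always valid. The last run never meets a differing successor, so it still needs one flush after the loop. A minimal sketch; encode_prev is a hypothetical name, not code from either answer:

```python
def encode_prev(text: str) -> str:
    """Hypothetical sketch: compare each character with the *previous* one."""
    if not text:
        return ""  # nothing to encode; also avoids reading text[-1] on ""
    parts = []
    count = 1
    # Start at index 1 so that text[i - 1] always exists.
    for i in range(1, len(text)):
        if text[i] == text[i - 1]:
            count += 1
        else:
            # The run ending at i - 1 is complete; emit it and start a new run.
            parts.append(text[i - 1] + str(count))
            count = 1
    # The final run is never followed by a different character, so flush it here.
    parts.append(text[-1] + str(count))
    return "".join(parts)


print(encode_prev("Hello World"))  # H1e1l2o1 1W1o1r1l1d1
```

This still needs special handling in two places: the empty string and the final flush. Reading ahead and reading behind both leave one edge of the string to handle explicitly.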

Answer 1

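You're right: the outer loop should be while i < len(text), so that it also processes the last character when that character differs from the previous one (d in your case).

Your algorithm is fine overall, but it breaks when it looks for repeats of the last character, because text[i+1] is then past the end of the string.

To fix this, add a bounds check to the inner loop: while i+1 < len(text)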

def encode(text):
    # stores output string
    encoding = ""
    i = 0

    while i < len(text):
        # count occurrences of character at index i
        count = 1
        # FIX: check that we did not reach the end of the string
        # while looking for occurrences
        while i+1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1

        # append current character and its count to the result
        encoding += text[i] + str(count) 
        i += 1

    return encoding

text = "Hello World"
print(encode(text))
# Gives H1e1l2o1 1W1o1r1l1d1
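To see the "more wrong" part mentioned in the question's edit, you can run the original function and the fixed one side by side on a few edge cases. The sketch below copies both under hypothetical names (encode_original, encode_fixed) so they can coexist in one file:

```python
def encode_original(text):
    # Copy of the question's function, renamed for this comparison.
    encoding = ""
    i = 0
    while i < len(text) - 1:
        count = 1
        while text[i] == text[i + 1]:  # unguarded: text[i + 1] may not exist
            count += 1
            i += 1
        encoding += text[i] + str(count)
        i += 1
    return encoding


def encode_fixed(text):
    # Copy of this answer's corrected function.
    encoding = ""
    i = 0
    while i < len(text):
        count = 1
        # "and" short-circuits: text[i + 1] is read only when i + 1 is a valid index.
        while i + 1 < len(text) and text[i] == text[i + 1]:
            count += 1
            i += 1
        encoding += text[i] + str(count)
        i += 1
    return encoding


for sample in ["Hello World", "aab", "abb", "a", ""]:
    try:
        old = encode_original(sample)
    except IndexError:
        old = "IndexError"
    print(repr(sample), repr(old), repr(encode_fixed(sample)))
```

On these inputs, the original drops the final run of "aab" (returning "a2"), returns "" for "a", and raises IndexError on "abb" because its inner loop reads past the end. The fixed version returns "a2b1", "a1" and "a1b2".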

Answer 2

If you keep your strategy, you have to check i+1 < len(text) (written as stop < L below). That gives something like:

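def encode(text):
    L = len(text)
    start = 0
    encoding = ''
    while start < L:
        c = text[start]
        # advance stop to the first index past the run of c (bounds-checked)
        stop = start + 1
        while stop < L and text[stop] == c:
            stop += 1
        # the run occupies text[start:stop], so its length is stop - start
        encoding += c + str(stop - start)
        start = stop
    return encoding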
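A different general technique, not taken from either answer, removes the bounds check from the inner loop: pad the input with a sentinel that can never equal a real character, so the lookahead always has an element to read. A sketch, assuming None as the sentinel (a one-character str never compares equal to None); encode_sentinel is a hypothetical name:

```python
def encode_sentinel(text: str) -> str:
    # Hypothetical variant: append a sentinel that never equals a real character.
    chars = list(text) + [None]
    encoding = ""
    i = 0
    while i < len(text):  # only real characters start runs
        count = 1
        # chars[i + 1] always exists, because the sentinel sits at index len(text),
        # and the comparison fails there, so i never advances onto the sentinel.
        while chars[i] == chars[i + 1]:
            count += 1
            i += 1
        encoding += chars[i] + str(count)
        i += 1
    return encoding


print(encode_sentinel("abb"))  # a1b2
```

The trade-off is an extra copy of the input in exchange for a simpler inner loop.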

Another way to do it is to remember the start of each run:

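def encode2(text):
    start = 0
    encoding = ''
    for i, c in enumerate(text):
        if c != text[start]:
            # a new run begins at i; emit the run text[start:i]
            encoding += text[start] + str(i - start)
            start = i
    if text:
        # flush the final run (skipped for an empty string)
        encoding += text[start] + str(len(text) - start)
    return encoding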

This lets you enumerate over the input, which feels more Pythonic.
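For comparison, the standard library's itertools.groupby already yields one group per run of consecutive equal items, so a groupby-based version needs no index arithmetic at all. This is a swapped-in technique rather than either answer's approach, and encode_groupby is a hypothetical name:

```python
from itertools import groupby


def encode_groupby(text: str) -> str:
    # groupby yields (key, iterator) pairs, one pair per run of equal consecutive items;
    # counting the iterator's items gives the run length.
    return "".join(char + str(sum(1 for _ in run)) for char, run in groupby(text))


print(encode_groupby("Hello World"))  # H1e1l2o1 1W1o1r1l1d1
```

Because groupby handles both ends of the sequence itself, neither the first nor the last run needs special-case code.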


python

algorithm

off-by-one