Index-wise Replacement of String Data in Python

Question

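I'm a beginner who has just started learning Python from YouTube. I'm trying to write a program that converts old binary digits into new binary digits, and I'm having trouble with the replacement step. I want to replace by index. The data in my file (1x.txt) looks like this: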

(01010110110111011110111101111011110101101101101011011011010101010101010101011101110101110111101)

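This is random data, but it is made up of the forms 01, 011, 0111 and 01111. I want to replace "010" with "0", "0110" with "00", "01110" with "000", and "011110" with "0000". So for the number given above, split into groups as

(0101 011011 011101111 0111101111 0111101 011011 01101 011011 01101 0101 0101 0101 0101 01110111 010111 0111101)

my result should be

(01 0011 0001111 00001111 00001 0011 001 0011 001 01 01 01 01 000111 0111 00001)

So far I have written a program that does the job, but it is far too slow: a file of only 8 MB took more than 2 hours. Can anyone suggest a better way to do the same thing? My code is below.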

def bytes_from_file(filename):
    newstring = ''

    old_list = ['010', '0110', '01110', '011110']
    new_list = ['0', '00', '000', '0000']

    with open(filename, "rb", buffering=200000) as f:
        while True:
            try:
                chunk = f.read()

            except:
                print('Error while file opening')
            if chunk:

                chunk2 = chunk.decode('utf-8')
                n = len(chunk2)

                i = 0
                while i < n:
                    flag = False
                    for j in range(6, 2, -1):

                        if chunk2[i:i + j] in old_list:
                            flag = True
                            index = old_list.index(chunk2[i:i + j])
                            newstring = newstring + new_list[index]

                            i = i + j

                            break
                    if flag == False:
                        newstring = newstring + chunk2[i]
                        i = i + 1
                        newstring=''.join((newstring))

            else:
                try:
                    f = open('2x.txt', "a")
                    f.write(newstring)
                    f.close()

                except:
                    print('Error While writing into file')

                break


bytes_from_file('1x.txt')

Answer 1

Overall, you are overcomplicating this quite a bit, but the most important problem is here:

newstring = newstring + chunk2[i]
i = i + 1
newstring=''.join((newstring))

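newstring is already a string, which you build by repeatedly concatenating substrings (as in newstring + chunk2[i]). That means ''.join((newstring)) treats the string as an iterable, splits it into its individual characters, and joins them back together. It does this every time nothing in old_list matches, and it gets slower and slower as the string grows. The newstring=''.join((newstring)) step has no actual effect, but Python cannot optimize it away. Conversely, building the string with a technique like newstring + chunk2[i] defeats any purpose ''.join might have had.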
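To make the point concrete, here is a tiny illustration of what ''.join does when it is handed a string rather than a list of substrings. The sample value is made up for the demonstration.

```python
# A short sample string standing in for the growing output (made-up value).
newstring = "0011"

# Iterating over a string yields its individual characters...
pieces = list(newstring)

# ...so joining a string with '' walks every character and rebuilds
# an equal string: work proportional to its length, with no visible effect.
rejoined = "".join(newstring)

print(pieces)
print(rejoined == newstring)
```

Doing this once is harmless; doing it after every unmatched character of a multi-megabyte input means re-walking the whole output again and again.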

If your plan is to build a single string, you still want to use ''.join. But you want to use it once, and you want to use it on a list of the substrings:

# initially, set
newstring = []
# any time you find something else to append to the output:
newstring.append(whatever)
# one time, right before opening the output file:
newstring = ''.join(newstring)
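As a minimal sketch of the list-then-join approach applied to the matching loop from the question: the names REPLACEMENTS and replace_patterns are hypothetical, and the dictionary simply pairs up the question's old_list and new_list. The longest-first matching order (6 down to 3 characters) is kept from the original code.

```python
# Hypothetical mapping built from the question's old_list / new_list.
REPLACEMENTS = {"010": "0", "0110": "00", "01110": "000", "011110": "0000"}


def replace_patterns(text):
    """Return text with the patterns replaced, trying the longest pattern first."""
    pieces = []  # collect output substrings here instead of concatenating
    i = 0
    n = len(text)
    while i < n:
        for width in (6, 5, 4, 3):
            candidate = text[i:i + width]
            if candidate in REPLACEMENTS:
                pieces.append(REPLACEMENTS[candidate])
                i += len(candidate)
                break
        else:
            # No pattern starts at this position: copy one character through.
            pieces.append(text[i])
            i += 1
    # Join exactly once, at the very end.
    return "".join(pieces)


print(replace_patterns("0101011011"))
```

Appending to a list is cheap, and the single join at the end builds the final string in one pass, so the total work grows roughly linearly with the input size.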

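That said, there are other approaches. One useful technique is to use a generator to yield each piece that needs to be written, instead of building a list. You can then write the pieces out one at a time, or build the joined string before writing (as in ''.join(my_generator_function())). Or you can open both files at once and simply .write each piece of output as you determine it from the input.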
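The generator variant might look roughly like the following. This is a sketch under assumptions, not the only way to do it: iter_replaced and convert_file are hypothetical names, the matching rules are copied from the question, and the output file is opened in "w" mode rather than the append mode used in the question.

```python
# Hypothetical mapping built from the question's old_list / new_list.
REPLACEMENTS = {"010": "0", "0110": "00", "01110": "000", "011110": "0000"}


def iter_replaced(text):
    """Yield output pieces one at a time, trying the longest pattern first."""
    i = 0
    n = len(text)
    while i < n:
        for width in (6, 5, 4, 3):
            candidate = text[i:i + width]
            if candidate in REPLACEMENTS:
                yield REPLACEMENTS[candidate]
                i += len(candidate)
                break
        else:
            # No pattern starts here: pass the character through unchanged.
            yield text[i]
            i += 1


def convert_file(src_path, dst_path):
    """Read src_path fully, then write each piece to dst_path as it is produced."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        # writelines accepts any iterable of strings, including a generator.
        dst.writelines(iter_replaced(text))


# The joined-string form mentioned above works with the same generator:
print("".join(iter_replaced("0101011011")))
```

With the file names from the question, the call would be convert_file('1x.txt', '2x.txt'). The input is still read in one go here, which is reasonable for an 8 MB file; only the output is produced piece by piece.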

python

indexing

binaryfiles

str-replace