Python doesn't remove carriage return and line feed characters from a file

Question

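I'm trying to build an MD5 hash from a txt file. However, there are some rules I need to follow:

The encoding must be 'ISO-8859-1'
All characters must be lowercase
Newline and carriage return characters must not be taken into account when building the hash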

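My file contains \r and \n characters, that is, carriage returns and line feeds. I tried removing them with the rstrip and strip functions, but that doesn't seem to work. To check, I wrote the result to a txt file and opened it in Notepad++; as the image below shows, the characters are still there.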

[Image: Notepad++ view showing the CR and LF characters]

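I also tried another check: I used the split function with \n as the separator to build a list, just to confirm whether those characters were really in there. They were, as I suspected.

What should I do to actually remove those characters?

One of the pieces of code I tried: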

from hashlib import md5

open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w')
file_content = open_file.read().lower().rstrip('\n\r ').strip('\n\r')

#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()

#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())

Answer 1

I would use str.translate() to remove the "carriage return" and "line feed" characters, like this:

file_content = file_content.translate({ord(ch):None for ch in '\r\n'})

Or, if this is a class assignment and str.translate() hasn't been covered yet, I might do the job "by hand":

file_content = ''.join(ch for ch in file_content if ch not in '\r\n')
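As a quick sanity check, the two approaches can be compared on a small string. This is only a sketch: the sample text is made up for illustration and is not taken from the asker's file.

```python
# Hypothetical sample text with mixed line endings (not from the asker's file).
sample = 'Line One\r\nLine Two\nLine Three\r'

# Approach 1: str.translate with a table that maps CR and LF to None (delete).
table = {ord(ch): None for ch in '\r\n'}
via_translate = sample.translate(table)

# Approach 2: keep every character that is not CR or LF.
via_join = ''.join(ch for ch in sample if ch not in '\r\n')

print(repr(via_translate))        # 'Line OneLine TwoLine Three'
print(via_translate == via_join)  # True
```

Both remove the characters wherever they occur in the string, not just at its ends.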

Complete program:

from hashlib import md5

open_file = open('N0003977.290', 'r', encoding = 'ISO-8859-1')
test_file = open('file_test.txt', 'w', encoding = 'ISO-8859-1')
# Lowercase the content, as the rules in the question require
file_content = open_file.read().lower()
open_file.close()

# Choose one of the following:
file_content = file_content.translate({ord(ch):None for ch in '\r\n'})
# file_content = ''.join(ch for ch in file_content if ch not in '\r\n')


#writing a txt file to check if there are new line characters
test_file.write(file_content)
test_file.close()

#creating a md5 hash
m = md5()
m.update(file_content.encode('ISO-8859-1'))
print(m.hexdigest())
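For reuse, the same steps could be wrapped in a function. The sketch below is one possible arrangement, not the only one: the name md5_without_newlines is hypothetical, and the demo file is a throwaway created just for the example. It opens the file with newline='' so that Python does not translate line endings on read (with the default setting, '\r\n' is already converted to '\n' in text mode), which makes the removal of both characters explicit.

```python
import os
import tempfile
from hashlib import md5


def md5_without_newlines(path, encoding='ISO-8859-1'):
    """Hypothetical helper: hash a file's lowercased text, ignoring CR and LF."""
    # newline='' disables newline translation, so '\r' arrives unchanged.
    with open(path, 'r', encoding=encoding, newline='') as f:
        text = f.read()
    # Lowercase first, then drop every carriage return and line feed.
    cleaned = text.lower().replace('\r', '').replace('\n', '')
    return md5(cleaned.encode(encoding)).hexdigest()


# Demo on a throwaway file (contents are made up for illustration).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'AbC\r\nDeF\n')
demo_path = tmp.name

demo_digest = md5_without_newlines(demo_path)
os.remove(demo_path)

# The digest should match the hash of the lowercased text with no line endings.
print(demo_digest == md5(b'abcdef').hexdigest())  # True
```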

Answer 2

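Is the original file encoded in ISO-8859-1?

If so, you should not encode it before hashing; otherwise you should encode it, but not open the file with this encoding.

rstrip and lstrip don't work because they only strip characters at the beginning and end of the whole content: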

>>> '\r\nlalala\r\nlalalal\r\n'.rstrip().lstrip()
'lalala\r\nlalalal'
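To make the contrast concrete, here is a small sketch using the same example string. It uses str.replace, which is one of several ways to remove characters from the middle of a string.

```python
text = '\r\nlalala\r\nlalalal\r\n'

# strip() only trims the ends, so the inner '\r\n' survives.
stripped = text.strip('\r\n')
print(repr(stripped))  # 'lalala\r\nlalalal'

# replace() removes every occurrence, wherever it appears.
cleaned = text.replace('\r', '').replace('\n', '')
print('\r' in cleaned or '\n' in cleaned)  # False
```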

Hope this helps,

python

hash

md5

strip