Get the list of characters added to a file

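I have an original file and a second file that contains some extra characters. I am looking for the list of characters that were added to the second file. I tried using difflib, but the result is wrong because characters can be inserted in the middle of a word.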

import difflib

with open('file1') as f1:
    f1_text = f1.read()
with open('file2') as f2:
    f2_text = f2.read()

differ = difflib.Differ()
diffs = list(differ.compare(f1_text, f2_text))

lines = list(diffs)
removed = [line[1:] for line in lines if line[0] == '-']
f = open("results", "a")
f.write(''.join(removed))

File 1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

File 2

LRorFem ipsum docdlor sit avcvcmet, consGecte5tur adiFbpiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo cocdnseqduat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Result

R F c d c v c m t c o n s G e c t e 5 t r a d F p i s c i n g e l i t , s e d d e i u s m o d t e m p o n c i d i d u n t u t
l a b o r e e t d o l o r e m a g a a . U t e n m a d m i n i m v n i a m , u i s n o s t r u d e x e r c i t a t i o n u l l a m c o l a b o r i s n i s i u t a l i q u i p e x e a c o m m o d o c o c d n s e q d

Expected result: RFdcvcvcG5Fbcdd

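You only need a single pass that reads both files one character at a time. When the two characters match, advance both files; when they differ, the character from file2 is an added one, so record it and advance only file2: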

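result = []

with open('file1') as file1, open('file2') as file2:
    ch1, ch2 = file1.read(1), file2.read(1)
    while ch1 and ch2:
        if ch1 == ch2:
            # Same character in both files: advance both.
            ch1, ch2 = file1.read(1), file2.read(1)
        else:
            # Mismatch: ch2 was inserted, so record it and advance only file2.
            result.append(ch2)
            ch2 = file2.read(1)
    # file1 is exhausted; anything left in file2 was appended at the end.
    if ch2:
        result.append(ch2)
        result.extend(file2.read())

print(result)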
['R', 'F', 'c', 'd', 'v', 'c', 'v', 'c', 'G', '5', 'F', 'b', 'c', 'd', 'd']
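
If the text is already in memory, the same walk can be sketched as a small function over two strings. This is only an illustration: the name `added_chars` and its parameters are hypothetical, and it assumes the second text was produced from the first by insertions only (no deletions or replacements). If that assumption does not hold, everything after the first real mismatch will be reported as added.

```python
def added_chars(original: str, modified: str) -> list:
    """Return the characters of `modified` that have no match in `original`.

    Assumption: `modified` is `original` plus inserted characters only.
    """
    result = []
    i = 0  # index of the next unmatched character in `original`
    for ch in modified:
        if i < len(original) and ch == original[i]:
            i += 1  # characters agree: consume one character of `original`
        else:
            result.append(ch)  # no match at this position: treat as inserted
    return result


# Small excerpt from the two sample files in the question.
print("".join(added_chars("dolor sit amet", "docdlor sit avcvcmet")))
# prints: cdvcvc
```

Because the loop runs over `modified` to its end, characters appended after the end of the original text are collected as well.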