Get the list of characters added to a file
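I have an original file and a second file that contains some extra characters. I am looking for the list of characters that were added to the second file. I tried using difflib, but the output is wrong because characters can be inserted in the middle of words.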
import difflib
with open('file1') as f1:
    f1_text = f1.read()
with open('file2') as f2:
    f2_text = f2.read()
differ = difflib.Differ()
diffs = list(differ.compare(f1_text, f2_text))
lines = list(diffs)
removed = [line[1:] for line in lines if line[0] == '-']
f = open("results", "a")
f.write(''.join(removed))
File 1
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
File 2
LRorFem ipsum docdlor sit avcvcmet, consGecte5tur adiFbpiscing elit, sed do
eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo cocdnseqduat. Duis aute irure dolor in
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.
Result
R F c d c v c m t c o n s G e c t e 5 t r a d F p i s c i n g e l
i t , s e d d e i u s m o d t e m p o n c i d i d u n t u t
l a b o r e e t d o l o r e m a g a a . U t e n m a d m i n
i m v n i a m , u i s n o s t r u d e x e r c i t a t i o n u
l l a m c o l a b o r i s n i s i u t a l i q u i p e x e a
c o m m o d o c o c d n s e q d
Expected result:
RFdcvcvcG5Fbcdd
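You only need a single pass over both files, reading one character at a time. When the two characters match, advance in both files; otherwise the character from file2 was added, so record it and advance only in file2.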
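result = []
with open('file1') as file1, open('file2') as file2:
    ch1, ch2 = file1.read(1), file2.read(1)
    # Loop until file2 is exhausted so trailing additions are not lost.
    while ch2:
        if ch1 == ch2:
            # Same character in both files: advance both.
            ch1, ch2 = file1.read(1), file2.read(1)
        else:
            # Character exists only in file2: record it and advance file2.
            result.append(ch2)
            ch2 = file2.read(1)
print(result)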
['R', 'F', 'c', 'd', 'v', 'c', 'v', 'c', 'G', '5', 'F', 'b', 'c', 'd', 'd']
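The same single pass can also be written against in-memory strings, which may be easier to reuse and test. This is only a sketch: the function name added_chars is made up for illustration, and it assumes the modified text is the original plus insertions only (no deletions or replacements).

```python
def added_chars(original: str, modified: str) -> list:
    """Return the characters of `modified` not matched, in order, by `original`."""
    added = []
    i = 0  # current position in `original`
    for ch in modified:
        if i < len(original) and original[i] == ch:
            i += 1  # characters match: advance in the original too
        else:
            added.append(ch)  # unmatched character is treated as an insertion
    return added


print(added_chars("Lorem", "LRorFem"))            # ['R', 'F']
print("".join(added_chars("dolor", "docdlor!")))  # cd!
```

To use it on the two files, read each one with read() and pass the resulting strings. When an inserted character happens to equal the next original character, the match is attributed to the earliest position, but the reported characters are the same.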