How to print the words that one .txt / .py file has and that another .txt / .py file I compare it with does not have?
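I tried using this code to compare two .py code files, but it only gives me the last lines of code. For example, if file 1 has 2014 lines and file 2 has 2004 lines, it returns the last 10 lines of file1. That is not what I need: I need to extract the lines that are in file1 but not in file2.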
import shutil

file1 = 'bot-proto7test.py'
file2 = 'bot-proto7.py'

with open(file1, 'r') as file1:
    with open(file2) as file2:
        with open("output.txt", "w") as out_file:
            file2.seek(0, 2)  # move to the end of file2
            file1.seek(file2.tell())  # skip file1 ahead by the size of file2
            shutil.copyfileobj(file1, out_file)  # copy the rest of file1
You can do this with sets:
with open(file1, 'r') as f:
    set1 = {*f.readlines()}
with open(file2, 'r') as f:
    set2 = {*f.readlines()}
print(set1 - set2)  # contains only the lines that are in the first file but not in the second
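To see what the set difference does on a small input, here is a minimal sketch that uses in-memory lists in place of `f.readlines()`. The sample lines are made up for illustration. Note that a set keeps neither the original order nor repeated lines.

```python
# Hypothetical sample data standing in for f.readlines() on each file.
lines1 = ["import os\n", "x = 1\n", "print(x)\n", "x = 1\n"]
lines2 = ["import os\n", "print(x)\n"]

# Set difference: lines present in the first list but absent from the second.
only_in_first = set(lines1) - set(lines2)
print(only_in_first)  # {'x = 1\n'}
```

The line `x = 1` occurs twice in the first list but shows up only once in the result, because sets discard duplicates.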
By the way, you can open multiple files with a single with statement!
with open("f1.txt", "r") as f1, open("f2.txt", "r") as f2:
    set1, set2 = {*f1.readlines()}, {*f2.readlines()}
If we want to keep lines that occur more than once, we can use Counter:
from collections import Counter

with open(file1, 'r') as f:
    c = Counter(f.readlines())
# simple subtraction won't work here if the first file contains more occurrences than the second
res = Counter({k: v for k, v in c.items() if k not in set2})
print(list(res.elements()))
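The comment about subtraction can be illustrated with a small sketch. The sample lines are hypothetical; the point is only that `Counter` subtraction works on counts, while the filter above works on membership.

```python
from collections import Counter

# Hypothetical sample data: "a\n" appears three times in the first file
# and once in the second; "b\n" appears only in the first.
c1 = Counter(["a\n", "a\n", "a\n", "b\n"])
c2 = Counter(["a\n"])

# Counter subtraction removes one "a\n" per occurrence in c2,
# so two copies of a line that IS in the second file survive.
subtracted = c1 - c2

# Filtering by membership drops every copy of a shared line instead.
filtered = Counter({k: v for k, v in c1.items() if k not in c2})

print(sorted(subtracted.elements()))  # ['a\n', 'a\n', 'b\n']
print(sorted(filtered.elements()))    # ['b\n']
```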
Finally, if you also want to preserve the order, you need to use the original content:
with open(file1, 'r') as f:
    original = f.readlines()
res = {*original} - set2
# keep only the lines that are in the difference, in their original order
res = [el for el in original if el in res]
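Putting the pieces together, the following is one possible sketch of a helper that writes the lines found only in the first file to an output file, keeping both order and repeated lines. The function names `lines_only_in_first` and `write_difference` are made up for this example and are not part of any library.

```python
def lines_only_in_first(path1, path2):
    """Return the lines of path1 that do not occur in path2, in original order."""
    with open(path2, "r") as f2:
        seen = set(f2)  # iterating over a file object yields its lines
    with open(path1, "r") as f1:
        return [line for line in f1 if line not in seen]


def write_difference(path1, path2, out_path):
    """Write the lines found only in path1 to out_path (hypothetical helper)."""
    with open(out_path, "w") as out_file:
        out_file.writelines(lines_only_in_first(path1, path2))
```

With the file names from the question, the call would be `write_difference('bot-proto7test.py', 'bot-proto7.py', 'output.txt')`. Lines are compared including their trailing newline, so a final line without a newline will not match the same text followed by one; stripping each line before comparing is one way around that.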