How to compare one file against other files in Python

Question

I'm new to Python :( Here is what I want to do:

Main file (tokens): beautiful 2 amazing 5 speechless 2

A folder containing 73 files:

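How can I write a Python script that checks the source frequency, i.e. counts how many of the source files each word from the main file appears in? For example, the result should look like this: the word beautiful appears in 55 sources, the word amazing appears in 30 sources, and the word speechless appears in 73 sources.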

from os import listdir

with open("C:/Users/ell/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/ell/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/ell/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()

            # Only the hard-coded word 'amazing' is checked here
            if 'amazing' in text:
                f.write('The word exists in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word does not exist in the file ' + filename[:-4] + '\n')

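I wrote this code, but it only checks the single word I hard-coded inside the for loop. How can I make it check the words from the main file instead? Thanks for your help.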

Answer 1

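Before looping over the source files, you should first read the file containing your tokens, parse it, and store the tokens in a list (or a dictionary, or whatever you prefer). Then, for each file, check which elements of that collection are present in it.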

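A dictionary is handy because you can store each word's frequency as its value. For example, you can start with {"beautiful": 0, "amazing": 0} and increment a value whenever its key appears in a file.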
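To make the counting step concrete, here is a minimal sketch that works on in-memory strings instead of real files; the `count_sources_in_texts` name and the sample texts are hypothetical. It counts each source at most once per word, which matches what the question asks for (number of sources, not number of occurrences), and it uses a plain substring test like the answers in this thread.

```python
def count_sources_in_texts(tokens, texts):
    """Return how many of the given texts contain each token (substring test)."""
    counts = {token: 0 for token in tokens}  # every token starts at 0
    for text in texts:
        for token in counts:
            if token in text:       # each text adds at most 1 per token
                counts[token] += 1
    return counts


# Hypothetical in-memory "sources" standing in for the 73 files.
sources = [
    "what a beautiful and amazing day",
    "I was speechless",
    "beautiful scenery, I was speechless",
]
print(count_sources_in_texts(["beautiful", "amazing", "speechless"], sources))
# → {'beautiful': 2, 'amazing': 1, 'speechless': 2}
```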

If your token file looks like this...

amazing
beautiful
speechless

you can do this:

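with open("token_file.txt", "r") as f:
    # Creates a dict with each token as a key and 0 as its value.
    # strip() removes the trailing "\n" that each line keeps; blank lines are skipped.
    tokens = {line.strip(): 0 for line in f if line.strip()}
# tokens = {"amazing": 0, "beautiful": 0, "speechless": 0}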
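The question's main file appears to hold words followed by numbers (`beautiful 2 amazing 5 speechless 2`) rather than one word per line. Assuming that any purely numeric item is a count that can be ignored, a hypothetical `parse_tokens` helper like the one below handles both layouts. It accepts any iterable of lines, so an open file object can be passed to it directly.

```python
def parse_tokens(lines):
    """Build {token: 0} from lines of text, ignoring purely numeric items.

    Assumption: the token file is either one word per line or words mixed
    with counts, e.g. "beautiful 2 amazing 5 speechless 2".
    """
    tokens = {}
    for line in lines:
        for item in line.split():   # split() also drops the trailing "\n"
            if not item.isdigit():  # skip counts such as "2" or "5"
                tokens[item] = 0
    return tokens


# With a real file (name taken from the answer above):
# with open("token_file.txt", "r", encoding="utf-8") as f:
#     tokens = parse_tokens(f)

print(parse_tokens(["beautiful 2 amazing 5 speechless 2\n"]))
# → {'beautiful': 0, 'amazing': 0, 'speechless': 0}
```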

Answer 2

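You can do it like this. After building the token dictionary, loop over every file and increment the count of each token that appears in that file.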

import os

token_file = "token.txt"
main_dir = "PATH/TO/DIR"
with open(token_file, "r") as f:
    # Creates a dict with "token" as keys and 0 as values.
    # rstrip removes \n from token
    tokens = {token.rstrip(): 0 for token in f.readlines()}

for filename in os.listdir(main_dir): 
    path = os.path.join(main_dir, filename) # path to source file
    with open(path, "r") as fp:
        text = fp.read()
        for token in tokens.keys(): # check every token
            if token in text:       # if token found in text
                tokens[token] += 1  # increment token count
print(tokens)
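One caveat with `token in text`: it is a substring test, so "amazing" also matches "amazingly". A possible refinement, sketched below under a few assumptions (whole-word matching via the standard `re` module, case-insensitive comparison, UTF-8 source files, sub-folders skipped), also writes the result in the format the question asks for. The function names are hypothetical.

```python
import os
import re


def contains_word(word, text):
    """True if `word` occurs in `text` as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def count_sources_in_dir(main_dir, tokens):
    """Count, for each token, how many regular files in main_dir contain it."""
    counts = {token: 0 for token in tokens}
    for filename in os.listdir(main_dir):
        path = os.path.join(main_dir, filename)
        if not os.path.isfile(path):  # skip sub-folders and other non-files
            continue
        with open(path, "r", encoding="utf-8") as fp:  # assumes UTF-8 text
            text = fp.read()
        for token in counts:
            if contains_word(token, text):
                counts[token] += 1
    return counts


def write_report(counts, out_path):
    """Write one line per token, e.g. 'The word amazing appears in 30 sources'."""
    with open(out_path, "w", encoding="utf-8") as out:
        for token, n in counts.items():
            out.write(f"The word {token} appears in {n} sources\n")


# Example usage (paths are placeholders, as in the answer above):
# tokens = {"beautiful": 0, "amazing": 0, "speechless": 0}
# counts = count_sources_in_dir("PATH/TO/DIR", tokens)
# write_report(counts, "rez.txt")
```

Dropping `flags=re.IGNORECASE` makes the match case-sensitive, which is closer to the behavior of the original `in` test.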


Tags: python, comparison, dataframe, python-3.x