Loop through multiple txt files and count frequencies of chosen word in Python
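I have an exercise that asks me to write a function that loops through 50 text files and counts how often a chosen word appears in each one. My current code looks like this: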
```python
import io
import os
import re


def count(term):
    frequencies = 0
    work_dir = "C:/my_work_directory"
    for i in range(1, 51):
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir, name)
        with io.open(path, "r") as fd:
            content = fd.read()
        chapter = io.StringIO(content)
        line = chapter.readline()
        print(chapter)
        while line:
            lower = line.lower()
            cleaned = re.sub('[^a-z ]', '', lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        print(frequencies)
```
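What I want is that if I call count("Man"), I get 50 different counts: how often the word "Man" appears in each text file. However, all I get right now is 50 zeros. I'm pretty sure this is because I initialized the variable 'frequencies' to 0 and then never did anything with it. Can anyone help me fix this or tell me where I went wrong? Any help would be greatly appreciated, thanks.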
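Well, your 'Man' has a capital letter, while every word you compare it against has been lowercased, so word == term is never true. The first fix is to call lower() on the term variable. The second problem, which you would only have noticed later, is that you keep a running count instead of a per-file count. So move the initialization of the frequencies variable inside the for loop. It should look something like this: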
```python
import io
import os
import re


def count(term):
    term = term.lower()  # fix 1: compare lowercase against the lowercased words
    work_dir = "C:/my_work_directory"
    for i in range(1, 51):
        frequencies = 0  # fix 2: reset the count for every file
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir, name)
        with io.open(path, "r") as fd:
            content = fd.read()
        chapter = io.StringIO(content)
        line = chapter.readline()
        while line:
            lower = line.lower()
            cleaned = re.sub('[^a-z ]', '', lower)  # keep only letters a-z and spaces
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        print(frequencies)  # one count per file
```
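If you want to check both fixes without the 50 chapter files, one option is to separate the counting from the file handling and return the counts instead of printing them. This is a sketch, not the answer's code: `count_in_text` and `count_per_file` are hypothetical names, the files are assumed to be UTF-8, and the `n_files` parameter is an addition so the loop length is not hard-coded.

```python
import os
import re


def count_in_text(text, term):
    """Count whole-word occurrences of term in text, ignoring case.

    Uses the same cleaning as the answer: lowercase, drop every character
    that is not a-z or a space, then split into words.
    """
    term = term.lower()
    total = 0
    # Work line by line: cleaning the whole text at once would delete the
    # newlines and glue the last word of one line to the first of the next.
    for line in text.splitlines():
        cleaned = re.sub('[^a-z ]', '', line.lower())
        # split() also drops the empty strings split(' ') yields for runs of spaces.
        total += cleaned.split().count(term)
    return total


def count_per_file(term, work_dir, n_files=50):
    """Return one count per chapter-<i>.txt file, for i = 1..n_files."""
    counts = []
    for i in range(1, n_files + 1):
        path = os.path.join(work_dir, "chapter-{i}.txt".format(i=i))
        with open(path, "r", encoding="utf-8") as fd:  # assumption: UTF-8 files
            counts.append(count_in_text(fd.read(), term))
    return counts
```

For example, print(count_per_file("Man", "C:/my_work_directory")) prints a list of 50 per-chapter counts, and count_in_text can be tried on a short string directly.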
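I ran the corrected count() myself, and after changing work_dir to "" (so that it looked in the current directory) it worked fine. So I think you should check the working directory path, or assert that the term argument is what you expect.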
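Following that suggestion, here is a minimal sketch of a pre-flight check you could call in count() right after work_dir is assigned. `check_inputs` and its `n_files` parameter are hypothetical. A wrong directory would already make io.open raise an error rather than print zeros, so the term check is the part aimed at silent zeros: after cleaning, only words made of a-z survive, so a term such as "man's" can never match.

```python
import os
import re


def check_inputs(term, work_dir, n_files=50):
    """Hypothetical pre-flight check for count(): raise early with a clear message."""
    # The cleaning step keeps only a-z and spaces, so a term with any other
    # character (apostrophe, digit, accented letter, space) can never equal a
    # cleaned word and would silently produce zeros.
    if not re.fullmatch('[a-z]+', term.lower()):
        raise ValueError("term must be one word made of letters a-z: {!r}".format(term))
    # os.path.isdir("") is False, so treat an empty work_dir as the current directory.
    if not os.path.isdir(work_dir or "."):
        raise FileNotFoundError("work_dir is not a directory: {!r}".format(work_dir))
    # Report every missing chapter file at once instead of failing on the first.
    missing = [
        "chapter-{}.txt".format(i)
        for i in range(1, n_files + 1)
        if not os.path.isfile(os.path.join(work_dir, "chapter-{}.txt".format(i)))
    ]
    if missing:
        raise FileNotFoundError("missing in {!r}: {}".format(work_dir, ", ".join(missing)))
```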