Python script to count occurrences of a phrase across files
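I have a folder "A" containing many files (say, 100). I want to open all of these files (they are all text files) and count how many times the phrase "virtual memory" occurs across them [either the total or the count per file].
I tried something like the following, but could not get it to work.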
import os

path = 'MY_PATH'
count=0
filecount=0
files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        files.append(os.path.join(r, file))
        print(files)
        for fileList in files:
            with open(fileList, "r") as f:
                # text = f.read()
                # print(len(text))
                print('OPENING FILE: ',f)
                for word in f:
                    #print(word)
                    if(word == 'virtual memory'):
                        print('WORD FOUND')
                        count+=1
print("COUNT : ", count)
Is there a quick script for this, or are there corrections I need to make? Thanks in advance!
Your script fails because word is actually a line. The following should work:
with open(fileList, "r") as f:
    for sentence in f:
        count += sentence.count('virtual memory')
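To illustrate the point above, here is a minimal sketch of what iterating over a file actually yields. It uses io.StringIO as a stand-in for an open text file, and the sample contents are made up for illustration.

```python
import io

# A stand-in for an open text file; the contents are invented for this example.
fake_file = io.StringIO("virtual memory\nuses virtual memory twice: virtual memory\n")

# Iterating over a file object yields whole lines, each ending in '\n'.
lines = list(fake_file)

# Equality against the bare phrase never succeeds, even for the first line,
# because that line is 'virtual memory\n'.
exact_matches = sum(line == 'virtual memory' for line in lines)

# str.count finds every non-overlapping occurrence inside each line.
phrase_count = sum(line.count('virtual memory') for line in lines)

print(exact_matches, phrase_count)
```

Note that counting line by line will not see a phrase that is split across a line break; reading the whole file at once is one way around that.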
Use str.count on the file's contents to count the occurrences of the phrase in a text file. Here is a simple implementation:
import os

path = 'MY_PATH'
count = 0
for root, dirs, files in os.walk(path):
    for file in files:
        with open(os.path.join(root, file), "r") as f:
            f_reader = f.read()
        term = 'virtual memory'
        num = f_reader.count(term)  # occurrences in this file
        count += num
        print('OPENING FILE: ', file, ' - Count:', num)
print("COUNT : ", count)
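If you want both the per-file counts and the total in one place, the same idea can be wrapped in a function. This is only a sketch: the function name count_phrase is hypothetical, and it assumes the files are UTF-8 text (undecodable bytes are ignored).

```python
from pathlib import Path

def count_phrase(folder, phrase='virtual memory'):
    """Return ({file path: count}, total) for every file under folder."""
    per_file = {}
    # rglob('*') walks the folder recursively, much like os.walk
    for p in Path(folder).rglob('*'):
        if p.is_file():
            # Assumption: files are UTF-8 text; bad bytes are skipped
            text = p.read_text(encoding='utf-8', errors='ignore')
            per_file[str(p)] = text.count(phrase)
    return per_file, sum(per_file.values())
```

Usage would look like per_file, total = count_phrase('MY_PATH'), after which per_file maps each file path to its count.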
Try this:
import os

path = 'MY_PATH'
count = 0
files = []
for r, d, f in os.walk(path):
    for file in f:
        files.append(os.path.join(r, file))
# print(files)
# This loop now sits outside the os.walk loop;
# previously you were visiting each file more than once.
for fileList in files:
    with open(fileList, "r") as f:
        print('OPENING FILE: ', fileList)
        words = []
        for line in f:
            # split() with no argument also drops empty strings
            words.extend(line.split())
    # Compare each adjacent pair of words
    for idx in range(len(words) - 1):
        if words[idx] == 'virtual' and words[idx + 1] == 'memory':
            count += 1
print("COUNT : ", count)
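The word-by-word comparison above will miss a match when punctuation is attached to a word (for example "memory,"). One alternative, not part of the answer above, is a regular expression. The sketch below assumes you also want case-insensitive matching, which the question did not ask for; drop re.IGNORECASE if you do not. The names PATTERN and count_in_text are hypothetical.

```python
import re

# \s+ matches any run of whitespace, including a newline between the two words.
# \b keeps the match on word boundaries, so 'memory,' still counts.
PATTERN = re.compile(r'\bvirtual\s+memory\b', re.IGNORECASE)

def count_in_text(text):
    """Count non-overlapping matches of the phrase in a string."""
    return len(PATTERN.findall(text))

# Made-up sample text: one match with a double space, one split across lines.
sample = "Virtual  memory is paged.\nThe OS manages virtual\nmemory for each process."
print(count_in_text(sample))
```

To apply it to the files, call count_in_text on the result of f.read() for each file and add up the results.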
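You can easily build the list of files with the os module, like this:
import os

folder = 'path/to/files/'
listfiles = os.listdir(folder)
Then you can loop over this list and read each file in full, without looping over its lines, like this:
count = []
# Note: os.listdir also returns subdirectories; this assumes the folder holds only files
for file in listfiles:
    with open(os.path.join(folder, file)) as f:
        text = f.read()
    count.append(text.count('virtual memory'))
This way, the list count holds the number of occurrences of the string 'virtual memory' in each file.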
The loop you wrote with for word in f iterates over lines. When you open a file and iterate over it, you get its lines one at a time.