Maximum recursion depth with file management
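I have to build a search engine that looks for a specific word in a directory (folder) containing text files.
For example, suppose we search for the word "machine" in a directory named X. What I want is to scan every txt file in X and in its subdirectories.
I get the error "maximum recursion depth exceeded while calling a Python object".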

    import os
    from pathlib import Path

    def getPath (folder) :
        fpath = Path(folder).absolute()
        return fpath

    def isSubdirectory (folder) :
        if folder.endswith(".txt") == False :
            return True
        else :
            return False

    def searchEngine (folder, word) :
        path = getPath(folder)
        occurences = {}
        list = os.listdir (path) #get a list of the folders/files in this path
        #assuming we only have .txt files and subdirectories in our folder :
        for k in list :
            if isSubdirectory(k) == False :
                #break case
                with open (k) as file :
                    lines = file.readlines()
                    for a in lines :
                        if a == word :
                            if str(file) not in occurences :
                                occurences[str(file)] = 1
                            else :
                                occurences[str(file)] += 1
                return occurences
            else :
                return searchEngine (k, word)

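A few points:
- I could not reproduce the recursion error when running your code, but I think this line is a problem:
      list = os.listdir(path)
  It only gives you relative file and path names, whereas the code below needs absolute ones (for example in open()) once you are outside the cwd.
- I think the return statement is misplaced: it returns after the first txt file.
- Python offers ready-made solutions for walking paths recursively: os.walk(), glob.glob(), and Path.rglob(). Why not use them?
- Path.absolute() is not documented (at least in older Python versions), so I would not use it. Could you use Path.resolve() instead?
- You do nothing with the occurences dictionary returned by the recursive step. I think you should update the main dictionary after retrieving it.
- Do not use list as a variable name: it shadows the built-in list().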
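
The first point can be checked with a small sketch. It assumes a hypothetical helper, list_entries (not part of the original code), that returns both the bare names produced by os.listdir() and the same names joined onto the folder, which is the form open() and further os.listdir() calls need when the folder is not the current working directory.

```python
import os
import tempfile
from pathlib import Path


def list_entries(folder):
    """Return (bare_names, full_paths) for the entries directly inside folder.

    list_entries is a hypothetical helper used only for illustration.
    """
    # os.listdir() yields names only, without the folder they live in
    bare_names = sorted(os.listdir(folder))
    # joining them onto folder gives paths that work from any working directory
    full_paths = [os.path.join(folder, name) for name in bare_names]
    return bare_names, full_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "sub").mkdir()
        Path(tmp, "notes.txt").write_text("machine\n")
        names, paths = list_entries(tmp)
        print(names)                                 # bare names only
        print([os.path.exists(p) for p in paths])    # joined paths resolve
```

Passing a bare name back into the recursive call, as the question's code does, makes Python interpret it relative to the current working directory rather than to the folder being scanned.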
Here is a suggestion that uses Path.rglob():

    from pathlib import Path

    def searchEngine(folder, word):
        occurences = {}
        # rglob visits folder and all of its subdirectories
        for file in Path(folder).rglob('*.txt'):
            key = str(file)
            with file.open('rt') as stream:
                for line in stream:
                    # number of times word appears in this line
                    count = line.count(word)
                    if count:
                        if key not in occurences:
                            occurences[key] = count
                        else:
                            occurences[key] += count
        return occurences

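
One thing to keep in mind with the rglob() suggestion: line.count(word) counts substrings, so "machine" is also found inside "machines". If whole-word matches are what is wanted (an assumption about the requirement, not something the question states), a variant along these lines might do. The function name search_whole_words is hypothetical.

```python
import re
from pathlib import Path


def search_whole_words(folder, word):
    """Count whole-word occurrences of word in every .txt file below folder."""
    # \b marks a word boundary; re.escape keeps special characters literal
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    occurrences = {}
    for file in Path(folder).rglob("*.txt"):
        with file.open("rt") as stream:
            count = sum(len(pattern.findall(line)) for line in stream)
        if count:
            occurrences[str(file)] = count
    return occurrences
```

Files without a match are left out of the result, as in the substring version.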
If you want to implement the recursion yourself, you could do it like this:

    from pathlib import Path

    def searchEngine(folder, word):
        base = Path(folder)
        occurences = {}
        if base.is_dir():
            # recursive step: merge the results of every entry in this directory
            for path in base.iterdir():
                occurences.update(searchEngine(path, word))
        elif base.suffix == '.txt':
            # base case: count the matches in a single text file
            with base.open('rt') as stream:
                key = str(base)
                for line in stream:
                    count = line.count(word)
                    if count:
                        if key not in occurences:
                            occurences[key] = count
                        else:
                            occurences[key] += count
        return occurences

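
For comparison, os.walk() from the list of points can do the same job without explicit recursion. This is only a sketch: the name search_with_walk and the use of collections.Counter are my own choices, not part of the code above.

```python
import os
from collections import Counter


def search_with_walk(folder, word):
    """Count occurrences of word in every .txt file below folder."""
    occurrences = Counter()
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # filenames are bare names, so join them onto dirpath
            full_path = os.path.join(dirpath, name)
            with open(full_path, "rt") as stream:
                for line in stream:
                    count = line.count(word)
                    if count:
                        # a Counter treats missing keys as 0
                        occurrences[full_path] += count
    return dict(occurrences)
```

Because os.walk() iterates over the tree itself, the depth of the folder hierarchy does not add Python call frames.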