Maximum recursion depth with file management

Question

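I have to create a search engine that searches for a specific word in a directory (folder) containing text files.

For example, say we are searching for the word "machine" in some directory named X. What I want to achieve is to scan all the txt files in X and its subdirectories.

I am getting "maximum recursion depth exceeded while calling a Python object".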

import os
from pathlib import Path

def getPath (folder) :

    fpath = Path(folder).absolute()
    return fpath

def isSubdirectory (folder) :

    if folder.endswith(".txt") == False :
        return True
    else :
        return False
 
def searchEngine (folder, word) :
    
    path = getPath(folder)
    occurences = {}
    list = os.listdir (path)     #get a list of the folders/files in this path

    #assuming we only have .txt files and subdirectories in our folder :

    for k in list :

        if isSubdirectory(k) == False :
            #break case
            with open (k) as file :                  
                lines = file.readlines()

                for a in lines :

                    if a == word :
                        if str(file) not in occurences :
                            occurences[str(file)] = 1
                        else :
                            occurences[str(file)] += 1
            return occurences
                
        else :

            return searchEngine (k, word)

Answer 1

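A few points:

When running your code I couldn't reproduce the recursion error. But I think you have a problem here: list = os.listdir(path) only gives you relative file/path names, while the code that follows (e.g. open) needs absolute ones as soon as you are not in the current working directory.
I think the return statement is misplaced: it returns after the first txt file?
Python offers ready-made solutions for walking paths recursively: os.walk(), glob.glob() and Path.rglob(). Why not use them?
Path.absolute() was undocumented in older Python versions, so I wouldn't rely on it. Could you use Path.resolve() instead?
You don't do anything with the occurences returned by the recursive step: I think you should update the main dictionary after retrieving them?
Don't use list as a variable name: you are shadowing the built-in list().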
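
The first point can be illustrated with a small sketch. os.listdir() returns bare entry names, so each one has to be joined with its parent directory before it is opened or tested, and os.path.isdir() is a more reliable directory check than "does not end in .txt". The helper name list_entries is made up for this illustration.

```python
import os
import tempfile

def list_entries(folder):
    """Return (full_path, is_dir) pairs for the entries directly inside folder."""
    entries = []
    for name in sorted(os.listdir(folder)):    # bare names such as 'a.txt'
        full = os.path.join(folder, name)      # join with the parent before using it
        entries.append((full, os.path.isdir(full)))
    return entries

# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.txt"), "w") as fh:
        fh.write("machine\n")
    for full, is_dir in list_entries(root):
        print(os.path.relpath(full, root), "dir" if is_dir else "file")
```

Passing the bare name to open() or to a recursive call only works when the current working directory happens to be the folder being listed.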

Here is a suggestion using Path.rglob():

from pathlib import Path

def searchEngine(folder, word):
    occurences = {}
    for file in Path(folder).rglob('*.txt'):
        key = str(file)
        with file.open('rt') as stream:
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count
    return occurences
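
For comparison, the same search could be written with os.walk(), one of the other ready-made options mentioned above. This is only a sketch under the same assumptions (plain-text .txt files, substring counting with str.count); the function name search_with_walk is hypothetical.

```python
import os

def search_with_walk(folder, word):
    """Count occurrences of word in every .txt file below folder (os.walk variant)."""
    occurrences = {}
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(dirpath, name)   # filenames are bare names
            with open(path, "rt") as stream:
                count = sum(line.count(word) for line in stream)
            if count:
                occurrences[path] = count
    return occurrences
```

os.walk() does the directory traversal itself, so no explicit recursion is needed in the calling code.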

If you want to implement the recursion yourself, you could do it like this:

from pathlib import Path

def searchEngine(folder, word):
    base = Path(folder)
    occurences = {}
    if base.is_dir():
        for path in base.iterdir():
            occurences.update(searchEngine(path, word))
    elif base.suffix == '.txt':
        with base.open('rt') as stream:
            key = str(base)
            for line in stream:
                count = line.count(word)
                if count:
                    if key not in occurences:
                        occurences[key] = count
                    else:
                        occurences[key] += count            
    return occurences
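
One way to try the recursive approach is to run it against a throwaway directory tree. The sketch below repeats the function in condensed form so that it is self-contained; the name search_recursive and the file names in the demo are made up for illustration.

```python
import tempfile
from pathlib import Path

def search_recursive(folder, word):
    """Condensed recursive search: directories recurse, .txt files are counted."""
    base = Path(folder)
    occurrences = {}
    if base.is_dir():
        for path in base.iterdir():
            # merge each child's results into this level's dictionary
            occurrences.update(search_recursive(path, word))
    elif base.suffix == ".txt":
        with base.open("rt") as stream:
            count = sum(line.count(word) for line in stream)
        if count:
            occurrences[str(base)] = count
    return occurrences

# Demo: build a small tree named X with one subdirectory, then search it
with tempfile.TemporaryDirectory() as root:
    sub = Path(root, "X", "sub")
    sub.mkdir(parents=True)
    Path(root, "X", "a.txt").write_text("machine\nwashing machine\n")
    (sub / "b.txt").write_text("machines are machines\n")
    (sub / "notes.md").write_text("machine\n")   # ignored: not a .txt file
    result = search_recursive(Path(root, "X"), "machine")
    for key in sorted(result):
        print(Path(key).relative_to(root), result[key])
```

Note that str.count() counts substrings, so "machines" also matches "machine". The recursion depth here grows only with the nesting depth of the directories.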
