Using recursive file search and excluding startswith() with pathlib

Question

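I want to use pathlib to recursively search all files in all folders, but I want to exclude hidden system files whose names start with "." (such as '.DS_Store'). However, I can't find a startswith-like function in pathlib. How can I do a startswith check in pathlib? I know how to do it with os.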

from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # Raises AttributeError: Path objects have no startswith() method
    fcount = len([f for f in root_directory.glob('**/*') if f.startswith(".")])
    print(fcount)

Answer 1

startswith() is a Python string method, see https://python-reference.readthedocs.io/en/latest/docs/str/startswith.html

Because your f is a Path object, you first have to convert it to a string with str(f):

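from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # str(f) is the whole path (e.g. "docs/.DS_Store"), so this tests the start
    # of the path string, not the start of each file name
    fcount = len([f for f in root_directory.glob('**/*') if str(f).startswith(".")])
    print(fcount)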
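To see why the str(f) test behaves unexpectedly, a minimal sketch can compare str(p) with p.name. It uses PurePosixPath so the result does not depend on the operating system; the helper names starts_hidden_full and starts_hidden_name are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath

def starts_hidden_full(p: PurePosixPath) -> bool:
    # Tests the beginning of the whole path string, e.g. "docs/.DS_Store"
    return str(p).startswith(".")

def starts_hidden_name(p: PurePosixPath) -> bool:
    # Tests only the final path component, e.g. ".DS_Store"
    return p.name.startswith(".")

p = PurePosixPath("docs/.DS_Store")
print(starts_hidden_full(p))  # False: the string starts with "docs"
print(starts_hidden_name(p))  # True: the file name starts with "."
```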

Answer 2

My solution:

from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    # f.name is the final path component (already a str), e.g. ".DS_Store"
    fcount = len([f for f in root_directory.glob('**/*') if not f.name.startswith(".")])
    print(fcount)
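This solution counts directories as well as files, and it still counts files inside hidden directories such as .git/config, because only the last path component is checked. If that matters, a sketch along these lines restricts the count to regular files and skips anything below a hidden directory. It assumes "hidden" means any path component below scan_path that starts with "."; count_visible_files is a hypothetical name.

```python
from pathlib import Path

def count_visible_files(scan_path) -> int:
    root = Path(scan_path)
    count = 0
    for f in root.glob("**/*"):
        # Only count regular files, not directories
        if not f.is_file():
            continue
        # Components below root, e.g. ("sub", ".cache", "x.txt")
        parts = f.relative_to(root).parts
        # Skip the entry if it or any directory above it (below root) is hidden
        if any(part.startswith(".") for part in parts):
            continue
        count += 1
    return count
```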

Answer 3

There is a kind of startswith: you can use pathlib.Path.is_relative_to():

pathlib.Path.is_relative_to() was added in Python 3.9. If you want to use it on earlier versions (3.6 and later), you need the backport pathlib3x:

$> python -m pip install pathlib3x
$> python
>>> import pathlib3x as pathlib
>>> p = pathlib.Path('/etc/passwd')
>>> p.is_relative_to('/etc')
True
>>> p.is_relative_to('/usr')
False

You can find pathlib3x on GitHub or PyPI.
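On Python 3.6 to 3.8 without installing a backport, a similar check can be sketched with the standard PurePath.relative_to(), which raises ValueError when the path is not under the other path. Unlike str.startswith(), the comparison is done per path component, so /etcetera is not considered to be under /etc. The function is_relative_to here is a hypothetical standalone helper, not the pathlib method.

```python
from pathlib import PurePath, PurePosixPath

def is_relative_to(path: PurePath, other) -> bool:
    """Return True if `path` lies under `other`.

    Relies on relative_to() raising ValueError when `other` is not a
    component-wise prefix of `path`.
    """
    try:
        path.relative_to(other)
        return True
    except ValueError:
        return False

p = PurePosixPath("/etc/passwd")
print(is_relative_to(p, "/etc"))  # True
print(is_relative_to(p, "/usr"))  # False
```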

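But that still doesn't help with your example, because you want to skip files whose names start with ".". So your solution is correct, but not very efficient: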

from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = len([f for f in root_directory.glob('**/*') if not f.name.startswith(".")])
    print(fcount)

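Suppose scan_path contains 2 million files: the list comprehension builds a list of 2 million pathlib.Path objects just to take its length. Wow, that takes some time and memory...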

It would be nice to have a filter for the glob function, something like fnmatch. I'm thinking about adding that to pathlib3x.
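Until such a filter exists, an fnmatch-style exclusion can be layered on top of the standard Path.glob(). The helper glob_excluding below is hypothetical (it is not part of pathlib or pathlib3x); it yields matches lazily and drops those whose name matches the exclude pattern.

```python
import fnmatch
from pathlib import Path
from typing import Iterator

def glob_excluding(root: Path, pattern: str, exclude: str) -> Iterator[Path]:
    # Hypothetical helper: lazily yield glob matches whose name does not
    # match the fnmatch-style `exclude` pattern (e.g. ".*" for dotfiles)
    for f in root.glob(pattern):
        if not fnmatch.fnmatch(f.name, exclude):
            yield f
```

For example, sum(1 for _ in glob_excluding(Path(scan_path), "**/*", ".*")) counts the entries whose names do not start with ".".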

Path.glob() returns a generator iterator, which needs far less memory than building a list.

So to save memory, a workaround could be:

from pathlib import Path

def recursive_file_count(scan_path):
    root_directory = Path(scan_path)
    fcount = 0
    # we only have one instance of f at a time
    for f in root_directory.glob('**/*'):
        if not f.name.startswith("."):
            fcount = fcount + 1
    print(fcount)
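The same lazy counting can also be written as a generator expression passed to sum(). This sketch returns the count instead of printing it, which is an arbitrary choice here; like the loop, it counts directories as well as files.

```python
from pathlib import Path

def recursive_file_count(scan_path) -> int:
    root_directory = Path(scan_path)
    # sum() consumes the generator expression one Path at a time,
    # so no list of all matches is built in memory
    return sum(1 for f in root_directory.glob("**/*") if not f.name.startswith("."))
```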

Disclaimer: I am the author of the pathlib3x library.


Tags: python, operating-system, for-loop, pathlib