os.walk different folder ordering on Mac and Linux?

Given the following file structure:

├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.md5
├── v1
│   ├── content
│   │   ├── foo.xml
│   │   └── level1
│   │       └── level2
│   │           └── bar.txt
│   ├── inventory.json
│   └── inventory.json.md5
└── v2
    ├── content
    │   └── duck.txt
    ├── inventory.json
    └── inventory.json.md5

I'm wondering whether it's possible for Python's os.walk function to return folders in a different order on Mac and Linux. Both are using Python 3.5.

Mac:

In [15]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:
['v1', 'v2'] ['inventory.json', '0=ocfl_object_1.0', 'inventory.json.md5']
['content'] ['inventory.json', 'inventory.json.md5']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']
['content'] ['inventory.json', 'inventory.json.md5']
[] ['duck.txt']

On Linux:

In [54]: for root,folders,files in os.walk('foo/bar'): 
    ...:     print(folders,files) 
    ...:
['v2', 'v1'] ['inventory.json.md5', 'inventory.json', '0=ocfl_object_1.0']
['content'] ['inventory.json.md5', 'inventory.json']
[] ['duck.txt']
['content'] ['inventory.json.md5', 'inventory.json']
['level1'] ['foo.xml']
['level2'] []
[] ['bar.txt']

On the Mac, it looks as if folder v1 is encountered first, whereas on Linux it is v2. Any insight into why this happens?

See the documentation on os.walk. The relevant part:

Changed in version 3.5: This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().

And then, under os.scandir():

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

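Both listdir() and scandir() return entries in arbitrary order, which in practice means whatever order the underlying filesystem hands them back.

In short, the order is unpredictable.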
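As a small illustration of that point, here is a minimal sketch. The helper name `sorted_listing` and the temporary directory contents are made up for the example. The raw listing may differ from one filesystem to another, while the sorted listing is the same everywhere.

```python
import os
import tempfile


def sorted_listing(path):
    """Return the entry names in `path` in sorted order.

    os.listdir() makes no promise about ordering, so sort explicitly.
    (Hypothetical helper, not part of the os module.)
    """
    return sorted(os.listdir(path))


with tempfile.TemporaryDirectory() as tmp:
    # Create the folders in "reverse" order on purpose.
    for name in ("v2", "v1"):
        os.mkdir(os.path.join(tmp, name))
    open(os.path.join(tmp, "inventory.json"), "w").close()

    print(os.listdir(tmp))      # order depends on the filesystem
    print(sorted_listing(tmp))  # ['inventory.json', 'v1', 'v2']
```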


That said, you should be able to manipulate dirnames inside the loop, based on this part of the os.walk documentation:

When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.

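So if you call folders.sort(), the walk should follow your sorted order. I just tried it, and it works. The key phrase in the quote above is "in-place": folders must be sorted in place for os.walk() to pick up the order: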

import os

for root, folders, files in os.walk('foo/bar'):
    folders.sort()  # <--- sort your folders in place to impose the order
    print(folders, files)
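If the same ordering is needed in several places, the idea can be wrapped in a small generator. This is only a sketch: `walk_sorted` is a hypothetical name, and the directory tree is built in a temporary folder purely for demonstration. Note that rebinding the name, as in `folders = sorted(folders)`, creates a new list that os.walk() never sees, so it would not change the traversal.

```python
import os
import tempfile


def walk_sorted(top):
    """Yield (root, folders, files) like os.walk, but in sorted order.

    Hypothetical helper; relies on the default topdown=True.
    """
    for root, folders, files in os.walk(top):
        folders.sort()  # in place: os.walk() descends in this order
        files.sort()    # only affects the list we hand to the caller
        yield root, folders, files


with tempfile.TemporaryDirectory() as tmp:
    # Create v2 before v1 so creation order and sorted order differ.
    for version in ("v2", "v1"):
        os.makedirs(os.path.join(tmp, version, "content"))

    for root, folders, files in walk_sorted(tmp):
        print(os.path.relpath(root, tmp), folders, files)
```

With this helper, v1 and everything below it is visited before v2, regardless of platform.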