How can I use os.walk, or any alternative, to traverse folders recursively in natural name order?
In Python, I use os.walk to iterate recursively over all folders and find every file with a given extension. This is my current code:
def get_data_paths(root_path, ext = '*.jpg'):
    import os
    import fnmatch
    matches = []
    classes = []
    class_names = []
    for root, dirnames, filenames in os.walk(root_path):
        for filename in fnmatch.filter(filenames, ext):
            matches.append(os.path.join(root, filename))
            # the class name is the name of the folder that holds the file
            class_name = os.path.basename(os.path.dirname(os.path.join(root, filename)))
            if class_name not in class_names:
                class_names.append(class_name)
            classes.append(class_names.index(class_name))

    print("There are %d files found!" % len(matches))
    return matches, classes, class_names
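The problem is that this function visits the folders in an arbitrary order rather than by folder name. I would like to traverse them A-Z instead. How should I modify this code, or what alternative can I use, to do that?
I changed the code to this: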
def get_data_paths(root_path, ext = '*.jpg'):
    import os
    import fnmatch
    import natsort # import this
    matches = []
    classes = []
    class_names = []
    dir_list = natsort.natsorted(list(os.walk(root_path))) # add this
    for root, dirnames, filenames in dir_list:
        for filename in fnmatch.filter(filenames, ext):
            matches.append(os.path.join(root, filename))
            # the class name is the name of the folder that holds the file
            class_name = os.path.basename(os.path.dirname(os.path.join(root, filename)))
            if class_name not in class_names:
                class_names.append(class_name)
            classes.append(class_names.index(class_name))

    print("There are %d files found!" % len(matches))
    return matches, classes, class_names
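By default, the topdown parameter of os.walk is True, so the triple for a directory is reported before os.walk descends into that directory's subdirectories. The docs state: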
"the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again."
Emphasis mine: the part that matters here is "impose a specific order of visiting". So all you need to do is:
import os
import natsort

for root, dirnames, filenames in os.walk(root_path):
    dirnames[:] = natsort.natsorted(dirnames)
    # continue with other directory processing...
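Because os.walk keeps using the very list object it handed you, the list has to be edited in place, which is why the [:] slice notation is needed. A plain dirnames = ... assignment would only rebind the local name and leave the traversal unchanged.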
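To make the difference between rebinding and in-place editing visible, here is a minimal sketch. It uses pruning rather than sorting, only because the effect is easy to count; the helper name count_visited and the throwaway directory tree are made up for illustration.

```python
import os
import tempfile

def count_visited(root_path, in_place):
    """Count the directories os.walk visits while we try to prune every subdirectory."""
    visited = 0
    for root, dirnames, filenames in os.walk(root_path):
        visited += 1
        if in_place:
            dirnames[:] = []   # mutates the list os.walk holds: pruning takes effect
        else:
            dirnames = []      # only rebinds the local name: os.walk never notices
    return visited

# Hypothetical throwaway tree: a root folder with two subfolders.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "a"))
os.mkdir(os.path.join(demo_root, "b"))

print(count_visited(demo_root, in_place=False))  # root plus both subfolders are visited
print(count_visited(demo_root, in_place=True))   # only the root is visited
```

The same reasoning applies to ordering: dirnames[:] = natsort.natsorted(dirnames) changes what os.walk sees, while dirnames = natsort.natsorted(dirnames) does not.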
Here is an example of os.walk in action. Given a directory tree that looks like this:
$ ls -RF cm3mm/SAM3/src
Applets/            RTC.cc    SAM3X/
DBGUWriteString.cc  SAM3A/    SMC.cc.in
EEFC.cc             SAM3N/    SoftBoot.cc
Memories.txt        SAM3S/
PIO.cc              SAM3U/

cm3mm/SAM3/src/Applets:
AppletAPI.cc  IntFlash.cc  Main.cc  MessageSink.cc  Runtime.cc

cm3mm/SAM3/src/SAM3A:
Map.txt  Pins.txt

cm3mm/SAM3/src/SAM3N:
Map.txt  Pins.txt

cm3mm/SAM3/src/SAM3S:
Map.txt  Pins.txt

cm3mm/SAM3/src/SAM3U:
Map.txt  Pins.txt

cm3mm/SAM3/src/SAM3X:
Map.txt  Pins.txt
Now, let's see what os.walk does:
>>> import os
>>> for root, dirnames, filenames in os.walk("cm3mm/SAM3/src"):
...     print("-----")
...     print("root =", root)
...     print("dirnames =", dirnames)
...     print("filenames =", filenames)
...
-----
root = cm3mm/SAM3/src
dirnames = ['Applets', 'SAM3A', 'SAM3N', 'SAM3S', 'SAM3U', 'SAM3X']
filenames = ['DBGUWriteString.cc', 'EEFC.cc', 'Memories.txt', 'PIO.cc', 'RTC.cc', 'SMC.cc.in', 'SoftBoot.cc']
-----
root = cm3mm/SAM3/src/Applets
dirnames = []
filenames = ['AppletAPI.cc', 'IntFlash.cc', 'Main.cc', 'MessageSink.cc', 'Runtime.cc']
-----
root = cm3mm/SAM3/src/SAM3A
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3N
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3S
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3U
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3X
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
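Each time through the loop, you get the subdirectories and files of one directory. We know exactly which file belongs to which folder: the files in filenames belong to the folder root.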
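Putting the pieces together, the following is a rough sketch of a natural-order search that does not depend on natsort. The names natural_key and find_files_natural are hypothetical, and the hand-written key is a simplified stand-in for what natsort does, not a full replacement. It sorts dirnames in place so os.walk descends in natural order, and it joins root with each filename, since the files in filenames belong to root.

```python
import fnmatch
import os
import re

def natural_key(name):
    """Split a name into text and integer chunks so that 'img10' sorts after 'img2'."""
    return [int(chunk) if chunk.isdecimal() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]

def find_files_natural(root_path, pattern="*.jpg"):
    """Return matching file paths, visiting folders and files in natural name order."""
    matches = []
    for root, dirnames, filenames in os.walk(root_path):
        # list.sort works in place, so os.walk sees the new order
        dirnames.sort(key=natural_key)
        for filename in sorted(fnmatch.filter(filenames, pattern), key=natural_key):
            # every entry of filenames lives directly inside root
            matches.append(os.path.join(root, filename))
    return matches

print(sorted(["SAM10", "SAM2", "SAM1"], key=natural_key))  # ['SAM1', 'SAM2', 'SAM10']
```

Sorting the filenames is optional for controlling the folder order, but it makes the order of files inside each folder predictable as well.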