How to get all types of file extensions within a directory using os.walk or glob.glob
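I have code that detects the language of files in a directory, but it only handles the one extension named in the code (.txt). How can I detect the language for files of every extension in the directory (e.g. .pdf, .xlsx, .docx), not just .txt? The code is attached for reference. I would like to know how to do this using glob and os.walk.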
import csv
from fnmatch import fnmatch
try:
    from langdetect import detect
except ImportError:
    # Fallback when langdetect is not installed
    detect = lambda _: '<dunno>'
import os

rootdir = '.'  # current directory
extension = '.txt'
file_pattern = '*' + extension

with open('output.csv', 'w', newline='', encoding='utf-8') as outfile:
    csvwriter = csv.writer(outfile)
    # Walk the tree and write one CSV row per matching file
    for dirpath, subdirs, filenames in os.walk(os.path.abspath(rootdir)):
        for filename in filenames:
            if fnmatch(filename, file_pattern):
                # Note: detect() is given the path string here, not the file's contents
                lang = detect(os.path.join(dirpath, filename))
                csvwriter.writerow([dirpath, filename, lang])
IIUC, you can replace the fnmatch check with:
eoi = ['*.pdf', '*.xlsx', '*.docx', '*.txt']  # extensions of interest list
# `filename` is the loop variable from the question's os.walk loop
if any(fnmatch(filename, ext) for ext in eoi):
    lang = ...
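If you do not know in advance which extensions are present, one option is to collect them first and build the list from that. Below is a minimal sketch of both approaches the question mentions. The helper names `extensions_with_walk` and `extensions_with_glob` are made up for illustration and are not part of any library. Extensions are lower-cased so that `.PDF` and `.pdf` count as the same type, which is an assumption about what you want.

```python
import glob
import os
from collections import Counter


def extensions_with_walk(rootdir):
    """Count file extensions under rootdir using os.walk (hypothetical helper)."""
    counts = Counter()
    for dirpath, subdirs, filenames in os.walk(rootdir):
        for filename in filenames:
            # splitext returns (stem, ext); ext is '' when the name has no extension
            ext = os.path.splitext(filename)[1].lower()
            counts[ext] += 1
    return counts


def extensions_with_glob(rootdir):
    """Count file extensions under rootdir using a recursive glob (hypothetical helper)."""
    counts = Counter()
    # '**' with recursive=True matches rootdir itself and all nested directories
    pattern = os.path.join(rootdir, '**', '*')
    for path in glob.glob(pattern, recursive=True):
        # The pattern also matches directories, so keep regular files only
        if os.path.isfile(path):
            counts[os.path.splitext(path)[1].lower()] += 1
    return counts


if __name__ == '__main__':
    found = extensions_with_walk('.')
    # Turn the extensions that were found into fnmatch patterns, skipping files without one
    eoi = ['*' + ext for ext in found if ext]
    print(found)
    print(eoi)
```

The two functions should agree on ordinary files, but by default `glob` does not match names that start with a dot, so hidden files and hidden directories are seen only by the `os.walk` version. Note also that knowing the extension only selects the files; formats such as .pdf, .xlsx and .docx are not plain text, so their text would still have to be extracted before language detection.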