Dask: read CSV files recursively from directories
Given the following directory structure:
Folder
    Sub-Folder1
        File1.csv
        File2.csv
        File3.csv
        File4.csv
    Sub-Folder2
        File1.csv
        File2.csv
    Sub-Folder3
        File1.csv
        File2.csv
How can I use read_csv or Dask to read all the CSV files in these folders, with each file placed in its own partition?
IIUC, you can use:
import dask.dataframe as dd
dfs = dd.read_csv('Folder/**/*.csv')
Output:
>>> dfs
Dask DataFrame Structure:
                   A      B      C
npartitions=8
               int64  int64  int64
                 ...    ...    ...
...              ...    ...    ...
                 ...    ...    ...
                 ...    ...    ...
Dask Name: read-csv, 8 tasks