Reading files into a pyspark dataframe from directories and subdirectories

Question

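I have the approach below for reading all the files in a directory, but I am struggling to pick up the subdirectories as well. I won't always know what the subdirectories are called, so I can't list them explicitly.

Can anyone advise?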

df = my_spark.read.format("csv").option("header", "true").load(yesterday+"/*.csv")

Answer 1

Thanks to Joby, whose comment solved it:

Can you try giving wildcards in this way and see: "path/*/*" – Joby

Answer 2

Add wildcards after the location of the directory whose subdirectories you want to read.

"path/*/*"
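
Note that "path/*/*" matches files one level below path, so files sitting directly in path, or nested deeper, would need their own patterns. The sketch below shows two ways this might be handled, assuming the reader from the question. The helper names csv_glob_patterns and read_csv_tree are hypothetical. The first builds one pattern per depth, which can be passed to load() as a list. The second assumes Spark 3.0 or later, where the recursiveFileLookup and pathGlobFilter reader options are available.

```python
def csv_glob_patterns(base_dir, max_depth):
    """Build one CSV glob pattern per directory level (hypothetical helper).

    Depth 0 is base_dir itself, depth 1 its immediate subdirectories, etc.
    """
    base = base_dir.rstrip("/")
    return [base + "/*" * depth + "/*.csv" for depth in range(max_depth + 1)]


def read_csv_tree(spark, base_dir):
    """Read every CSV under base_dir at any depth (hypothetical helper).

    Assumes Spark 3.0+, where these two reader options exist.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("recursiveFileLookup", "true")  # descend into all subdirectories
        .option("pathGlobFilter", "*.csv")      # keep only file names ending in .csv
        .load(base_dir)
    )
```

With the names from the question, usage might look like df = my_spark.read.format("csv").option("header", "true").load(csv_glob_patterns(yesterday, 2)), or df = read_csv_tree(my_spark, yesterday). Two caveats: a glob pattern that matches no files may make load() fail, so the list approach suits layouts where every level holds at least one CSV, and recursiveFileLookup turns off partition inference from directory names.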
