How to add new files to spark structured streaming dataframe
I receive daily files in a folder on a Linux server. How should I add these files to my Spark Structured Streaming DataFrame (incremental updates)?

Have you looked at the documentation?
File source - Reads files written in a directory as a stream of data. Supported file formats are text, csv, json, parquet. See the docs of the DataStreamReader interface for a more up-to-date list, and supported options for each file format. Note that the files must be atomically placed in the given directory, which in most file systems, can be achieved by file move operations.
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#input-sources
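In other words, you don't "add" files to the streaming DataFrame yourself: you point `readStream` at the directory, and Spark picks up each new file as it appears, provided the file lands there atomically. A minimal sketch of that drop-off pattern is below (directory names and the CSV content are hypothetical; the helper writes the file to a staging directory on the same filesystem and then moves it into the watched directory in one atomic step, so Spark never observes a half-written file):

```python
import os
import tempfile

def publish_atomically(content: str, filename: str, watch_dir: str) -> str:
    """Write a daily file to a staging area, then move it into the
    directory monitored by Spark in a single atomic rename."""
    os.makedirs(watch_dir, exist_ok=True)
    # Stage on the same filesystem as watch_dir so os.replace is atomic.
    staging_dir = tempfile.mkdtemp(dir=os.path.dirname(watch_dir) or ".")
    tmp_path = os.path.join(staging_dir, filename)
    with open(tmp_path, "w") as f:
        f.write(content)
    final_path = os.path.join(watch_dir, filename)
    os.replace(tmp_path, final_path)  # atomic rename into the watched dir
    os.rmdir(staging_dir)
    return final_path

# On the Spark side (assuming PySpark is installed and a schema is
# defined for the incoming files), the watched directory becomes a
# streaming DataFrame, and each newly arrived file is read
# incrementally:
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   df = (spark.readStream
#              .format("csv")
#              .option("header", "true")
#              .schema(my_schema)   # file sources require an explicit schema
#              .load("/path/to/watch_dir"))
```

Note that `os.replace` is only atomic when source and destination are on the same filesystem; if your files originate on another mount, copy them to a staging directory next to the watched one first, then rename.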