How to put files in flume spooldir one by one?

I am using the Flume spooling directory source to put files into HDFS, but I end up with many small files in HDFS. I thought about using the batch size and roll interval settings, but I don't want to depend on size and interval alone. So I decided to push files into the Flume spool directory one at a time. How can I do this?

According to https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source, if you set a1.sources.src-1.fileHeader = true, the source adds the file's path to each event as a header. You can then reference any header (for example, the file name header) in the HDFS sink path (see %{host} in the escape sequence description at https://flume.apache.org/FlumeUserGuide.html#hdfs-sink).

Edit: for an example configuration, you can try the following:

a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /flumespool
a1.sources.r1.basenameHeader = true

a1.channels = c1
a1.channels.c1.type = memory

a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flumeout/%{basename}
a1.sinks.k1.hdfs.fileType = DataStream
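With this configuration, events from each input file are written under an HDFS path named after that file's basename, so each source file maps to its own output location. On the feeding side, the spooling directory source requires that files be complete and immutable once they appear in the spool directory. A common way to deliver files one at a time is to write each file somewhere else first and then move it in with an atomic rename. A minimal sketch (assuming the staging directory and the spool directory are on the same filesystem, so mv is a single atomic rename; the paths here are stand-ins, not part of the Flume config above):

```shell
# Write the file fully in a staging directory, then mv it into the
# spool directory so Flume never sees a partially written file.
STAGING=$(mktemp -d)
SPOOLDIR=$(mktemp -d)   # stand-in for /flumespool in this sketch

# 1. Write the complete file outside the spool directory.
echo "event data" > "$STAGING/data-001.log"

# 2. Atomic rename: the file appears in the spool directory all at once.
mv "$STAGING/data-001.log" "$SPOOLDIR/"
```

If you need to pace the files strictly one by one, you can wait for Flume to rename the previous file to *.COMPLETED (its default behavior after ingesting a file) before moving the next one in.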