What should the flume.conf parameters be to save tweets to a single FlumeData file per hour?

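We are saving tweets into hourly directories such as /user/flume/2016/06/28/13/FlumeData..., but Flume creates more than 100 FlumeData files per hour. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB), and the same thing happened again. After that I also tried changing the rollCount parameter, but that did not work either. How can I set the parameters to get one FlumeData file per hour?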

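What about rollInterval? Did you set it to zero? If so, the problem is probably somewhere else. If rollInterval is set to a non-zero value, it takes precedence over rollSize and rollCount whenever its timer expires first, so the file can roll before it reaches the rollSize value. Also check the HDFS block size you have configured: if it is set too small, that alone can cause files to roll.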
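
To make the interaction concrete, here is a sketch of the three roll triggers on the sink named in the question. As far as I know, each trigger is independent, a value of 0 disables it, and whichever one fires first closes the file. The defaults quoted in the comments are from the Flume user guide as I remember it, so check them against the version you run.

    # Time-based roll: close the file after this many seconds (0 = off).
    # If left unset, the documented default is 30 seconds.
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
    # Size-based roll: bytes written before rolling (0 = off).
    # If left unset, the documented default is 1024 bytes.
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    # Count-based roll: events written before rolling (0 = off).
    # If left unset, the documented default is 10 events.
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0

Note that raising only rollSize leaves the other two triggers active, which would be consistent with still seeing many small files. Comments go on their own lines because a `#` after a value in a properties file becomes part of the value.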

Try this:

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600

    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 1000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

TwitterAgent.sinks.HDFS.hdfs.batchSize = 1


TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

TwitterAgent.sinks.HDFS.hdfs.rollCount = 10

TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000

TwitterAgent.channels.MemChannel.transactionCapacity = 1000

I solved this by setting the flume.conf parameters rollInterval=3600, rollCount=0 and batchSize=100, as @vkgade suggested.
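
One thing to keep in mind with this setup (a hedged note, not something confirmed in the thread): the rollInterval timer starts when a file is opened, not at the top of the hour, so the file in one hour's directory may stay open with its in-use suffix for a while after the next hour has started. If that matters, the HDFS sink's hdfs.idleTimeout setting can close files that have stopped receiving events. The 300-second value below is an arbitrary assumption for illustration.

    # Close a file after 300 s without new events.
    # The default of 0 never closes idle files.
    TwitterAgent.sinks.HDFS.hdfs.idleTimeout = 300

The trade-off is that a quiet period longer than the timeout inside a single hour would close the file early and start a second one, so pick a value larger than any gap you expect in the tweet stream.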