How do I access the full dataset in a flume-to-kafka pipeline?
I am reading the text file SMSSpamCollection with a Flume source and publishing it to a Kafka topic through a Flume sink.
# Agent Name:
a1.sources = r1
a1.sinks = sample
a1.channels = sample-channel
# Source configuration:
a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /Users/val/Documents/code/spark/m11_to_Upload/SMSSpamCollection
a1.sources.r1.logStdErr = true
# Sink type
#a1.sinks.sample.type = logger
# Buffers events in memory to channel
a1.channels.sample-channel.type = memory
a1.channels.sample-channel.capacity = 1000
a1.channels.sample-channel.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels.selector.type = replicating
a1.sources.r1.channels = sample-channel
# Kafka sink settings: topic, broker list, and the channel the sink reads from
a1.sinks.sample.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sample.topic = sample_topic
a1.sinks.sample.brokerList = 127.0.0.1:9092
a1.sinks.sample.requiredAcks = 1
a1.sinks.sample.batchSize = 20
a1.sinks.sample.channel = sample-channel
I start the agent with this command:
flume-ng agent --conf conf --conf-file /usr/local/Cellar/flume/1.9.0/libexec/conf/flume-sample.conf -Dflume.root.logger=DEBUG,console --name a1 -Xmx512m -Xms256m
When I read the data back from the Kafka topic:
kafka-console-consumer --topic sample_topic --from-beginning --bootstrap-server localhost:9092
I only see the last 10 records of the original file:
ham Ok lor... Sony ericsson salesman... I ask shuhui then she say quite gd 2 use so i considering...
ham Ard 6 like dat lor.
ham Why don't you wait 'til at least wednesday to see if you get your .
ham Huh y lei...
spam REMINDER FROM O2: To get 2.50 pounds free call credit and details of great offers pls reply 2 this text with your valid name, house no and postcode
spam This is the 2nd time we have tried 2 contact u. U have won the £750 Pound prize. 2 claim is easy, call 087187272008 NOW1! Only 10p per minute. BT-national-rate.
ham Will ü b going to esplanade fr home?
ham Pity, * was in mood for that. So...any other suggestions?
ham The guy did some bitching but I acted like i'd be interested in buying something else next week and he gave it to us for free
ham Rofl. Its true to its name
What is the correct way to get all of the records?
You are using tail, which by default outputs only the last 10 lines of a file.
Use this instead:
a1.sources.r1.command = tail -c +0 -f /Users/val/Documents/code/spark/m11_to_Upload/SMSSpamCollection
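-c +0 tells tail to start output at the first byte of the file instead of near the end, while -f keeps following the file for new lines.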
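To see the difference locally, here is a small, self-contained POSIX shell sketch. It generates a throwaway 15-line file as a stand-in for SMSSpamCollection (the file name and its "msg N" contents are made up for illustration) and compares what plain tail and tail -c +0 print. The -f flag is left off so both commands terminate.

```shell
#!/bin/sh
# Sketch: why the exec source delivered only 10 records.
# Builds a temporary 15-line file (stand-in for SMSSpamCollection)
# and compares tail's default output with "tail -c +0".
set -eu

demo_file=$(mktemp)
trap 'rm -f "$demo_file"' EXIT

# Write 15 numbered lines: "msg 1" ... "msg 15"
i=1
while [ "$i" -le 15 ]; do
  echo "msg $i" >> "$demo_file"
  i=$((i + 1))
done

# Plain tail: only the last 10 lines (msg 6 .. msg 15)
default_count=$(tail "$demo_file" | wc -l | tr -d ' ')
default_first=$(tail "$demo_file" | head -n 1)
echo "plain tail:  $default_count lines, first = $default_first"

# tail -c +0: start at the first byte, so every line is printed
full_count=$(tail -c +0 "$demo_file" | wc -l | tr -d ' ')
full_first=$(tail -c +0 "$demo_file" | head -n 1)
echo "tail -c +0:  $full_count lines, first = $full_first"
```

It prints 10 lines starting at msg 6 for plain tail, and 15 lines starting at msg 1 for tail -c +0. With -f added back, the second form prints the whole file and then keeps following it, which is what the exec source needs.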
By the way, another approach is to use Kafka Connect with a connector plugin such as Spooldir or File Pulse.
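Staying within Flume, a further option (not part of the original answer) is the Taildir source, available since Flume 1.7. The Flume user guide notes that the exec source gives no delivery guarantees and suggests sources such as Taildir when stronger reliability is needed. Taildir reads a file from the beginning by default (skipToEnd defaults to false) and records its offsets in a JSON position file, so a restarted agent resumes where it stopped. The sketch below only writes a config fragment that would replace the three a1.sources.r1 exec lines; the output file name and the position-file path are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: write a Taildir-based replacement for the exec source of agent a1.
# The a1.sources = r1 and a1.sources.r1.channels lines from the original
# config stay as they are; only type/command/logStdErr are replaced.
set -eu

# Hypothetical output path; merge its contents into flume-sample.conf.
conf_out=${1:-flume-taildir-source.conf}

cat > "$conf_out" <<'EOF'
a1.sources.r1.type = TAILDIR
# JSON file where Taildir stores per-file read offsets (illustrative path)
a1.sources.r1.positionFile = /tmp/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /Users/val/Documents/code/spark/m11_to_Upload/SMSSpamCollection
EOF

echo "wrote $conf_out"
```

Because offsets are persisted, deleting the position file is the way to make Taildir re-read the file from the start on the next run.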