使用 flume 将 Twitter 数据流式传输到 Hadoop 时出错
error in streaming twitter data to Hadoop using flume
我在 Ubuntu 14.04
上使用 Hadoop-1.2.1
我正在尝试使用 Flume-1.6.0 将数据从 Twitter 流式传输到 HDFS。我已经下载了 flume-sources-1.0-SNAPSHOT.jar 并将其包含在 flume/lib 文件夹中。我在 conf/flume-env.sh 中将 flume-sources-1.0-SNAPSHOT.jar 的路径设置为 FLUME_CLASSPATH 。这是我的 flume 代理配置文件:
#setting properties of agent
Twitter-agent.sources=source1
Twitter-agent.channels=channel1
Twitter-agent.sinks=sink1
#configuring sources
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
Twitter-agent.sources.source1.channels=channel1
Twitter-agent.sources.source1.consumerKey=<consumer-key>
Twitter-agent.sources.source1.consumerSecret=<consumer Secret>
Twitter-agent.sources.source1.accessToken=<access Toekn>
Twitter-agent.sources.source1.accessTokenSecret=<acess Token Secret>
Twitter-agent.sources.source1.keywords= morning, night, hadoop, bigdata
#configuring channels
Twitter-agent.channels.channel1.type=memory
Twitter-agent.channels.channel1.capacity=10000
Twitter-agent.channels.channel1.transactionCapacity=100
#configuring sinks
Twitter-agent.sinks.sink1.channel=channel1
Twitter-agent.sinks.sink1.type=hdfs
Twitter-agent.sinks.sink1.hdfs.path=flume/twitter/logs
Twitter-agent.sinks.sink1.rollSize=0
Twitter-agent.sinks.sink1.rollCount=1000
Twitter-agent.sinks.sink1.batchSize=100
Twitter-agent.sinks.sink1.fileType=DataStream
Twitter-agent.sinks.sink1.writeFormat=Text
当我 运行 这个代理时,我收到这样的错误:
15/06/22 14:14:49 INFO source.DefaultSourceFactory: Creating instance of source source1, type com.cloudera.flume.source.TwitterSource
15/06/22 14:14:49 ERROR node.PollingPropertiesFileConfigurationProvider: Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
我的flume/lib文件夹已经有twitter4j-core-3.0.3.jar
如何纠正这个错误?
我找到了这个问题的解决方案。由于 flume-sources-1.0-SNAPSHOT.jar 和 twitter4j-stream-3.0.3.jar 包含相同的 FilterQuery.class,因此出现了 jar 冲突。所有 twitter4j-3.x.x 都使用这个 class 所以最好下载 2.2.6 版的 twitter jar(twitter4j-core,twitter4j-stream,twitter4j-media-support) 并将 3.x.x 替换为这些新下载的 jar flume/lib 目录.
运行 再次代理,twitter 数据将流式传输到 HDFS。
改变
推特-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
和
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
我在 Ubuntu 14.04
上使用 Hadoop-1.2.1我正在尝试使用 Flume-1.6.0 将数据从 Twitter 流式传输到 HDFS。我已经下载了 flume-sources-1.0-SNAPSHOT.jar 并将其包含在 flume/lib 文件夹中。我在 conf/flume-env.sh 中将 flume-sources-1.0-SNAPSHOT.jar 的路径设置为 FLUME_CLASSPATH 。这是我的 flume 代理配置文件:
#setting properties of agent
Twitter-agent.sources=source1
Twitter-agent.channels=channel1
Twitter-agent.sinks=sink1
#configuring sources
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
Twitter-agent.sources.source1.channels=channel1
Twitter-agent.sources.source1.consumerKey=<consumer-key>
Twitter-agent.sources.source1.consumerSecret=<consumer Secret>
Twitter-agent.sources.source1.accessToken=<access Toekn>
Twitter-agent.sources.source1.accessTokenSecret=<acess Token Secret>
Twitter-agent.sources.source1.keywords= morning, night, hadoop, bigdata
#configuring channels
Twitter-agent.channels.channel1.type=memory
Twitter-agent.channels.channel1.capacity=10000
Twitter-agent.channels.channel1.transactionCapacity=100
#configuring sinks
Twitter-agent.sinks.sink1.channel=channel1
Twitter-agent.sinks.sink1.type=hdfs
Twitter-agent.sinks.sink1.hdfs.path=flume/twitter/logs
Twitter-agent.sinks.sink1.rollSize=0
Twitter-agent.sinks.sink1.rollCount=1000
Twitter-agent.sinks.sink1.batchSize=100
Twitter-agent.sinks.sink1.fileType=DataStream
Twitter-agent.sinks.sink1.writeFormat=Text
当我 运行 这个代理时,我收到这样的错误:
15/06/22 14:14:49 INFO source.DefaultSourceFactory: Creating instance of source source1, type com.cloudera.flume.source.TwitterSource
15/06/22 14:14:49 ERROR node.PollingPropertiesFileConfigurationProvider: Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
我的flume/lib文件夹已经有twitter4j-core-3.0.3.jar
如何纠正这个错误?
我找到了这个问题的解决方案。由于 flume-sources-1.0-SNAPSHOT.jar 和 twitter4j-stream-3.0.3.jar 包含相同的 FilterQuery.class,因此出现了 jar 冲突。所有 twitter4j-3.x.x 都使用这个 class 所以最好下载 2.2.6 版的 twitter jar(twitter4j-core,twitter4j-stream,twitter4j-media-support) 并将 3.x.x 替换为这些新下载的 jar flume/lib 目录.
运行 再次代理,twitter 数据将流式传输到 HDFS。
改变 推特-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource 和 TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource