Flume 流式传输天气数据

Flume to stream in weather data

我是 flume 的新手。 但我想将天气数据从任何网站流式传输到我的 hdfs 位置。 所以我创建了接收器、源和通道...如下所示

weather.channels= memory-channel
weather.channels.memory-channel.capacity=10000
weather.channels.memory-channel.type = memory
weather.sinks = hdfs-write
weather.sinks.hdfs-write.channel=memory-channel
 weather.sinks.hdfs-write.type = logger
 weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat=Text
weather.sinks.hdfs-write.hdfs.fileType=DataStream
weather.sources= Weather
weather.sources.Weather.bind =  api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=********************************
weather.sources.Weather.channels=memory-channel
weather.sources.Weather.type = netcat
weather.sources.Weather.port = 80

所以我在这里使用 API 来处理这个问题。 我还可以使用什么来流式传输天气数据,我可以使用哪个在线网站,或者我应该使用哪个 API 来配置源? 在执行 flume-ng 命令启动代理时,我正在关注

15/03/18 11:13:28 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner:{
 source:org.apache.flume.source.http.HTTPSource{name:Weather,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Running HTTP Server found in 
source:Weather before I started one.Will not attempt to start.
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:189)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) 
C15/03/18 11:13:31 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10
15/03/18 11:13:31 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
15/03/18 11:13:31 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel stopped

您看到的 "lyfecycle" 错误是之前尝试启动 http 服务器时出错的原因。

最初的错误可能是由于尝试以非 root 用户绑定到特权 80 端口。将端口更改为 >1024,例如8080

但是,它不会像您尝试使用的那样工作。 http 或 netcat 源侦听调用,不会获取您在绑定中设置的 url。

我看到两个选项:

  1. 创建一个 linux 守护进程以定期对 url 执行 wget 或 curl,将结果保存到文件中,然后配置 flume假脱机来源。
  2. 创建您自己的 Flume 源,定期 url