Cygnus shut itself down

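I ran some system checks over the weekend and found that Cygnus had shut itself down, but there are no error messages in the log file.

Francisco, could you share your thoughts with us?

Many thanks.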

Starting an ordered shutdown of Cygnus
Stopping sources
Starting an ordered shutdown of Cygnus
Stopping sources
Stopping http-source (lyfecycle state=START)
16/05/29 02:58:02 INFO lifecycle.LifecycleSupervisor: Stopping component: EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:http-source,state:START} }
16/05/29 02:58:02 INFO mortbay.log: Stopped SocketConnector@0.0.0.0:5050
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: http-source stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. source.start.time == 1464330902578
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. source.stop.time == 1464490683015
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.accepted == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append-batch.received == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append.accepted == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.append.received == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.events.accepted == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.events.received == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: SOURCE, name: http-source. src.open-connection.count == 0
16/05/29 02:58:03 INFO http.HTTPSource: Http source http-source stopped. Metrics: SOURCE:http-source{src.events.accepted=43990, src.events.received=43990, src.append.accepted=0, src.append-batch.accepted=43990, src.open-connection.count=0, src.append-batch.received=43990, src.append.received=0}
All the channels are empty
Stopping channels
Stopping ckan-channel (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: org.apache.flume.channel.MemoryChannel{name: ckan-channel}
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: ckan-channel stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.start.time == 1464330902110
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.stop.time == 1464490683353
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.capacity == 1000
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.current.size == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.put.attempt == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.put.success == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.take.attempt == 74296
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: ckan-channel. channel.event.take.success == 43990
Stopping hdfs-channel (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: org.apache.flume.channel.MemoryChannel{name: hdfs-channel}
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: hdfs-channel stopped
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.start.time == 1464330902110
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.stop.time == 1464490683353
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.capacity == 1000
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.current.size == 0
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.put.attempt == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.put.success == 43990
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.take.attempt == 67985
16/05/29 02:58:03 INFO instrumentation.MonitoredCounterGroup: Shutdown Metric for type: CHANNEL, name: hdfs-channel. channel.event.take.success == 43990
Stopping sinks
Stopping ckan-sink (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2c5d7ace counterGroup:{ name:null counters:{runner.backoffs.consecutive=1, runner.backoffs=30324} } }
Stopping hdfs-sink (lyfecycle state=START)
16/05/29 02:58:03 INFO lifecycle.LifecycleSupervisor: Stopping component: SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@2d298123 counterGroup:{ name:null counters:{runner.backoffs.consecutive=1, runner.backoffs=24009} } }

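Cygnus runs an internal check that looks for abnormal thread terminations, and even for Ctrl+C keystrokes. When one of these occurs, it shuts down. You can see the relevant code here.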
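To make the behaviour in the log easier to follow, here is a minimal Python sketch of a signal-triggered ordered shutdown. It is not the Cygnus code (Cygnus is written in Java on top of Flume); the function names are made up for illustration, and the component names are simply copied from the log in the question. It only shows the general pattern: Ctrl+C or a plain `kill` sets a flag, and the main thread then stops sources, channels and sinks in that order.

```python
import signal
import threading

# Component names copied from the log in the question; grouping is illustrative.
COMPONENTS = {
    "sources": ["http-source"],
    "channels": ["ckan-channel", "hdfs-channel"],
    "sinks": ["ckan-sink", "hdfs-sink"],
}

stop_requested = threading.Event()


def ordered_shutdown(components=COMPONENTS):
    """Stop sources first, then channels, then sinks; return the stop order."""
    stopped = []
    for group in ("sources", "channels", "sinks"):
        print(f"Stopping {group}")
        for name in components[group]:
            print(f"Stopping {name}")
            stopped.append(name)
    return stopped


def handle_signal(signum, frame):
    # Keep the handler tiny: only set a flag, do the real work elsewhere.
    stop_requested.set()


def install_handlers():
    # SIGINT is what Ctrl+C sends; SIGTERM is what a plain `kill <pid>` sends.
    signal.signal(signal.SIGINT, handle_signal)
    signal.signal(signal.SIGTERM, handle_signal)


def run_until_stopped():
    """Block until a signal arrives, then perform the ordered shutdown."""
    install_handlers()
    # Wake up once per second so the wait stays responsive to signals.
    while not stop_requested.wait(timeout=1.0):
        pass
    return ordered_shutdown()
```

Calling `run_until_stopped()` from the main thread and then pressing Ctrl+C prints the same "Stopping sources / channels / sinks" sequence seen in the log. That is why a shutdown like the one in the question can appear without any error message: an external signal is enough to trigger it.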

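Most probably it would be useful to have a flag for enabling/disabling this feature, but nothing like that exists at the moment (I'll add it in the next release ;)). Alternatively, you can write a monit process that detects the Cygnus shutdown and restarts it automatically.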
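As a rough illustration of the watchdog idea (not a monit configuration), here is a minimal Python sketch that polls a PID file and runs a restart command when the process is gone. The PID file path and the restart command are placeholders I am assuming for the example, not values taken from the Cygnus documentation, so adjust both to your installation. The liveness check relies on POSIX signal semantics.

```python
import os
import subprocess
import time

# Assumptions: both values are placeholders, not documented Cygnus defaults.
PID_FILE = "/var/run/cygnus/cygnus.pid"
RESTART_CMD = ["service", "cygnus", "restart"]


def pid_alive(pid_file=PID_FILE):
    """Return True if the PID stored in pid_file belongs to a live process (POSIX only)."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable or malformed PID file
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks the PID
    except PermissionError:
        return True  # the process exists but belongs to another user
    except OSError:
        return False  # no such process
    return True


def watch(check=pid_alive, restart=None, interval=30.0, max_checks=None):
    """Poll `check` and call `restart` each time it reports the process as down.

    Runs forever when max_checks is None; returns the number of restarts otherwise.
    """
    if restart is None:
        restart = lambda: subprocess.call(RESTART_CMD)
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not check():
            restart()
            restarts += 1
        checks += 1
        time.sleep(interval)
    return restarts
```

Running `watch()` under a process supervisor or at boot gives roughly the behaviour described above. In production, monit itself is likely the better choice, since it already handles start timeouts and alerting.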

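Such a monit can be combined with a High Availability (HA) architecture by means of specialized software (e.g. Pacemaker; a load balancer may also be required) in order to have an active/passive pair of Cygnus instances. This means the active Cygnus works as usual, while the passive one only starts working when a problem is detected in the active one. The specialized software then redirects all traffic to the passive Cygnus while the active one is restarted (by monit).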
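The routing decision in such an active/passive pair can be sketched in a few lines of Python. This is only a toy model of the logic, not how Pacemaker or any load balancer is configured; the class names and the instance names `cygnus-a` and `cygnus-b` are hypothetical, and the health flag stands in for whatever check the HA software actually performs.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One Cygnus instance as seen by the (hypothetical) HA software."""
    name: str
    healthy: bool = True


class ActivePassivePair:
    """Send traffic to the active instance until it is detected as unhealthy."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive
        self.serving = active  # the instance currently receiving traffic

    def route(self):
        """Return the instance that should receive traffic right now."""
        if self.serving is self.active and not self.active.healthy:
            # Fail over: the passive instance takes all traffic from now on.
            self.serving = self.passive
        return self.serving


pair = ActivePassivePair(Instance("cygnus-a"), Instance("cygnus-b"))
pair.active.healthy = False      # e.g. the active Cygnus has shut itself down
print(pair.route().name)         # traffic now goes to the passive instance
```

The sketch deliberately stays on the passive instance after a failover, even once the active one has been restarted. Whether and when to fail back is a policy decision that the description above does not cover.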