Flume is not able to send events when the job is submitted to the cluster with yarn-client
I am using a Hortonworks cluster (a 2-node cluster) to run Spark and Flume. When I run with --master "local[*]", Flume is able to send events and Spark receives them; checking localhost:4040, I can see events being received from Flume. (We use the flume-ng-sql source to pump 100 events/sec from Flume, each about 1 KB in size.)
When I run the same example with --master "yarn-client", I get the following error in Flume, and Spark does not receive any events.
2015-08-13 18:24:24,927 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:403)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host: localhost, port: 55555 }: RPC connection error
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:182)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:121)
at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:638)
at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
at org.apache.flume.sink.AvroSink.initializeRpcClient(AvroSink.java:127)
at org.apache.flume.sink.AbstractRpcSink.createConnection(AbstractRpcSink.java:222)
at org.apache.flume.sink.AbstractRpcSink.verifyConnection(AbstractRpcSink.java:283)
at org.apache.flume.sink.AbstractRpcSink.process(AbstractRpcSink.java:360)
... 3 more
Caused by: java.io.IOException: Error connecting to localhost/127.0.0.1:55555
at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:203)
at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:152)
at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:168)
... 10 more
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.connect(NioClientSocketPipelineSink.java:496)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processSelectedKeys(NioClientSocketPipelineSink.java:452)
at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:365)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
I also observed the following on the cluster:
-- Memory consumption with YARN is much higher than in the local case.
-- Also, when I pump 100 events every 30 seconds, Flume and Spark are able to connect and process them with yarn-client as well as in local mode.
Below are the commands I use for Flume and Spark.
Flume:
sudo -u hdfs flume-ng agent --conf conf/ -f conf/flume_mysql_spark.conf -n agent1 -Dflume.root.logger=INFO,console > flumelog.txt
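For reference, the Avro sink section of flume_mysql_spark.conf is not shown in the post; a minimal sketch of what it presumably contains is below. The sink and channel names are assumptions; the agent name comes from the -n agent1 flag and the hostname/port come from the error above:

# assumed Avro sink definition in flume_mysql_spark.conf
agent1.sinks = sparkSink
agent1.sinks.sparkSink.type = avro
agent1.sinks.sparkSink.channel = ch1
agent1.sinks.sparkSink.hostname = localhost
agent1.sinks.sparkSink.port = 55555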
Spark:
sudo -u hdfs spark-submit --master "yarn-client" --class "org.paladion.atm.FlumeEventCount" target/atm-1.1-jar-with-dependencies.jar > sparklog.txt
sudo -u hdfs spark-submit --master "local[*]" --class "org.paladion.atm.FlumeEventCount" target/atm-1.1-jar-with-dependencies.jar > sparklog.txt
Please let me know what might be going wrong here.
Solved it as follows:
1 - If running as local, give the IP of the local machine in Flume as well as in Spark.
2 - If running on the cluster (yarn-client or yarn-cluster), give the IP of the machine in the cluster to which the events should be sent (other than the machine on which you are executing the program); it may be the IP of a node that is not the master node. Give that IP in both Flume and Spark (see the sketch after this list).
Let me know if I am wrong and this could be happening for some other reason, or if there is a better solution.
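As a concrete illustration of point 2 (the node IP 192.168.1.102 below is purely hypothetical; use the address of whichever worker node actually hosts the Spark receiver), the Flume sink and the Spark receiver would both be pointed at that node instead of localhost:

# flume_mysql_spark.conf, Avro sink section (names assumed as above)
agent1.sinks.sparkSink.hostname = 192.168.1.102
agent1.sinks.sparkSink.port = 55555

// FlumeEventCount: bind the receiver to the same node and port
val stream = FlumeUtils.createStream(ssc, "192.168.1.102", 55555, StorageLevel.MEMORY_ONLY_SER)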