Why is DSE Shark trying to connect to localhost?
When I run a dse shark query, it suddenly tries to connect to localhost:9160, and I don't know why. None of my Cassandra instances listen on localhost; they all listen on their public IP addresses.
Why is this happening?
shark> SELECT COUNT(*) FROM mytable;
15/01/16 18:05:55 WARN scheduler.DAGScheduler: Creating new stage failed due to exception - job: 0
java.io.IOException: Unable to connect to server localhost:9160
at org.apache.cassandra.hadoop.ConfigHelper.createConnection(ConfigHelper.java:574)
at org.apache.cassandra.hadoop.ConfigHelper.getClientFromAddressList(ConfigHelper.java:545)
at org.apache.cassandra.hadoop.ConfigHelper.getClientFromInputAddressList(ConfigHelper.java:529)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getRangeMap(AbstractColumnFamilyInputFormat.java:330)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:125)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:326)
at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:264)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:31)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:31)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:31)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:31)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:228)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:303)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:300)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit(DAGScheduler.scala:300)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:305)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:300)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit(DAGScheduler.scala:300)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:305)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:300)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit(DAGScheduler.scala:300)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:305)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$visit.apply(DAGScheduler.scala:300)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$visit(DAGScheduler.scala:300)
at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:310)
at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:250)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:531)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$$anon$$anonfun$receive.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Failed to open transport to: localhost:9160
at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:137)
at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:111)
at org.apache.cassandra.hadoop.ConfigHelper.createConnection(ConfigHelper.java:569)
... 66 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:119)
at com.datastax.bdp.transport.client.TClientSocketFactory.openSocket(TClientSocketFactory.java:99)
at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:168)
at com.datastax.bdp.transport.client.TDseClientTransportFactory.openTransport(TDseClientTransportFactory.java:120)
... 68 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 72 more
java.lang.RuntimeException: Unable to connect to server localhost:9160
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:158)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:347)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
at shark.SharkCliDriver$.main(SharkCliDriver.scala:240)
at shark.SharkCliDriver.main(SharkCliDriver.scala)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask
Additional information:
- I have already repaired CFS with dsetool repaircfs.
- None of dsetool jobtracker/sparkmaster/listjt refers to localhost.
- I have run dsetool sparkmaster cleanup, hoping it would fix the problem.
- I did a recursive search for localhost under /etc/dse, but so far found no relevant matches.
- Hypothesis: is the metastore somehow referring to localhost? Is it possible to recreate the "Cassandra metastore" somehow?
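For reference, the recursive search mentioned above can be sketched like this. This is a self-contained demo using a throwaway temp directory and a hypothetical file name; on a real node you would point grep at /etc/dse instead:

```shell
# Demo: recursively search a config tree for hard-coded "localhost".
# On a DSE node the equivalent would be: grep -rn "localhost" /etc/dse
tmpdir=$(mktemp -d)
printf 'cassandra.host=localhost\n' > "$tmpdir/hive-site.properties"

# -r: recurse into subdirectories, -n: show line numbers
matches=$(grep -rn "localhost" "$tmpdir")
echo "$matches"

rm -rf "$tmpdir"
```

Each match is printed as `file:line:content`, so a hit here would immediately show which config file still points at localhost.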
Set cassandra.host to your Cassandra IP address in the Shark conf, or, on older versions of DSE, set both cassandra.connector.host and cassandra.host.
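A minimal sketch of that setting, applied per session from the Shark shell (the property names come from the answer above; the IP address is a placeholder for one of your nodes, and you could equally put these in your Shark/Hive configuration files):

```
-- 10.10.1.1 is a placeholder; use the listen address of one of your Cassandra nodes
set cassandra.host=10.10.1.1;
set cassandra.connector.host=10.10.1.1;  -- only needed on older DSE versions
```

With these set, the Hadoop input format should open its Thrift connection (port 9160 in the trace above) against the given address instead of falling back to localhost.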