连接到集群外的 Cloudera Impala/Hive

connection to Cloudera Impala / Hive outside the cluster

我正在使用 cloudera impala 服务器版本 5.4.7 首先要确保端口已打开,我已使用 telnet 对其进行了验证。

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        DriverManager.setLoginTimeout(30);
try (java.sql.Connection connection = DriverManager.getConnection("jdbc:hive2://12.23.56.789:123456/someName;auth=noSasl"))
{    System.out.println("connected");      }

但是一直连接不上

我得到的只是超时错误:

可能是什么问题? 我使用的是与 cloudera 版本完全相同的配置单元版本

  [14 Apr 2016 06:27:26,797] [ERROR] [main] [org.apache.hive.jdbc.HiveConnection] - Error opening session
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more
Exception in thread "main" java.sql.SQLException: Could not establish connection to jdbc:hive2://54.69.2.250:21050/sage_global;auth=noSasl: java.net.SocketTimeoutException: Read timed out
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:486)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:181)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at java.sql.DriverManager.getConnection(Unknown Source)
    at com.datorama.core.service.delivery.providers.DatabaseProvider.main(DatabaseProvider.java:330)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:156)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:143)
    at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:475)
    ... 5 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 13 more

我们使用 JDBC 从集群外部进行大量查询。虽然我相信可以使用 Hive JDBC 驱动程序,但您肯定需要在 JDBC 连接字符串中设置正确的端口,可能是 Impala 的 21050。您需要确保您的主机名(或 IP 地址)指向实例 运行 Impala 守护程序(对于 Hive,您可能指向名称节点)。我的猜测是端口号不对,似乎错误只是建立top连接失败。

我们决定使用 Cloudera 为 Impala 提供的特定驱动程序,尽管这可能不是必需的。我们还设置了一个负载均衡器,因此有一个稳定的地址可以将查询定向到,而不是要求调用者选择特定的 Impala 实例。这也平均分散了负载,让我们在集群中进行更改,而无需外部调用者进行任何更改。