How to pass client_protocol to JDBC driver in R?
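I am trying to connect to HiveServer2 with the dplyr.spark.hive package, but the connection fails. I cannot pass the username to the dbConnect function, and that is probably why I get the error about a NULL client_protocol.
Does anyone know how to solve this, or how to pass user/username to the dbConnect function when the driver is JDBC?
This beeline request works fine for me: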
beeline -u "jdbc:hive2://host:port/dbname;auth=noSasl" -n mkosinski --outputformat=tsv --incremental=true -f sql_statement.sql > sql_output
But this R equivalent does not:
> library(dplyr.spark.hive)
Warning: changing locked binding for ‘over’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘partial_eval’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Warning: changing locked binding for ‘default_op’ in ‘dplyr’ whilst loading ‘dplyr.spark.hive’
Attaching package: ‘dplyr.spark.hive’
The following object is masked from ‘package:SparkR’:
cache
Warning messages:
1: replacing previous import by ‘purrr::%>%’ when loading ‘dplyr.spark.hive’
2: replacing previous import by ‘purrr::order_by’ when loading ‘dplyr.spark.hive’
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> Sys.setenv(HIVE_SERVER2_THRIFT_BIND_HOST = 'tools-1.hadoop.srv')
> Sys.setenv(HIVE_SERVER2_THRIFT_PORT = '10000')
> host = 'tools-1.hadoop.srv'
> port = 10000
> driverclass = "org.apache.hive.jdbc.HiveDriver"
> Sys.setenv(HADOOP_JAR = "/opt/spark-1.5.0-bin-hadoop2.4/lib/spark-assembly-1.5.0-hadoop2.4.0.jar")
> library(RJDBC)
> dr = JDBC(driverclass, Sys.getenv("HADOOP_JAR"))
> #url = paste0("jdbc:hive2://", host, ":", port)
> url = paste0("jdbc:hive2://", host, ":", port,"/loghost;auth=noSasl")
> class = "Hive"
> con.class = paste0(class, "Connection") # class = "Hive"
> con = new(con.class, dbConnect(dr, url, username = "mkosinski", database = "loghost"))
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: Could not establish connection to jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=loghost})
> con = new(con.class, dbConnect(dr, url, username = "mkosinski"))
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.sql.SQLException: Could not establish connection to jdbc:hive2://tools-1.hadoop.srv:10000/loghost;auth=noSasl: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{use:database=loghost})
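Edit 1
I tried connecting with a different .jar (as suggested in the comments), and the previous problem seems to be solved (I was probably using the wrong .jar). Now, however, I get an error telling me that the connection is not configured: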
> Sys.setenv(HADOOP_HOME="/usr/share/hadoop/share/hadoop/common/")
> Sys.setenv(HIVE_HOME = '/opt/hive/lib/')
> host = 'tools-1.hadoop.srv'
> port = 10000
> driverclass = "org.apache.hive.jdbc.HiveDriver"
> library(RJDBC)
Loading required package: DBI
Loading required package: rJava
> dr = JDBC(driverclass,classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar"))
> dr2 = JDBC(driverclass,classPath = c("/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar",
+ "/opt/hive/lib/commons-configuration-1.6.jar"))
> url = paste0("jdbc:hive2://", host, ":", port)
> dbConnect(dr, url, username = "mkosinski", database = "loghost") -> cont
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> dbConnect(dr2, url, username = "mkosinski", database = "loghost") -> cont
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], :
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RJDBC_0.2-5 rJava_0.9-7 DBI_0.3.1
loaded via a namespace (and not attached):
[1] tools_3.1.3
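The NoClassDefFoundError above means the JVM could not find org.apache.hadoop.conf.Configuration on the classpath given to JDBC(). That class normally ships in the hadoop-common jar, which is in neither dr nor dr2. A rough sketch of how one might check which jars actually contain a class is shown here; the helper names class_entry and find_class_jars are hypothetical, not part of RJDBC or rJava, and the directories to search are assumptions taken from the paths in the question.

```r
# Convert a fully qualified Java class name to its entry path inside a jar.
class_entry <- function(class_name) {
  paste0(gsub(".", "/", class_name, fixed = TRUE), ".class")
}

# Return the jars under `dirs` that contain the given class (hypothetical helper).
find_class_jars <- function(class_name, dirs) {
  entry <- class_entry(class_name)
  jars <- list.files(dirs, pattern = "\\.jar$", recursive = TRUE, full.names = TRUE)
  has_entry <- vapply(jars, function(jar) {
    # A jar is a zip archive, so utils::unzip can list its entries without extracting.
    entries <- tryCatch(utils::unzip(jar, list = TRUE)$Name,
                        error = function(e) character())
    entry %in% entries
  }, logical(1))
  unname(jars[has_entry])
}
```

For example, find_class_jars("org.apache.hadoop.conf.Configuration", c("/opt/hive/lib", "/usr/share/hadoop/share/hadoop/common")) would list the candidate jars to add to classPath, if any exist in those directories.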
The problem was an incorrect .jar specification (the classPath argument of JDBC) and an incorrect HiveServer2 URL.
An explanation is here.
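A minimal sketch of what a corrected setup might look like, assuming the Hive standalone JDBC jar from the question plus a hadoop-common jar found under the Hadoop directory. The helper names hive_url, hive_classpath, and hive_connect are hypothetical, and the jar locations are assumptions to adjust for your cluster; further dependency jars may also be needed. Two points worth noting: as far as I know, RJDBC's dbConnect method names its credential arguments user and password (not username), and the "client_protocol is unset" error is commonly reported when the Hive JDBC client jar and the HiveServer2 version do not match, so the driver jar should come from the same Hive release as the server.

```r
# Build a HiveServer2 JDBC URL; session options such as auth=noSasl
# follow the database name, separated by semicolons.
hive_url <- function(host, port, db = "default", opts = character()) {
  url <- sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), db)
  if (length(opts) > 0) url <- paste(c(url, opts), collapse = ";")
  url
}

# Driver jar plus any hadoop-common jar found directly in hadoop_dir
# (assumption: hadoop-common lives there on this machine).
hive_classpath <- function(hive_jar, hadoop_dir) {
  hadoop_jars <- list.files(hadoop_dir, pattern = "^hadoop-common.*\\.jar$",
                            full.names = TRUE)
  c(hive_jar, hadoop_jars)
}

# Open the connection; only call this where RJDBC and the jars are available.
hive_connect <- function(host, port, db, user, hive_jar, hadoop_dir) {
  library(RJDBC)
  drv <- JDBC("org.apache.hive.jdbc.HiveDriver",
              classPath = hive_classpath(hive_jar, hadoop_dir))
  # The user name goes in `user`, mirroring `beeline -n`.
  dbConnect(drv, hive_url(host, port, db, "auth=noSasl"), user = user)
}
```

With the values from the question, the call would be hive_connect("tools-1.hadoop.srv", 10000, "loghost", "mkosinski", "/opt/hive/lib/hive-jdbc-1.0.0-standalone.jar", "/usr/share/hadoop/share/hadoop/common/"). Keeping the URL and classpath construction in separate functions makes it easy to print and compare them against the working beeline command before attempting a connection.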