Avoiding mapred.child.env modification at runtime on HDP so that R can establish connection to hiveserver2 using RHive

I am trying to get the RHive package in R to communicate nicely with hiveserver2.

I get an error message when I attempt to connect to hiveserver2 using:

>rhive.connect(host="localhost",port=10000, hiveServer2=TRUE, user="root", password="hadoop")

Output from the initial run:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/client/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/client/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/03/19 07:08:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/19 07:08:23 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
15/03/19 07:08:24 INFO jdbc.Utils: Supplied authorities: localhost:10000
15/03/19 07:08:24 INFO jdbc.Utils: Resolved authority: localhost:10000
15/03/19 07:08:24 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default

This results in the error:

Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.child.env at runtime. It is not in list of params that are allowed to be modified at runtime

On subsequent runs of the same command, the output is reduced to:

15/03/19 07:16:24 INFO jdbc.Utils: Supplied authorities: localhost:10000
15/03/19 07:16:24 INFO jdbc.Utils: Resolved authority: localhost:10000
15/03/19 07:16:24 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10000/default
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.child.env at runtime. It is not in list of params that are allowed to be modified at runtime

This suggests to me that I may not have sufficient permissions somewhere... However, I am running this as root, so I am not sure which permissions I am missing...

I have installed RHive following the installation guide in the README.

Note: The same error appears if I use the CRAN version of the package.

I am currently using the VirtualBox image of Hortonworks Data Platform 2.2 (HDP 2.2), so hadoop and hiveserver2 are already installed. I have installed R version 3.1.2.

Here is how I installed RHive:

# Set up paths for HIVE_HOME, HADOOP_HOME, and HADOOP_CONF
export HIVE_HOME=/usr/hdp/2.2.0.0-2041/hive

export HADOOP_HOME=/usr/hdp/2.2.0.0-2041/hadoop

export HADOOP_CONF_DIR=/etc/hadoop/conf

# R Location via RHOME
R_HOME=/usr/lib64/R

# Place R_HOME into hadoop config location
sudo sh -c "echo \"R_HOME='$R_HOME'\" >> $HADOOP_HOME/conf/hadoop-env.sh"

# Add remote enable to Rserve config.
sudo sh -c "echo 'remote enable' >> /etc/Rserv.conf"

# Launch the daemon
R CMD Rserve

# Confirm launch
netstat -nltp

# Install ant to build java files
sudo yum -y install ant

# Install package dependencies
sudo R --no-save << EOF
install.packages( c('rJava','Rserve','RUnit'), repos='http://cran.us.r-project.org', INSTALL_opts=c('--byte-compile') )
EOF

# Install RHive package
git clone https://github.com/nexr/RHive.git
cd RHive
ant build
sudo R CMD INSTALL RHive

To check, open R and use the statements between the EOF markers, or just run the command directly from the shell:

sudo R --no-save << EOF
Sys.setenv(HIVE_HOME="/usr/hdp/2.2.0.0-2041/hive")
Sys.setenv(HADOOP_HOME="/usr/hdp/2.2.0.0-2041/hadoop")
Sys.setenv(HADOOP_CONF_DIR="/etc/hadoop/conf")
library(RHive)
rhive.connect(host="localhost",port=10000, hiveServer2=TRUE, user="root", password="hadoop")
EOF

The answer is mentioned at this link.

Basically, you have to add the property "hive.security.authorization.sqlstd.confwhitelist.append" with the value "mapred.child.env" in /etc/hive/conf/hive-site.xml.

This solution worked for me, but I used the Ambari UI to make this configuration change.