Cask CDAP services started, but not running during installation
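After going through the documentation for installing CDAP on a MapR system (v6.0) and starting the CDAP services (https://docs.cask.co/cdap/current/en/admin-manual/installation/mapr.html#starting-cdap-services), I found that some CDAP services are not running after being started, even though the startup loop did not show any errors. The output of starting the services and then checking their status is shown below: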
[root@mapr007 conf]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i start ; done
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:01 HST 2018 Starting CDAP Auth Server service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:04 HST 2018 Starting CDAP Kafka Server service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:03:07 HST 2018 Starting CDAP Master service on mapr007.org.local
Warning: Unable to determine $DRILL_HOME
Wed Nov 21 16:03:48 HST 2018 Ensuring required HBase coprocessors are on HDFS
Wed Nov 21 16:04:00 HST 2018 Running CDAP Master startup checks -- this may take a few minutes
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:04:15 HST 2018 Starting CDAP Router service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Wed Nov 21 16:04:17 HST 2018 Starting CDAP UI service on mapr007.org.local
[root@mapr007 conf]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i status ; done
/usr/bin/id: cannot find name for group ID 504
PID file /var/cdap/run/auth-server-cdap.pid exists, but process 12126 does not appear to be running
/usr/bin/id: cannot find name for group ID 504
CDAP Kafka Server running as PID 12653
/usr/bin/id: cannot find name for group ID 504
PID file /var/cdap/run/master-cdap.pid exists, but process 15789 does not appear to be running
/usr/bin/id: cannot find name for group ID 504
CDAP Router running as PID 16184
/usr/bin/id: cannot find name for group ID 504
CDAP UI running as PID 16308
Note that although there is an "Unable to determine $DRILL_HOME" warning, I don't think it should be a big problem, since explore.enabled has been added to cdap-site.xml and set to false.
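Looking at cdap-site.xml, the web UI port appears to be set to the default 11011, but I cannot reach the UI there (if only to see whether it would tell me more about any errors), even though it is reported as running.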
Checking some information about the PIDs, I see:
# looking at the processes reported as not running
[root@mapr007 conf.dist]# ps -p 12126
PID TTY TIME CMD
[root@mapr007 conf.dist]# ps -p 15789
PID TTY TIME CMD
# looking at the rest of the processes
[root@mapr007 conf.dist]# ps -p 12653
PID TTY TIME CMD
12653 ? 00:08:12 java
[root@mapr007 conf.dist]# ps -p 16184
PID TTY TIME CMD
16184 ? 00:03:02 java
[root@mapr007 conf.dist]# ps -p 16308
PID TTY TIME CMD
16308 ? 00:00:01 node
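Since the status check only compares each PID file against the process table, the same check can be scripted for all CDAP PID files at once. A minimal bash sketch, assuming the PID files sit in /var/cdap/run (as in the status output above) and each holds a single numeric PID; `check_pidfile` and `report_cdap_pidfiles` are hypothetical helpers, not part of CDAP:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check whether the process recorded in each CDAP PID
# file is still alive, similar to the status check above.

# check_pidfile FILE
# Prints one status line; returns 0 if running, 1 if stale, 2 if unreadable/invalid.
check_pidfile() {
  local file=$1 pid
  # Strip whitespace/newlines; report and bail out if the file cannot be read.
  pid=$(tr -d '[:space:]' 2>/dev/null < "$file") || { echo "$file: unreadable"; return 2; }
  if [[ ! $pid =~ ^[0-9]+$ ]]; then
    echo "$file: no valid PID"
    return 2
  fi
  # kill -0 delivers no signal; it only tests whether the PID can be signalled.
  if kill -0 "$pid" 2>/dev/null; then
    echo "$file: running (PID $pid)"
    return 0
  fi
  echo "$file: stale (PID $pid not running)"
  return 1
}

# report_cdap_pidfiles [DIR]   (DIR defaults to /var/cdap/run)
report_cdap_pidfiles() {
  local dir=${1:-/var/cdap/run} f
  for f in "$dir"/*.pid; do
    [[ -e $f ]] || continue   # no match: the glob stays literal
    check_pidfile "$f" || true
  done
}
```

Run it as root (e.g. `report_cdap_pidfiles /var/cdap/run`): for a process owned by another user, `kill -0` can fail with a permission error, which this sketch would report as stale. A stale PID file only says the JVM exited; the reason is usually in the matching log under /var/log/cdap.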
Also checked whether the default security.auth.server.bind.port (10009) was being used by some other service:
[root@mapr007 conf.dist]# netstat -anp | grep 10009
but nothing was detected.
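`grep 10009` matches the string anywhere on the line (a PID such as 100093, a remote port, and so on), so it can report false positives. A slightly stricter bash sketch that only looks at TCP sockets in LISTEN state whose local address ends in `:PORT`, assuming the usual Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `listening_on_port` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# listening_on_port PORT   (reads `netstat -an`-style output on stdin)
# Hypothetical helper: succeeds only if a TCP socket in LISTEN state has a
# local address ending in ":PORT".
listening_on_port() {
  local port=$1
  awk -v p="$port" '
    # $1 = Proto, $4 = Local Address, $6 = State in Linux netstat output
    $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example `netstat -tlnp | listening_on_port 10009 && echo "10009 is taken"`; on hosts with iproute2, `ss -ltnp 'sport = :10009'` gives a similar exact-port view.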
Not sure where to start debugging; any suggestions or information would be appreciated.
Update
Restarted the services to try to get more logging data, and now I see some errors (better than silently not working, I suppose):
[root@mapr007 conf.dist]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i stop ; done
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:29 HST 2018 Stopping CDAP Auth Server ...
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:29 HST 2018 Stopping CDAP Kafka Server ....
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:30 HST 2018 Stopping CDAP Master ...
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:31 HST 2018 Stopping CDAP Router ....
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:32 HST 2018 Stopping CDAP UI ....
[root@mapr007 conf.dist]# for i in `ls /etc/init.d/ | grep cdap` ; do sudo service $i start ; done
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:41 HST 2018 Starting CDAP Auth Server service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:44 HST 2018 Starting CDAP Kafka Server service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:06:47 HST 2018 Starting CDAP Master service on mapr007.org.local
Warning: Unable to determine $DRILL_HOME
Mon Nov 26 11:07:17 HST 2018 Ensuring required HBase coprocessors are on HDFS
Mon Nov 26 11:08:57 HST 2018 Running CDAP Master startup checks -- this may take a few minutes
[ERROR] Master startup checks failed. Please check /var/log/cdap/master-cdap-mapr007.org.local.log to address issues.
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:10:08 HST 2018 Starting CDAP Router service on mapr007.org.local
/usr/bin/id: cannot find name for group ID 504
Mon Nov 26 11:10:11 HST 2018 Starting CDAP UI service on mapr007.org.local
Looking at the contents of /var/log/cdap/master-cdap-mapr007.org.local.log, at the bottom I can see:
...
...
...
2018-11-26 11:10:06,996 - ERROR [main:c.c.c.m.s.MasterStartupTool@109] - YarnCheck failed with RuntimeException: Unable to get status of YARN nodemanagers. Please check that YARN is running and that the correct Hadoop configuration (core-site.xml, yarn-site.xml) and libraries are included in the CDAP master classpath.
java.lang.RuntimeException: Unable to get status of YARN nodemanagers. Please check that YARN is running and that the correct Hadoop configuration (core-site.xml, yarn-site.xml) and libraries are included in the CDAP master classpath.
at co.cask.cdap.master.startup.YarnCheck.run(YarnCheck.java:79) ~[co.cask.cdap.cdap-master-5.1.0.jar:na]
at co.cask.cdap.common.startup.CheckRunner.runChecks(CheckRunner.java:51) ~[co.cask.cdap.cdap-common-5.1.0.jar:na]
at co.cask.cdap.master.startup.MasterStartupTool.canStartMaster(MasterStartupTool.java:106) [co.cask.cdap.cdap-master-5.1.0.jar:na]
at co.cask.cdap.master.startup.MasterStartupTool.main(MasterStartupTool.java:96) [co.cask.cdap.cdap-master-5.1.0.jar:na]
Caused by: java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[na:1.8.0_181]
at co.cask.cdap.master.startup.YarnCheck.run(YarnCheck.java:76) ~[co.cask.cdap.cdap-master-5.1.0.jar:na]
... 3 common frames omitted
2018-11-26 11:10:07,006 - ERROR [main:c.c.c.m.s.MasterStartupTool@113] - Root cause: TimeoutException:
2018-11-26 11:10:07,006 - ERROR [main:c.c.c.m.s.MasterStartupTool@116] - Errors detected while starting up master. Please check the logs, address all errors, then try again.
Following the FAQ "CDAP services on distributed CDAP aren't starting up due to an exception. What should I do?" in the documentation did not seem to help (https://docs.cask.co/cdap/current/en/faqs/cdap.html#cdap-services-on-distributed-cdap-aren-t-starting-up-due-to-an-exception-what-should-i-do).
I will keep debugging, but would appreciate any input on these new errors.
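Restarting the ResourceManager and NodeManager services on the cluster seems to have resolved this error. This was done mostly on a hunch from another developer, based only on the fact that the error is about CDAP being unable to connect to YARN, even though the cluster's RM and NM services appeared to be running normally.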
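The `TimeoutException` in the log means the YarnCheck gave up waiting for NodeManager status rather than receiving an explicit error. A rough manual equivalent from the CDAP master host is to ask YARN for its node list under a deadline; this is only similar in spirit and not necessarily the code path CDAP itself uses. A bash sketch assuming GNU coreutils `timeout` (which exits with 124 when the deadline is hit); `run_with_deadline` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# run_with_deadline SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU coreutils `timeout` and prints
# OK, TIMEOUT or FAILED. Returns CMD's exit status (124 on timeout).
run_with_deadline() {
  local secs=$1 rc=0
  shift
  timeout "$secs" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo "OK" ;;
    124) echo "TIMEOUT after ${secs}s" ;;
    *)   echo "FAILED (exit $rc)" ;;
  esac
  return "$rc"
}
```

For example, `run_with_deadline 60 yarn node -list` run as the user CDAP runs as, with the same Hadoop configuration (and a valid Kerberos ticket if security is on). A TIMEOUT there, before restarting the ResourceManager and NodeManagers, would suggest the problem sits between this host and YARN rather than in CDAP.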
In addition, the CDAP installation documentation for enabling Kerberos (https://docs.cask.co/cdap/current/en/admin-manual/installation/mapr.html#enabling-kerberos) specifies using the special keyword _HOST, e.g.
<property>
<name>cdap.master.kerberos.keytab</name>
<value>/etc/security/keytabs/cdap.service.keytab</value>
</property>
<property>
<name>cdap.master.kerberos.principal</name>
<value><cdap-principal>/_HOST@EXAMPLE.COM</value>
</property>
where _HOST is not just a documentation placeholder but a special keyword that is supposed to be filled in automatically (see, e.g., https://mapr.com/docs/60/Hive/Config-HiveMetastoreForKerberos.html and https://mapr.com/docs/60/SecurityGuide/Config-YARN-Kerberos.html).
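Apparently this does not work on MapR client nodes (i.e., nodes that are neither control nodes nor data nodes and only run the MapR client packages to interact with the cluster); there the server hostname in the Kerberos principal must be given explicitly (I'm fairly sure this is documented somewhere, but I can't find it at the moment). This was discovered after inspecting the logs further and seeing the CDAP services try to connect to _HOST@us.org instead of, say, the.actual.domain@us.org.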
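If `_HOST` is not expanded on a client node, the principal has to be written out by hand. Hadoop's own substitution replaces a host component of exactly `_HOST` with the lowercased fully qualified hostname; the bash sketch below reproduces that rule to compute an explicit value for `cdap.master.kerberos.principal`. `expand_host_principal` is a hypothetical helper, and the FQDN defaults to `hostname -f`, which assumes the host's name resolves correctly:

```bash
#!/usr/bin/env bash
# expand_host_principal PRINCIPAL [FQDN]
# Hypothetical helper: if PRINCIPAL has the form service/_HOST@REALM, replace
# _HOST with the lowercased FQDN (defaults to `hostname -f`); otherwise print
# PRINCIPAL unchanged.
expand_host_principal() {
  local principal=$1
  local fqdn=${2:-$(hostname -f)}
  local re='^([^/@]+)/_HOST@(.+)$'
  fqdn=$(printf '%s' "$fqdn" | tr '[:upper:]' '[:lower:]')
  if [[ $principal =~ $re ]]; then
    printf '%s/%s@%s\n' "${BASH_REMATCH[1]}" "$fqdn" "${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$principal"
  fi
}
```

For example, on mapr007 (assuming `hostname -f` returns mapr007.org.local), `expand_host_principal 'cdap/_HOST@EXAMPLE.COM'` prints `cdap/mapr007.org.local@EXAMPLE.COM`. The result should match an entry listed by `klist -kt /etc/security/keytabs/cdap.service.keytab`.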