Ambari unable to run custom hook for modifying user hive
I am trying to add a client node to a cluster via Ambari (v2.7.3.0) (HDP 3.1.0.0-78) and am seeing an odd error:
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
BeforeAnyHook().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
method(env)
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
setup_users()
File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
fetch_nonlocal_groups = params.fetch_nonlocal_groups,
File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/ambari-agent/lib/resource_management/core/providers/accounts.py", line 90, in action_create
shell.checked_call(command, sudo=True)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
stdout:
2019-11-25 13:07:57,644 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1
2019-11-25 13:07:57,651 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2019-11-25 13:07:57,652 - Group['livy'] {}
2019-11-25 13:07:57,654 - Group['spark'] {}
2019-11-25 13:07:57,654 - Group['ranger'] {}
2019-11-25 13:07:57,654 - Group['hdfs'] {}
2019-11-25 13:07:57,654 - Group['zeppelin'] {}
2019-11-25 13:07:57,655 - Group['hadoop'] {}
2019-11-25 13:07:57,655 - Group['users'] {}
2019-11-25 13:07:57,656 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,658 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:57,971 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
self.save_component_version_to_structured_out(self.command_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
stack_select_package_name = stack_select.get_package_name()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
supported_packages = get_supported_packages()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select
Command failed after 1 tries
The problem appears to be
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
caused by
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
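This is further reinforced by the fact that manually adding ambari-hdp-1.repo and yum-installing hdp-select before adding the host to the cluster produces the same error message, only truncated to the stdout/stderr portion shown here.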
When I run
[root@HW001 .ssh]# /usr/bin/hdp-select versions
3.1.0.0-78
on the Ambari server node, I can see that the command works.
Looking at what the hook script is trying to run/access, I see
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
-rw-r--r-- 1 root root 1.2K Nov 25 10:51 /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
[root@client001~]# ls -lha /var/lib/ambari-agent/data/command-632.json
-rw------- 1 root root 545K Nov 25 13:07 /var/lib/ambari-agent/data/command-632.json
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY
total 0
drwxr-xr-x 4 root root 34 Nov 25 10:51 .
drwxr-xr-x 8 root root 147 Nov 25 10:51 ..
drwxr-xr-x 2 root root 34 Nov 25 10:51 files
drwxr-xr-x 2 root root 188 Nov 25 10:51 scripts
[root@client001~]# ls -lha /var/lib/ambari-agent/data/structured-out-632.json
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
[root@client001~]# ls -lha /var/lib/ambari-agent/tmp
total 96K
drwxrwxrwt 3 root root 4.0K Nov 25 13:06 .
drwxr-xr-x 10 root root 267 Nov 25 10:50 ..
drwxr-xr-x 6 root root 4.0K Nov 25 13:06 ambari_commons
-rwx------ 1 root root 1.4K Nov 25 13:06 ambari-sudo.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 create-python-wrap.sh
-rwxr-xr-x 1 root root 1.6K Nov 25 10:50 os_check_type1574715018.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:12 os_check_type1574716360.py
-rwxr-xr-x 1 root root 1.6K Nov 25 11:29 os_check_type1574717391.py
-rwxr-xr-x 1 root root 1.6K Nov 25 13:06 os_check_type1574723161.py
-rwxr-xr-x 1 root root 16K Nov 25 10:50 setupAgent1574715020.py
-rwxr-xr-x 1 root root 16K Nov 25 11:12 setupAgent1574716361.py
-rwxr-xr-x 1 root root 16K Nov 25 11:29 setupAgent1574717392.py
-rwxr-xr-x 1 root root 16K Nov 25 13:06 setupAgent1574723163.py
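Note the line
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
I am not sure whether this is normal, though.
Does anyone know what could be causing this, or have any debugging tips?
UPDATE 01:
After adding some log print lines near the offending last line of the error trace, i.e. File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
, I printed the return code and stdout: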
2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory
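So what is going on? It expects hdp-select to already be present, but the Ambari add-host UI complains if I manually install that binary myself beforehand. When I do install it manually (using the same repo file as the rest of the existing cluster nodes), all I see is...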
0
Packages:
accumulo-client
accumulo-gc
accumulo-master
accumulo-monitor
accumulo-tablet
accumulo-tracer
atlas-client
atlas-server
beacon
beacon-client
beacon-server
druid-broker
druid-coordinator
druid-historical
druid-middlemanager
druid-overlord
druid-router
druid-superset
falcon-client
falcon-server
flume-server
hadoop-client
hadoop-hdfs-client
hadoop-hdfs-datanode
hadoop-hdfs-journalnode
hadoop-hdfs-namenode
hadoop-hdfs-nfs3
hadoop-hdfs-portmap
hadoop-hdfs-secondarynamenode
hadoop-hdfs-zkfc
hadoop-httpfs
hadoop-mapreduce-client
hadoop-mapreduce-historyserver
hadoop-yarn-client
hadoop-yarn-nodemanager
hadoop-yarn-registrydns
hadoop-yarn-resourcemanager
hadoop-yarn-timelinereader
hadoop-yarn-timelineserver
hbase-client
hbase-master
hbase-regionserver
hive-client
hive-metastore
hive-server2
hive-server2-hive
hive-server2-hive2
hive-webhcat
hive_warehouse_connector
kafka-broker
knox-server
livy-client
livy-server
livy2-client
livy2-server
mahout-client
oozie-client
oozie-server
phoenix-client
phoenix-server
pig-client
ranger-admin
ranger-kms
ranger-tagsync
ranger-usersync
shc
slider-client
spark-atlas-connector
spark-client
spark-historyserver
spark-schema-registry
spark-thriftserver
spark2-client
spark2-historyserver
spark2-thriftserver
spark_llap
sqoop-client
sqoop-server
storm-client
storm-nimbus
storm-slider-client
storm-supervisor
superset
tez-client
zeppelin-server
zookeeper-client
zookeeper-server
Aliases:
accumulo-server
all
client
hadoop-hdfs-server
hadoop-mapreduce-server
hadoop-yarn-server
hive-server
Command failed after 1 tries
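The check in this update (printing the return code and combined output of the stack selector) can also be reproduced outside the agent. The following is only a minimal sketch, not Ambari's own code: run_and_report is a hypothetical helper name, and the default command uses the versions subcommand shown earlier in this post. It assumes Python 3 on the client node.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command and return (return_code, combined_output).

    Returns (None, error_text) if the executable cannot be started at all,
    for example when /usr/bin/hdp-select is not installed yet.
    """
    try:
        completed = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like the agent log
            universal_newlines=True,
        )
    except OSError as exc:
        return None, str(exc)
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Default to the command from this post; pass another command as arguments.
    command = sys.argv[1:] or ["/usr/bin/hdp-select", "versions"]
    code, output = run_and_report(command)
    print(code)
    print(output)
```

A return code of None here means the binary is missing or not executable, which is a different failure from the selector running and returning a non-zero code.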
UPDATE 02:
Printing some custom logging (the values of err_msg, code, out, and err) just before File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 322, i.e.
....
312 if throw_on_failure and not code in returns:
313 err_msg = Logger.filter_text("Execution of '{0}' returned {1}. {2}".format(command_alias, code, all_output))
314
315 #TODO remove
316 print("\n----------\nMY LOGS\n----------\n")
317 print(err_msg)
318 print(code)
319 print(out)
320 print(err)
321
322 raise ExecutionFailed(err_msg, code, out, err)
323
324 # if separate stderr is enabled (by default it's redirected to out)
325 if stderr == subprocess32.PIPE:
326 return code, out, err
327
328 return code, out
....
I see
Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
6
usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-816.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-816.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-26 10:25:46,928 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
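So it seems that it is unable to create the hive user (even though it apparently had no problem creating the yarn-ats user just before that).
After giving up and trying to create the hive user manually myself, I see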
[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
<nothing>
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)
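The fact that this existing user's uid looks like this and that the user is not in the /etc/passwd file made me think that there is already an existing Active Directory user named hive (this client node syncs with AD via the installed SSSD). Checking our AD users, this turned out to be true.
Temporarily stopping the SSSD service to stop syncing with AD (service sssd stop) before re-running the client host add in Ambari fixed the problem for me (I am not sure whether you can have a server ignore AD syncing on an individual-user basis).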
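The diagnosis above (a user that resolves through the system's name lookup but has no entry in /etc/passwd) can be checked for any service account before adding a host. This is a minimal sketch under assumptions: local_user_names and classify_user are hypothetical helper names, and the script assumes Python 3 on a Linux node where lookups such as SSSD are reachable through the standard pwd module.

```python
import pwd


def local_user_names(passwd_text):
    """Return the set of user names defined in /etc/passwd-style text."""
    names = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The user name is the first colon-separated field.
        names.add(line.split(":", 1)[0])
    return names


def classify_user(name, passwd_path="/etc/passwd"):
    """Return 'local', 'non-local', or 'missing' for a user name.

    'non-local' means the name resolves (for example via SSSD/AD) but has
    no entry in the passwd file, which is the case usermod rejected here.
    """
    with open(passwd_path) as handle:
        if name in local_user_names(handle.read()):
            return "local"
    try:
        pwd.getpwnam(name)
    except KeyError:
        return "missing"
    return "non-local"


if __name__ == "__main__":
    # The two accounts that appear in the agent log of this post.
    for user in ("hive", "yarn-ats"):
        print(user, classify_user(user))
```

On the node from this post, hive would presumably be reported as non-local while SSSD is running, since id hive resolved it but /etc/passwd did not contain it.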