Ambari unable to run custom hook for modifying user hive

Trying to add a client node to the cluster via Ambari (v2.7.3.0) (HDP 3.1.0.0-78) and seeing an odd error:

stderr: 
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 38, in <module>
    BeforeAnyHook().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 352, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py", line 31, in hook
    setup_users()
  File "/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/shared_initialization.py", line 51, in setup_users
    fetch_nonlocal_groups = params.fetch_nonlocal_groups,
  File "/usr/lib/ambari-agent/lib/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/ambari-agent/lib/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/ambari-agent/lib/resource_management/core/providers/accounts.py", line 90, in action_create
    shell.checked_call(command, sudo=True)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy, returns=returns)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 314, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
    self.save_component_version_to_structured_out(self.command_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
    stack_select_package_name = stack_select.get_package_name()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
    package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
    supported_packages = get_supported_packages()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
    raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select



 stdout:
2019-11-25 13:07:57,644 - Stack Feature Version Info: Cluster Stack=3.1, Command Stack=None, Command Version=None -> 3.1
2019-11-25 13:07:57,651 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2019-11-25 13:07:57,652 - Group['livy'] {}
2019-11-25 13:07:57,654 - Group['spark'] {}
2019-11-25 13:07:57,654 - Group['ranger'] {}
2019-11-25 13:07:57,654 - Group['hdfs'] {}
2019-11-25 13:07:57,654 - Group['zeppelin'] {}
2019-11-25 13:07:57,655 - Group['hadoop'] {}
2019-11-25 13:07:57,655 - Group['users'] {}
2019-11-25 13:07:57,656 - User['yarn-ats'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,658 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop'], 'uid': None}
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-25 13:07:57,971 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed
2019-11-25 13:07:58,000 - Reporting component version failed
Traceback (most recent call last):
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 363, in execute
    self.save_component_version_to_structured_out(self.command_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 223, in save_component_version_to_structured_out
    stack_select_package_name = stack_select.get_package_name()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
    package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 223, in get_packages
    supported_packages = get_supported_packages()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages
    raise Fail("Unable to query for supported packages using {0}".format(stack_selector_path))
Fail: Unable to query for supported packages using /usr/bin/hdp-select

Command failed after 1 tries

The problem seems to be

resource_management.core.exceptions.ExecutionFailed: Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd

which causes
2019-11-25 13:07:57,659 - Modifying user hive
Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-632.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-632.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']

Manually adding the ambari-hdp-1.repo file and yum-installing hdp-select before adding the host to the cluster produces the same error message (just truncated), which further supports that the stdout/stderr sections shown here are where the problem lies.

When running

[root@HW001 .ssh]# /usr/bin/hdp-select versions
3.1.0.0-78

from the Ambari server node, I can see that the command runs fine there.

Looking at what the hook script is trying to run/access, I see

[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
-rw-r--r-- 1 root root 1.2K Nov 25 10:51 /var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py
[root@client001~]# ls -lha /var/lib/ambari-agent/data/command-632.json
-rw------- 1 root root 545K Nov 25 13:07 /var/lib/ambari-agent/data/command-632.json
[root@client001~]# ls -lha /var/lib/ambari-agent/cache/stack-hooks/before-ANY
total 0
drwxr-xr-x 4 root root  34 Nov 25 10:51 .
drwxr-xr-x 8 root root 147 Nov 25 10:51 ..
drwxr-xr-x 2 root root  34 Nov 25 10:51 files
drwxr-xr-x 2 root root 188 Nov 25 10:51 scripts
[root@client001~]# ls -lha /var/lib/ambari-agent/data/structured-out-632.json
ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory
[root@client001~]# ls -lha /var/lib/ambari-agent/tmp
total 96K
drwxrwxrwt  3 root root 4.0K Nov 25 13:06 .
drwxr-xr-x 10 root root  267 Nov 25 10:50 ..
drwxr-xr-x  6 root root 4.0K Nov 25 13:06 ambari_commons
-rwx------  1 root root 1.4K Nov 25 13:06 ambari-sudo.sh
-rwxr-xr-x  1 root root 1.6K Nov 25 13:06 create-python-wrap.sh
-rwxr-xr-x  1 root root 1.6K Nov 25 10:50 os_check_type1574715018.py
-rwxr-xr-x  1 root root 1.6K Nov 25 11:12 os_check_type1574716360.py
-rwxr-xr-x  1 root root 1.6K Nov 25 11:29 os_check_type1574717391.py
-rwxr-xr-x  1 root root 1.6K Nov 25 13:06 os_check_type1574723161.py
-rwxr-xr-x  1 root root  16K Nov 25 10:50 setupAgent1574715020.py
-rwxr-xr-x  1 root root  16K Nov 25 11:12 setupAgent1574716361.py
-rwxr-xr-x  1 root root  16K Nov 25 11:29 setupAgent1574717392.py
-rwxr-xr-x  1 root root  16K Nov 25 13:06 setupAgent1574723163.py

Note the ls: cannot access /var/lib/ambari-agent/data/structured-out-632.json: No such file or directory. Not sure whether that is normal, though.

Does anyone know what could be causing this, or have any debugging suggestions?


Update 01: Adding some logging lines near the last problematic line in the error trace, i.e. File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/stack_select.py", line 147, in get_supported_packages, and printing the return code and stdout, I get:

2
ambari-python-wrap: can't open file '/usr/bin/hdp-select': [Errno 2] No such file or directory

What is that about? It expects hdp-select to already be installed, yet the Ambari add-host UI complains if I manually install that binary beforehand. When I do install it manually (using the same repo file as the rest of the existing cluster nodes), what I see is...

0
Packages:
  accumulo-client
  accumulo-gc
  accumulo-master
  accumulo-monitor
  accumulo-tablet
  accumulo-tracer
  atlas-client
  atlas-server
  beacon
  beacon-client
  beacon-server
  druid-broker
  druid-coordinator
  druid-historical
  druid-middlemanager
  druid-overlord
  druid-router
  druid-superset
  falcon-client
  falcon-server
  flume-server
  hadoop-client
  hadoop-hdfs-client
  hadoop-hdfs-datanode
  hadoop-hdfs-journalnode
  hadoop-hdfs-namenode
  hadoop-hdfs-nfs3
  hadoop-hdfs-portmap
  hadoop-hdfs-secondarynamenode
  hadoop-hdfs-zkfc
  hadoop-httpfs
  hadoop-mapreduce-client
  hadoop-mapreduce-historyserver
  hadoop-yarn-client
  hadoop-yarn-nodemanager
  hadoop-yarn-registrydns
  hadoop-yarn-resourcemanager
  hadoop-yarn-timelinereader
  hadoop-yarn-timelineserver
  hbase-client
  hbase-master
  hbase-regionserver
  hive-client
  hive-metastore
  hive-server2
  hive-server2-hive
  hive-server2-hive2
  hive-webhcat
  hive_warehouse_connector
  kafka-broker
  knox-server
  livy-client
  livy-server
  livy2-client
  livy2-server
  mahout-client
  oozie-client
  oozie-server
  phoenix-client
  phoenix-server
  pig-client
  ranger-admin
  ranger-kms
  ranger-tagsync
  ranger-usersync
  shc
  slider-client
  spark-atlas-connector
  spark-client
  spark-historyserver
  spark-schema-registry
  spark-thriftserver
  spark2-client
  spark2-historyserver
  spark2-thriftserver
  spark_llap
  sqoop-client
  sqoop-server
  storm-client
  storm-nimbus
  storm-slider-client
  storm-supervisor
  superset
  tez-client
  zeppelin-server
  zookeeper-client
  zookeeper-server
Aliases:
  accumulo-server
  all
  client
  hadoop-hdfs-server
  hadoop-mapreduce-server
  hadoop-yarn-server
  hive-server

Command failed after 1 tries
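For reference, a rough probe of what the failure above amounts to (the stack selector binary must exist and be runnable before the agent can report versions). The path comes from the traceback; the helper function itself is hypothetical, not part of Ambari:

```python
import os
import subprocess

# Stack selector path taken from the traceback above (standard HDP layout).
STACK_SELECTOR = "/usr/bin/hdp-select"

def probe_stack_selector(path=STACK_SELECTOR):
    """Roughly what stack_select.get_supported_packages() needs to succeed:
    the selector must exist and '<selector> versions' must run cleanly."""
    if not os.path.isfile(path):
        # Corresponds to the "[Errno 2] No such file or directory" above
        return "missing"
    try:
        out = subprocess.check_output([path, "versions"]).decode()
    except (OSError, subprocess.CalledProcessError):
        return "not runnable"
    return out.strip() or "no versions installed"
```

On the nodes above this should return "missing" on the fresh client node and the installed version string (e.g. 3.1.0.0-78) on HW001.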

Update 02: Printing some custom logging from File "/usr/lib/ambari-agent/lib/resource_management/core/shell.py", line 322 (printing the values of err_msg, code, out, and err), i.e.

....
    312   if throw_on_failure and not code in returns:
    313     err_msg = Logger.filter_text("Execution of '{0}' returned {1}. {2}".format(command_alias, code, all_output))
    314
    315     #TODO remove
    316     print("\n----------\nMY LOGS\n----------\n")
    317     print(err_msg)
    318     print(code)
    319     print(out)
    320     print(err)
    321
    322     raise ExecutionFailed(err_msg, code, out, err)
    323
    324   # if separate stderr is enabled (by default it's redirected to out)
    325   if stderr == subprocess32.PIPE:
    326     return code, out, err
    327
    328   return code, out
....

I get

Execution of 'usermod -G hadoop -g hadoop hive' returned 6. usermod: user 'hive' does not exist in /etc/passwd
6
usermod: user 'hive' does not exist in /etc/passwd

Error: Error: Unable to run the custom hook script ['/usr/bin/python', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY/scripts/hook.py', 'ANY', '/var/lib/ambari-agent/data/command-816.json', '/var/lib/ambari-agent/cache/stack-hooks/before-ANY', '/var/lib/ambari-agent/data/structured-out-816.json', 'INFO', '/var/lib/ambari-agent/tmp', 'PROTOCOL_TLSv1_2', '']
2019-11-26 10:25:46,928 - The repository with version 3.1.0.0-78 for this command has been marked as resolved. It will be used to report the version of the component which was installed

So it seems the hive user cannot be modified/created (even though creating the yarn-ats user just before it apparently went through without a problem).

After giving up on that and trying to create the hive user manually myself, I see

[root@airflowetl ~]# useradd -g hadoop -s /bin/bash hive
useradd: user 'hive' already exists
[root@airflowetl ~]# cat /etc/passwd | grep hive
<nothing>
[root@airflowetl ~]# id hive
uid=379022825(hive) gid=379000513(domain users) groups=379000513(domain users)

The fact that this existing user's UID looks like that, and that the user is not in the /etc/passwd file, made me think there must already be an existing Active Directory user (which this client node syncs with via its installed SSSD service) with the name hive. Checking our AD users, this turned out to be true.
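The split can be checked directly: usermod/useradd consult only the local files, while id goes through the full NSS chain (which includes SSSD/AD). A minimal sketch of that distinction (hypothetical helper names, not part of Ambari):

```python
import pwd

def user_in_local_passwd(name, passwd_path="/etc/passwd"):
    """What usermod/useradd check: a literal entry in /etc/passwd."""
    with open(passwd_path) as f:
        return any(line.split(":", 1)[0] == name for line in f)

def user_resolvable_via_nss(name):
    """What `id` checks: the full NSS chain (files, sss/AD, ldap, ...)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False

# For an AD-only user like 'hive' here:
#   user_in_local_passwd("hive")     -> False (usermod returns 6)
#   user_resolvable_via_nss("hive")  -> True  (id hive works, useradd sees a conflict)
```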

Temporarily stopping the SSSD service (service sssd stop) to halt syncing with AD (since I'm not sure whether you can make the server ignore AD sync on a per-user basis) before re-running the add-host for the client in Ambari fixed the problem for me.
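As a possible alternative to stopping SSSD entirely (untested in this setup): SSSD can hide individual names from NSS lookups via the filter_users option in the [nss] section of /etc/sssd/sssd.conf, which should let the local tools create the user while leaving the rest of the AD sync running:

    # /etc/sssd/sssd.conf (fragment; assumes a default SSSD setup, untested here)
    [nss]
    # Names listed here are never resolved through SSSD/AD;
    # root is filtered by default.
    filter_users = root, hive
    filter_groups = root

Restart SSSD afterwards (service sssd restart), possibly clearing its cache first (sss_cache -E), for the change to take effect.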