openshift_control_plane: Report control plane errors

Question

I am trying to install OpenShift Origin with Ansible. Running deploy_cluster.yml fails with this error:

TASK [openshift_control_plane : Report control plane errors] ***********************************************************************************************************
fatal: [masterserver.srv.com]: FAILED! => {"changed": false, "msg": "Control plane pods didn't come up"}

NO MORE HOSTS LEFT *****************************************************************************************************************************************************

PLAY RECAP *************************************************************************************************************************************************************
localhost                  : ok=11   changed=0    unreachable=0    failed=0    skipped=5    rescued=0    ignored=0
masterserver.srv.com       : ok=295  changed=44   unreachable=0    failed=1    skipped=233  rescued=0    ignored=4
nodeserver.srv.com         : ok=103  changed=16   unreachable=0    failed=0    skipped=88   rescued=0    ignored=0


INSTALLER STATUS *******************************************************************************************************************************************************
Initialization              : Complete (0:02:49)
Health Check                : Complete (0:00:36)
Node Bootstrap Preparation  : Complete (0:09:55)
etcd Install                : Complete (0:02:05)
Master Install              : In Progress (0:42:42)
        This phase can be restarted by running: playbooks/openshift-master/config.yml


Failure summary:


  1. Hosts:    masterserver.srv.com
     Play:     Configure masters
     Task:     Report control plane errors
     Message:  Control plane pods didn't come up

Description of my environment:

Ansible version

[root@masterserver ~]# ansible --version
ansible 2.9.2
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Apr 11 2018, 07:36:10) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]

OS version (same version on master and node)

[root@masterserver ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Docker version

[root@masterserver ~]# docker version
Client:
 Version:         1.13.1
 API version:     1.26
 Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
 Go version:      go1.10.3
 Git commit:      7f2769b/1.13.1
 Built:           Sun Sep 15 14:06:47 2019
 OS/Arch:         linux/amd64

Server:
 Version:         1.13.1
 API version:     1.26 (minimum version 1.12)
 Package version: docker-1.13.1-103.git7f2769b.el7.centos.x86_64
 Go version:      go1.10.3
 Git commit:      7f2769b/1.13.1
 Built:           Sun Sep 15 14:06:47 2019
 OS/Arch:         linux/amd64
 Experimental:    false

Steps I ran:

ansible-playbook openshift-ansible/playbooks/prerequisites.yml (succeeded)
ansible-playbook openshift-ansible/playbooks/deploy_cluster.yml (fails with the error above)

Additional information:

[root@masterserver ~]# cat /etc/ansible/hosts
[OSEv3:children]
masters
nodes
etcd

[OSEv3:vars]
ansible_ssh_user=origin
ansible_become=true
openshift_deployment_type=origin
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_master_default_subdomain=apps-masterserver.srv.com
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability
openshift_master_api_port=8443
openshift_master_console_port=8443
osm_etcd_image=registry.access.redhat.com/rhel7/etcd:3.2.22

[masters]
masterserver.srv.com

[etcd]
masterserver.srv.com

[nodes]
masterserver.srv.com openshift_node_group_name='node-config-master-infra'
nodeserver.srv.com openshift_node_group_name='node-config-compute'
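A quick way to rule out the inventory itself: the minimal Python sketch below parses the host sections of the inventory above and checks two things openshift-ansible 3.10+ expects, namely that every master is also listed under [nodes] and that every node has an openshift_node_group_name. parse_groups and check_layout are hypothetical helpers with a deliberately naive parser (it skips [*:vars] and [*:children] and does not handle ranges or quoted values containing spaces), and MASTER_NODE_GROUPS assumes the default node group names shipped with openshift-ansible 3.11.

```python
from collections import defaultdict

# Host sections copied from the inventory above ([OSEv3:vars] omitted).
INVENTORY = """\
[OSEv3:children]
masters
nodes
etcd

[masters]
masterserver.srv.com

[etcd]
masterserver.srv.com

[nodes]
masterserver.srv.com openshift_node_group_name='node-config-master-infra'
nodeserver.srv.com openshift_node_group_name='node-config-compute'
"""

# Assumption: default master-capable node groups in openshift-ansible 3.11.
MASTER_NODE_GROUPS = {"node-config-master", "node-config-master-infra", "node-config-all-in-one"}


def parse_groups(text):
    """Return {group: {host: {var: value}}}; [*:vars] and [*:children] are skipped."""
    groups = defaultdict(dict)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            current = None if ":" in name else name  # only plain groups list hosts
            continue
        if current is None:
            continue
        host, *pairs = line.split()  # naive: values must not contain spaces
        hostvars = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            hostvars[key] = value.strip("'\"")
        groups[current][host] = hostvars
    return groups


def check_layout(groups):
    """Return human-readable problems; an empty list means the layout looks fine."""
    problems = []
    nodes = groups.get("nodes", {})
    for master in groups.get("masters", {}):
        if master not in nodes:
            problems.append(f"{master} is a master but is not listed in [nodes]")
        elif nodes[master].get("openshift_node_group_name") not in MASTER_NODE_GROUPS:
            problems.append(f"{master} is not in a master node group")
    for node, hostvars in nodes.items():
        if "openshift_node_group_name" not in hostvars:
            problems.append(f"{node} has no openshift_node_group_name")
    return problems


if __name__ == "__main__":
    print(check_layout(parse_groups(INVENTORY)) or "inventory layout looks consistent")
```

For the inventory in the question this prints "inventory layout looks consistent", which fits the answer below that the problem was the size of the hosts rather than the inventory.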

[root@masterserver ~]# hostname
masterserver.srv.com

[root@masterserver ~]# oc get nodes
The connection to the server masterserver.srv.com:8443 was refused - did you specify the right host or port?

[root@masterserver ~]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8444            0.0.0.0:*               LISTEN      1700/openshift
tcp        0      0 127.0.0.1:44642         0.0.0.0:*               LISTEN      1407/hyperkube
tcp        0      0 192.168.43.50:2379      0.0.0.0:*               LISTEN      1647/etcd
tcp        0      0 192.168.43.50:2380      0.0.0.0:*               LISTEN      1647/etcd
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd
tcp        0      0 172.17.0.1:53           0.0.0.0:*               LISTEN      1024/dnsmasq
tcp        0      0 192.168.43.50:53        0.0.0.0:*               LISTEN      1024/dnsmasq
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1029/sshd
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1166/master
tcp6       0      0 :::10250                :::*                    LISTEN      1407/hyperkube
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd
tcp6       0      0 fe80::a00:27ff:fee8::53 :::*                    LISTEN      1024/dnsmasq
tcp6       0      0 :::22                   :::*                    LISTEN      1029/sshd
tcp6       0      0 ::1:25                  :::*                    LISTEN      1166/master
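The netstat output is the telling part: the openshift process listens on 8444 (as far as I know, the master controllers' port in OKD 3.10/3.11), but nothing listens on 8443, the openshift_master_api_port set in the inventory, which is why oc get nodes is refused. A small Python sketch to re-check those ports on the master while the installer waits for the control plane pods; the port-to-role labels in PORTS are my assumption.

```python
import socket
import sys


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable or unresolvable
        return False


# Assumed roles of the ports seen in the question (OKD 3.x defaults).
PORTS = {8443: "API server (openshift_master_api_port)", 8444: "master controllers"}

if __name__ == "__main__":
    # Run on the master; 8444 is bound to 0.0.0.0 in the netstat output,
    # so 127.0.0.1 reaches it. Pass another host as the first argument if needed.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    for port, role in PORTS.items():
        state = "listening" if is_listening(host, port) else "refused/unreachable"
        print(f"{host}:{port} {role}: {state}")
```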

[root@masterserver ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.43.51   nodeserver.srv.com
192.168.43.50   masterserver.srv.com
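The hostname and /etc/hosts look right. If you rely on /etc/hosts for name resolution, as this setup does, one misconfiguration worth ruling out is a cluster hostname that is missing from the file or mapped only to a loopback address. The Python sketch below (parse_hosts and suspicious_hosts are hypothetical helpers) parses /etc/hosts-format text and flags such hosts; for the file in the question it prints an empty list.

```python
import ipaddress

HOSTS = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.43.51   nodeserver.srv.com
192.168.43.50   masterserver.srv.com
"""


def parse_hosts(text):
    """Map each name in /etc/hosts-format text to the list of addresses given for it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            mapping.setdefault(name, []).append(addr)
    return mapping


def suspicious_hosts(mapping, hosts):
    """Return hosts that are absent or only mapped to loopback addresses."""
    flagged = []
    for host in hosts:
        addrs = mapping.get(host, [])
        if not addrs or all(ipaddress.ip_address(a).is_loopback for a in addrs):
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    cluster = ["masterserver.srv.com", "nodeserver.srv.com"]
    print(suspicious_hosts(parse_hosts(HOSTS), cluster))
```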

The control plane pods never come up, so the installation gets stuck. This is one of the errors printed while the playbook runs:

[WARNING]: Module invocation had junk after the JSON data: Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/tmp/ansible_oc_obj_payload_h6RqDy/ansible_oc_obj_payload.zip/ansible/modules/oc_obj.py", line 1257, in cleanup
AttributeError: 'NoneType' object has no attribute 'path'
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/tmp/ansible_oc_obj_payload_h6RqDy/ansible_oc_obj_payload.zip/ansible/modules/oc_obj.py", line 1257, in cleanup
AttributeError: 'NoneType' object has no attribute 'path'

Can anyone help me solve this? Thanks.

Answer 1

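Solved! I moved my environment to higher specs. Previously each machine had only 1 vCPU and 2 GB RAM (Master + Infra 1, Compute 1), and /var/log/messages showed resource-related entries ("Recording NodeHasSufficientResources").

I now use 2 vCPU and 8 GB RAM (Master + Infra 1, Compute 1), and it runs fine!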
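For anyone hitting the same error: the inventory in the question disables the memory_availability check through openshift_disable_check, and that is the installer check that would normally flag an undersized host. A minimal Python sketch to re-check a host by hand before running deploy_cluster.yml. The values in MINIMUMS are assumptions based on the OKD 3.11 system requirements as I recall them (masters 4 vCPU / 16 GB, nodes 1 vCPU / 8 GB, treating GB as GiB), and the 0.9 tolerance is an arbitrary allowance because MemTotal excludes memory reserved by the kernel. Note that the 2 vCPU / 8 GB master above would still be flagged against these minimums, even though it worked for this lab setup.

```python
import os

# Assumed minimums (vCPU, GiB RAM) per role; check the docs for your release.
MINIMUMS = {"master": (4, 16), "node": (1, 8)}


def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo text (reported in kB) in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found")


def check_resources(cpus, mem_gib, role, tolerance=0.9):
    """Return a list of shortfalls for `role`; empty means the host looks big enough."""
    min_cpus, min_mem = MINIMUMS[role]
    problems = []
    if cpus < min_cpus:
        problems.append(f"{cpus} vCPU < {min_cpus} vCPU")
    if mem_gib < min_mem * tolerance:  # MemTotal is a bit below the nominal VM size
        problems.append(f"{mem_gib:.1f} GiB RAM < {min_mem} GiB")
    return problems


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        mem = mem_total_gib(f.read())
    role = "master"  # hypothetical: set to "node" when checking nodeserver
    print(check_resources(os.cpu_count() or 0, mem, role) or f"meets assumed {role} minimums")
```

On the original 1 vCPU / 2 GB master this would report both a CPU and a memory shortfall.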


Tags: openshift, ansible, openshift-origin