AWS Region Problem with Pacemaker and Corosync
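I am currently trying to implement HA failover on AWS using 3 EC2 instances. Let's say the 3 machines are named HA1, HA2 and HA3. HA1 has the Elastic IP, and the other two have standard public IPs for SSH connections. I have followed these three resources: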
- https://medium.com/@2infiniti/creating-highly-available-nodes-on-icon-stage-1-active-passive-failover-with-pacemaker-and-a9d56b1484da
- https://medium.com/@gt.anand1994/ha-cluster-with-elasticip-using-corosync-and-pacemaker-a013d288ae8
- https://www.howtoforge.com/tutorial/how-to-set-up-nginx-high-availability-with-pacemaker-corosync-and-crmsh-on-ubuntu-1604/#step-configure-corosync
Everything worked without problems up to the point where I ran crm status, which shows the following output in the shell:
Current DC: PRep-01 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Dec 16 15:01:40 2019
Last change: Mon Dec 16 15:01:31 2019 by root via cibadmin on PRep-01
3 nodes configured
1 resource configured
Online: [ PRep-01 PRep-02 PRep-03 ]
Full list of resources:
deneme123 (ocf::heartbeat:awseip): Stopped
As you can see, the main problem is that the resource I created with the following command does not start.
sudo crm configure primitive deneme123 ocf:heartbeat:awseip params elastic_ip="xx.xx.xx.xx" awscli="$(which aws)" allocation_id="eipalloc-xxxxxxxxxx" op start timeout="60s" interval="0s" on-fail="restart" op monitor timeout="60s" interval="10s" on-fail="restart" op stop timeout="60s" interval="0s" on-fail="block" meta migration-threshold="2" failure-timeout="60s" resource-stickiness="100"
Finally, when I check the status of Pacemaker on all three instances, I get the following:
Dec 16 15:01:32 ip-172-31-47-76 crmd[30721]: notice: Result of probe operation for deneme123 on PRep-02: 7 (not ru
Dec 16 15:01:32 ip-172-31-47-76 crmd[30721]: notice: PRep-02-deneme123_monitor_0:5 [ You must specify a region. Yo
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 lrmd[30714]: notice: deneme123_start_0:30780:stderr [ You must specify a region. Y
Dec 16 15:01:37 ip-172-31-47-76 crmd[30721]: notice: Result of start operation for deneme123 on PRep-02: 7 (not ru
Dec 16 15:01:37 ip-172-31-47-76 crmd[30721]: notice: PRep-02-deneme123_start_0:6 [ You must specify a region. You
Dec 16 15:01:38 ip-172-31-47-76 lrmd[30714]: notice: deneme123_stop_0:30807:stderr [ You must specify a region. Yo
Dec 16 15:01:38 ip-172-31-47-76 lrmd[30714]: notice: deneme123_stop_0:30807:stderr [ You must specify a region. Yo
Dec 16 15:01:38 ip-172-31-47-76 crmd[30721]: notice: Result of stop operation for deneme123 on PRep-02: 0 (ok)
However, I have already entered the region with aws configure, and I can see the region in ~/.aws/config as well. I also added AWS_DEFAULT_REGION=eu-xx-1 to the /etc/systemd/system/multi-user.target.wants/pacemaker.service file.
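One thing worth checking at this step: Pacemaker runs resource agents as root from a systemd service, so they may not read the ~/.aws/config of the user who ran aws configure, and a variable added to the unit file only takes effect as an Environment= entry in the [Service] section after a daemon-reload. The following is a minimal sketch of a drop-in override, not a confirmed fix for this thread. It assumes the unit is named pacemaker.service and that the agent inherits the daemon's environment; the helper name write_region_dropin and the file name aws-region.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: writes a systemd drop-in that sets AWS_DEFAULT_REGION
# for a unit. $1 = drop-in directory, $2 = region name.
write_region_dropin() {
    dropin_dir=$1
    region=$2
    mkdir -p "$dropin_dir" || return 1
    # A drop-in needs the [Service] header; a bare VAR=value line is ignored.
    printf '[Service]\nEnvironment=AWS_DEFAULT_REGION=%s\n' "$region" \
        > "$dropin_dir/aws-region.conf"
}

# Usage on a cluster node (as root; replace eu-xx-1 with the real region):
#   write_region_dropin /etc/systemd/system/pacemaker.service.d eu-xx-1
#   systemctl daemon-reload
#   systemctl restart pacemaker
#   sudo -H aws configure get region   # shows the region root's own AWS config resolves to
```

A drop-in directory is used here instead of editing the file under multi-user.target.wants, since that path is normally just a symlink to the packaged unit.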
So my question is: what is wrong here? I can't see anything wrong with the AWS region configuration. What could be causing this?
You must configure the security groups and ACL rules correctly.
Can the instances ping each other?
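The problem appears to have been related to the IAM role and its policies. Once I created a role with the required policies, I was able to deploy my HA solution with the EIP successfully.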
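For anyone hitting the same symptom, a quick way to check whether a node's credentials can see the Elastic IP is sketched below. This is only an illustration: the thread does not list the required policy, so the assumption that the role needs EC2 actions such as ec2:DescribeAddresses, ec2:AssociateAddress and ec2:DisassociateAddress is mine, and the function name check_eip_access and the AWS_CLI override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: checks that the current credentials work and can
# describe the given Elastic IP allocation. $1 = allocation ID.
# AWS_CLI may point at a different aws binary (similar to the agent's awscli parameter).
check_eip_access() {
    aws_cli=${AWS_CLI:-aws}
    alloc_id=$1
    # Step 1: are there any usable credentials (instance role or keys) at all?
    if ! "$aws_cli" sts get-caller-identity >/dev/null; then
        echo "no usable credentials"
        return 1
    fi
    # Step 2: is the identity allowed to read the allocation?
    if ! "$aws_cli" ec2 describe-addresses --allocation-ids "$alloc_id" >/dev/null; then
        echo "cannot describe $alloc_id"
        return 2
    fi
    echo "ok"
}

# Usage (as root on each node, with the real allocation ID):
#   check_eip_access eipalloc-xxxxxxxxxx
```

Running it as root matters, because that is the user the resource agent runs as. A passing check only covers read access; associating and disassociating the address need their own permissions.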