启动后删除了 AWS Cloudwatch 代理配置文件
AWS Cloudwatch agent config file removed after startup
问题
我只是想在启动时使用 AWS 用户数据在 Amazon Linux 2 个实例上安装 Cloudwatch 代理。出于某种原因,在 Cloud Init 完成 运行ning 后,所有服务都重新启动,我放在 cloudwatch 文件夹中的配置文件不再存在。
我使用的是使用 Packer 预构建的自定义 AMI,我的配置文件从 Ansible 模板放入 /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json
。这是我要使用的配置文件,其中包含我要发送的所有指标和日志。然后我在安装代理后在启动时将其复制到 /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
。
这是我的用户数据脚本:
#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
发生了什么
启动完成后,我可以正确看到脚本运行。如果我 运行 cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log
我可以看到以下内容:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
如您所见,收集了用户数据 运行 的初始命令以及自定义指标和日志(请参阅相关行前的 ====> 标记)。
然而几秒钟后,在 Cloud Init 结束后,cloudwatch 代理被 systemd 以某种方式重新启动,文件系统中不存在文件 amazon-cloudwatch-agent.json
,因此代理 运行s 具有默认参数.
但是,如果我在启动后手动重新运行命令,一切正常,但当然我需要它在自动缩放启动时自动执行。
我试过的
直接用systemd启动amazon cloudwatch代理,尝试chown配置文件为只读,只获取配置,让系统自己启动代理,但问题依然存在。
感谢您的帮助
解决方法
预装的ssm-agent与Cloudwtach Agent冲突。在 Packer 构建期间卸载 ssm-agent:
sudo yum erase amazon-ssm-agent --assumeyes
说明
终于发现新安装的cloudwatch agent和亚马逊Linux2镜像默认安装的SSM agent冲突了。
事实上,我首先尝试了一个丑陋的解决方法,即在用户数据中使用 sed 替换 amazon-cloudwatch-agent 服务的 StartExec 行:
sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service
这样,当服务在实例启动后重新启动时,它将使用我的自定义配置。
但是我后来发现服务文件在 Cloud Init 结束后也被替换了。
查看系统消息,我注意到 ssm-agent 在 Cloud Init 结束后正在执行一些配置重新加载,因此我认为它可能是罪魁祸首。
我最终在构建我的 AMI 的打包程序构建中卸载了它,因此它不会在实例启动时出现,最后我的配置不再被覆盖。
请注意,我对 ssm-agent 的工作原理没有深入的了解,可能有一种正确的方法可以使用一些 SSM 配置来实例化 Cloudwatch Agent。
由于我们目前没有使用SSM,我也没有足够的时间研究这个选项,所以我选择了这个折衷方案。
如果有人能想出一个更简洁的解决方案,通过自动化方法使用 ssm-agent,我们将不胜感激。
问题
我只是想在启动时使用 AWS 用户数据在 Amazon Linux 2 个实例上安装 Cloudwatch 代理。出于某种原因,在 Cloud Init 完成 运行ning 后,所有服务都重新启动,我放在 cloudwatch 文件夹中的配置文件不再存在。
我使用的是使用 Packer 预构建的自定义 AMI,我的配置文件从 Ansible 模板放入 /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json
。这是我要使用的配置文件,其中包含我要发送的所有指标和日志。然后我在安装代理后在启动时将其复制到 /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
。
这是我的用户数据脚本:
#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
发生了什么
启动完成后,我可以正确看到脚本运行。如果我 运行 cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log
我可以看到以下内容:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
如您所见,收集了用户数据 运行 的初始命令以及自定义指标和日志(请参阅相关行前的 ====> 标记)。
然而几秒钟后,在 Cloud Init 结束后,cloudwatch 代理被 systemd 以某种方式重新启动,文件系统中不存在文件 amazon-cloudwatch-agent.json
,因此代理 运行s 具有默认参数.
但是,如果我在启动后手动重新运行命令,一切正常,但当然我需要它在自动缩放启动时自动执行。
我试过的
直接用systemd启动amazon cloudwatch代理,尝试chown配置文件为只读,只获取配置,让系统自己启动代理,但问题依然存在。
感谢您的帮助
解决方法
预装的ssm-agent与Cloudwtach Agent冲突。在 Packer 构建期间卸载 ssm-agent:
sudo yum erase amazon-ssm-agent --assumeyes
说明
终于发现新安装的cloudwatch agent和亚马逊Linux2镜像默认安装的SSM agent冲突了。 事实上,我首先尝试了一个丑陋的解决方法,即在用户数据中使用 sed 替换 amazon-cloudwatch-agent 服务的 StartExec 行:
sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service
这样,当服务在实例启动后重新启动时,它将使用我的自定义配置。 但是我后来发现服务文件在 Cloud Init 结束后也被替换了。
查看系统消息,我注意到 ssm-agent 在 Cloud Init 结束后正在执行一些配置重新加载,因此我认为它可能是罪魁祸首。 我最终在构建我的 AMI 的打包程序构建中卸载了它,因此它不会在实例启动时出现,最后我的配置不再被覆盖。
请注意,我对 ssm-agent 的工作原理没有深入的了解,可能有一种正确的方法可以使用一些 SSM 配置来实例化 Cloudwatch Agent。 由于我们目前没有使用SSM,我也没有足够的时间研究这个选项,所以我选择了这个折衷方案。
如果有人能想出一个更简洁的解决方案,通过自动化方法使用 ssm-agent,我们将不胜感激。