Log file shows intermittent success and failure

I'll try a short version first, and I can add more information on request.

I have a client machine with the following configuration:

------------------------------------------------------------
Connected to puppet-client-10 as root
Debian 7.8 wheezy (amd64)
------------------------------------------------------------
FQDN        : puppet-client-10.mydomain
IP          : 161.148.1.10

PuppetMaster: puppet-master.mydomain
Puppet      : 3.7.5
Facter      : 2.2.0
------------------------------------------------------------

It connects to the following puppetmaster:

------------------------------------------------------------
Connected to puppet-master as root
Debian 7.8 wheezy (amd64)
------------------------------------------------------------
FQDN        : puppet-master.mydomain
IP          : 161.148.1.1

Puppet      : 3.7.5
Facter      : 2.4.3
------------------------------------------------------------

Now, back to the client. I used to keep the agent disabled and check for updates once a day via cron:

6 22 * * * root /usr/bin/puppet agent --test --logdest syslog

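That worked perfectly.

Two days ago I commented out the cron job and enabled the agent so that it checks for updates every hour.

Since then, the log has been showing this line every 2 minutes: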

<27>1 2015-05-20T08:20:30.651767-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known
<27>1 2015-05-20T08:22:30.668988-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known

The log also shows that the client is checking the master for updates correctly:

<28>1 2015-05-20T08:23:44.927447-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class elasticsearch
<28>1 2015-05-20T08:23:45.406158-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class logstash
<28>1 2015-05-20T08:23:45.776948-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class logrotate
<28>1 2015-05-20T08:23:46.204161-03:00 puppet-client-10 puppet-agent 31500 - -  Loading class puppet

Then it goes back to the getaddrinfo error every 2 minutes:

<27>1 2015-05-20T08:24:30.676307-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known
<27>1 2015-05-20T08:26:30.683570-03:00 puppet-client-10 puppet-agent 8072 - -  Could not request certificate: getaddrinfo: Name or service not known

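It keeps alternating between error messages (every 2 minutes) and success messages (every hour).

Running puppet agent --test by hand works fine.
The problem seems to be with the agent.

Any hints?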
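One way to see which process is responsible for which messages is to tally the lines per PID and severity. This is only a sketch, assuming the RFC 5424-style layout shown in the excerpts above (field 1 is `<PRI>VERSION`, field 5 is the PID); `tally_by_pid` is a made-up helper name, not part of Puppet or syslog.

```shell
#!/bin/sh
# Tally syslog lines per PID and severity, reading log lines from stdin.
# Severity is PRI modulo 8 (3 = error, 4 = warning), so <27> is an error
# and <28> is a warning.
tally_by_pid() {
    awk '{
        pri = $1
        sub(/^</, "", pri)     # drop the leading "<"
        sub(/>.*/, "", pri)    # drop ">" and the version digit
        counts[$5 " sev=" (pri % 8)]++
    }
    END { for (k in counts) print k, counts[k] }' | sort
}
```

Something like `grep puppet-agent /var/log/syslog | tally_by_pid` would then print one line per PID and severity (the log path is an assumption; use wherever your syslog writes). If every error comes from a single PID, that process is the one to look at.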


I would guess it is because your puppet master isn't named "puppet". Also, I'd check which user the puppet agent you now have running is running as; probably not root, I'd guess. – Vorsprung

It is named puppet-master, and also puppet-master.mydomain, and it has the following alternative names:

# puppet cert list puppet-master.mydomain  

+ "puppet-master.mydomain" (SHA256) F2:54:03:9C 
  (alt names: "DNS:puppet", "DNS:puppet.mydomain", "DNS:puppet-master.mydomain")  

It is running as root:

# ps aux | grep puppet

root      1763  0.0  0.2 133776 45236 ?        Ssl  Mai19   0:07 /usr/bin/ruby /usr/bin/puppet agent
root      8072  0.0  0.2 194580 40144 ?        Ssl  Mai19   0:02 /usr/bin/ruby /usr/bin/puppet agent

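Now, process 8072 above is the one spamming the error lines.

Should I really have 2 processes running?

The error indicates a problem resolving a hostname to an IP address, but given that it succeeds every hour and also succeeds when run manually, I don't think there is any configuration problem with your name resolution.

You should only have one puppet-agent process running. I would stop the puppet-agent service, make sure all of its processes are killed, restart the puppet-agent service, and make sure only one process is running.

I bet one of the processes is doing something stupid.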
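The stop / kill / restart / verify steps could be sketched roughly like this. It is an untested outline, not a definitive procedure: it assumes the init script is called `puppet` (adjust if your package names it differently), and `count_agents` and `restart_single_agent` are made-up helper names.

```shell
#!/bin/sh
# Count agent processes in `ps aux`-style text read from stdin.
# The [p] trick keeps a grep command line from matching itself.
count_agents() {
    grep -c '[p]uppet agent'
}

# Stop the service, kill leftover daemons, start again, then verify.
# ASSUMPTION: the service is named "puppet" on this system.
restart_single_agent() {
    service puppet stop
    pkill -f 'puppet agent'    # catch daemons the init script lost track of
    sleep 2
    service puppet start
    sleep 2
    n=$(ps aux | count_agents)
    if [ "$n" -eq 1 ]; then
        echo "OK: exactly one puppet agent is running"
    else
        echo "WARNING: $n puppet agent processes found" >&2
        return 1
    fi
}
```

Run `restart_single_agent` as root. If a catalog run happens to be in progress during the check, the count may briefly be higher than one, so check again a minute later before drawing conclusions.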