Contents
  1. Common Puppet Troubleshooting
    1.1. Requests
      1.1.1. Why does "puppet cert --list" show no certificates?
      1.1.2. Getting "certificates were not trusted"
      1.1.3. The agent reports these errors when running with --test
      1.1.4. Error "err: Could not request certificate: Connection refused - connect"
      1.1.5. Warning "warning: peer certificate won't be verified in this SSL session"
      1.1.6. Error "failed to retrieve certificate and waitforcert is disabled"
      1.1.7. Error "Failed to retrieve current state of resource: Could not retrieve information from source(s)"
      1.1.8. Error "Could not retrieve information from environment production source(s) puppet://"
      1.1.9. Error "Could not request certificate: undefined method 'closed?'"
      1.1.10. Error "Change from absent to file failed"
      1.1.11. Error "Change failed … Could not find server"
      1.1.12. Error "Could not retrieve catalog from remote server"
      1.1.13. Error "Run of Puppet configuration client already in progress"
      1.1.14. Error "Cannot override local resource on node"
      1.1.15. Error "Error 400 on SERVER: No support for http method POST"
    1.2. Syntax
    1.3. Troubleshooting approach

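Common Puppet Troubleshooting

Requests

Why does "puppet cert --list" show no certificates?

: puppet cert must be run with root privileges, so use sudo puppet cert --list. Run as a normal user, Puppet reads its per-user configuration and SSL directory under ~/.puppet instead of the master's, so it finds no certificate requests.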

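Getting "certificates were not trusted"

: This usually happens after the service was reinstalled without clearing the old certificates. Clear the certificates on both the master and the agent:

  1. On the Puppet master, clean the host's certificate: sudo puppet cert --clean {node certname}
  2. On the agent, delete the certificate directories: sudo rm -r /etc/puppet/ssl /var/lib/puppet/ssl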
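The agent-side step can be wrapped in a small POSIX sh function. This is a sketch, not a Puppet tool: clean_agent_ssl is a hypothetical name, and its default paths are simply the two directories used in this article; on your system the real location can be checked with puppet agent --configprint ssldir. The master-side puppet cert --clean step still has to be run on the master.

```shell
#!/bin/sh
# Hypothetical helper: delete the agent's local SSL directories so that the
# next agent run generates a new key and certificate request.
# With no arguments it uses the two paths from this article; pass other
# paths (e.g. the output of `puppet agent --configprint ssldir`) to override.
# Run as root on a real agent.
clean_agent_ssl() {
  if [ "$#" -eq 0 ]; then
    set -- /etc/puppet/ssl /var/lib/puppet/ssl
  fi
  for ssl_dir in "$@"; do
    if [ -d "$ssl_dir" ]; then
      echo "removing $ssl_dir"
      rm -rf -- "$ssl_dir" || return 1
    fi
  done
}
```

After cleaning the certificate on the master, load the function in a root shell on the agent, call clean_agent_ssl, and run puppet agent --test again to submit a fresh certificate request.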

The agent reports these errors when running with --test

```
err: Could not retrieve catalog from remote server: hostname not match with the server certificate
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
err: Could not send report: hostname not match with the server certificate
```

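: The fix is to make the master's hostname and the agent's hosts-file entry for it agree.
The official documentation says that when you see "hostname not match with the server certificate", you should check that the master's hostname matches the name the agent resolves it by; until they match, the agent cannot go any further. To fix it on the master:

  1. Check which name the master's certificate was issued for: sudo puppet master --configprint certname
  2. Stop the master service: sudo /etc/init.d/puppetmaster stop
  3. Delete the master's certificate and keys: sudo find "$(sudo puppet master --configprint ssldir)" -name "$(sudo puppet master --configprint certname).pem" -delete
  4. In the main configuration file /etc/puppet/puppet.conf, set the certname option to the name the master should be certified as; keeping it the same as the hostname is recommended.
  5. Start the master in the foreground to test it: sudo puppet master --no-daemonize --verbose
  6. Once that works, start the master service: sudo /etc/init.d/puppetmaster start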
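Before deleting anything on the master, it can help to confirm that the two names really differ. The sketch below, a POSIX sh function with the hypothetical name names_match, compares the master name configured on the agent (puppet agent --configprint server) with the certname printed on the master. It only compares the two strings case-insensitively and ignores any alternative DNS names the certificate may carry, so treat a mismatch as a strong hint rather than proof.

```shell
#!/bin/sh
# Hypothetical helper: compare the master name the agent connects to
# (its `server` setting) with the certname of the master's certificate.
# Hostnames are case-insensitive, so both are lowercased before comparing.
# Alternative DNS names in the certificate are NOT considered.
names_match() {
  server=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  certname=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [ "$server" = "$certname" ]; then
    echo "OK: agent server name matches the master certname ($certname)"
  else
    echo "MISMATCH: agent uses '$server', master certificate is for '$certname'" >&2
    return 1
  fi
}
```

For example, on the agent: names_match "$(sudo puppet agent --configprint server)" puppet.example.com, where the second argument is the certname printed on the master in step 1.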

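Error "err: Could not request certificate: Connection refused - connect"

: The agent cannot connect to the master. Common causes:

  1. The master is not running.
  2. The master's firewall does not allow port 8140; open the port (or stop the iptables firewall).
  3. The agent has no hosts entry for the master's IP address, or cannot resolve the master's name.

A related error, "err: Could not request certificate: getaddrinfo: Name or service not known", has the same fixes.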
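The name-resolution and port checks from the list above can be scripted roughly as follows. This is a sketch under stated assumptions: hosts_ip and port_open are hypothetical helper names, hosts_ip only reads a hosts-format file (by default /etc/hosts), and port_open assumes bash and the coreutils timeout command are installed. If the master's name comes from DNS rather than /etc/hosts, getent hosts NAME checks the full system resolver instead.

```shell
#!/bin/sh
# Hypothetical helpers for "Connection refused" and "getaddrinfo" errors.

# Print the IP address that a hosts-format file (default /etc/hosts) maps
# NAME to, using the first matching line; return 1 if there is no entry.
# Text after '#' is treated as a comment; names compare case-insensitively.
hosts_ip() {
  awk -v n="$1" '
    { sub(/#.*/, "") }
    NF >= 2 {
      for (i = 2; i <= NF; i++)
        if (tolower($i) == tolower(n)) { print $1; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "${2:-/etc/hosts}"
}

# Return 0 if a TCP connection to HOST:PORT (PORT defaults to 8140, the
# master's port) succeeds within 3 seconds. Needs bash (/dev/tcp) and the
# coreutils `timeout` command.
port_open() {
  timeout 3 bash -c ': > "/dev/tcp/$1/$2"' _ "$1" "${2:-8140}" 2>/dev/null
}
```

Run on the agent, port_open puppet.example.com 8140 failing while the name resolves points at causes 1 or 2 (master stopped or port blocked), whereas hosts_ip failing points at cause 3.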

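Warning "warning: peer certificate won't be verified in this SSL session"

: The master has not signed the agent's certificate yet. On the master, list the pending certificate requests and sign the agent's:

  1. List agents whose certificates are not signed yet: sudo puppet cert list
  2. Sign the certificate: sudo puppet cert sign {node certname}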

Error "failed to retrieve certificate and waitforcert is disabled"

: The certificate is invalid or was signed incorrectly. Clean it and try again:

  1. On the master: sudo puppet cert clean {node certname}
  2. On the agent, delete the certificate directories: sudo rm -r /etc/puppet/ssl /var/lib/puppet/ssl

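Error "Failed to retrieve current state of resource: Could not retrieve information from source(s)"

The message looks like this:
err: //test/File[/tmp/foo]: Failed to retrieve current state of resource: Could not retrieve information from source(s) puppet:///test/foo at /etc/puppet/modules/test/manifests/init.pp:5

: This error is common and is usually caused by a mistake in the manifest. When a file resource serves a file from its own module, write the source as source => "puppet:///modules/test/foo"; the master then looks for the file in the module's files directory (here /etc/puppet/modules/test/files/foo).
The message already points at the culprit: line 5 of test/manifests/init.pp.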
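To see which file the master actually tries to serve, the URL can be mapped onto the module layout. The sketch assumes the modulepath is /etc/puppet/modules (as in the error message above) and uses a hypothetical function name, module_file_path; it encodes the convention that puppet:///modules/<module>/<path> is served from <modulepath>/<module>/files/<path>.

```shell
#!/bin/sh
# Hypothetical helper: print the on-disk path the master serves for a
# puppet:///modules/<module>/<path> URL, i.e. <modulepath>/<module>/files/<path>.
# MODULEPATH (second argument) defaults to /etc/puppet/modules.
module_file_path() {
  url=$1
  modulepath=${2:-/etc/puppet/modules}
  case $url in
    puppet:///modules/*/*) ;;
    *) echo "not a puppet:///modules/<module>/<path> URL: $url" >&2; return 1 ;;
  esac
  rest=${url#puppet:///modules/}   # e.g. test/foo
  module=${rest%%/*}               # e.g. test
  file=${rest#*/}                  # e.g. foo
  printf '%s/%s/files/%s\n' "$modulepath" "$module" "$file"
}
```

On the master, ls -l "$(module_file_path puppet:///modules/test/foo)" shows whether the file exists where Puppet expects it.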

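Error "Could not retrieve information from environment production source(s) puppet://"

This is another common error. The message is:
err: /File[/var/lib/puppet/lib]: Could not evaluate: Could not retrieve information from environment production source(s) puppet://foo/plugins

: This comes from the Puppet configuration: plugin sync is enabled, but no module provides a lib directory for the master to serve. Either set pluginsync=false in puppet.conf, or create a lib directory in any module and define a custom fact in it. The drawback of the second option is that every agent will sync that lib directory.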

Error "Could not request certificate: undefined method 'closed?'"

The message is:
err: Could not request certificate: undefined method 'closed?' for nil:NilClass
Exiting; failed to retrieve certificate and waitforcert is disabled

: Usually caused by permissions or the firewall. Make sure the agent runs with root privileges so it can read its certificate files, that is, the /var/lib/puppet/ssl directory, and check that port 8140 is not blocked.

Error "Change from absent to file failed"

Error message: err: //test/File[/tmp/missing/foo]/ensure: change from absent to file failed: Could not set file on ensure: No such file or directory - /tmp/missing/foo at /etc/puppet/modules/test/manifests/init.pp:5

: The message is explicit: the directory does not exist. Create /tmp/missing, either manually or with a Puppet file resource that has ensure => directory.
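Puppet's file resource does not create missing parent directories, so each missing level needs its own file resource with ensure => directory (or a manual mkdir -p). The hypothetical helper below, a POSIX sh sketch, lists which ancestors of a target path are missing, outermost first, which is the order in which they would need to be created.

```shell
#!/bin/sh
# Hypothetical helper: print the missing ancestor directories of PATH,
# outermost first (the order in which they would have to be created).
# Prints nothing when the parent directory already exists.
missing_parents() {
  dir=$(dirname -- "$1")
  missing=""
  while [ ! -d "$dir" ]; do
    # Prepend, so the outermost missing directory ends up first.
    missing="$dir
$missing"
    dir=$(dirname -- "$dir")
  done
  printf '%s' "$missing"
}
```

For the error above, with /tmp present, missing_parents /tmp/missing/foo prints /tmp/missing, the one directory to declare with ensure => directory.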

Error "Change failed … Could not find server"

Error message: err: //test/File[/tmp/foo]/content: change from {md5}068008008418dff20750a94336318974 to {md5}8db2d67767577c70b1251fd80ad32943 failed: Could not find server puppet

: A filebucket is configured, but its server was given as the short name puppet, so the agent cannot find the Puppet master. Use the fully qualified domain name instead:

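```puppet
# Point the bucket at the master's fully qualified domain name,
# not the short name "puppet"
filebucket { 'puppetmaster':
  server => 'puppet1.example.com',
}
```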

Error "undefined method `closed?' for nil:NilClass"

Error message: err: Could not retrieve catalog from remote server: undefined method `closed?' for nil:NilClass

: A syntax error in the manifests; check your syntax. Installing puppet-lint is recommended: https://github.com/rodjek/puppet-lint

Error "certificate verify failed"

Error message:
err: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
err: /File[/var/lib/puppet/lib]: Failed to retrieve current state of resource: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed Could not retrieve file metadata for puppet://puppet.example.com/plugins: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
err: Could not retrieve catalog from remote server: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed

: Certificate verification failed. Causes include a reinstalled service or agents that have been certified by more than one master. Delete the agent's local certificate files, then request and sign a new certificate. To delete the certificate files on the agent: sudo find /var/lib/puppet/ssl -type f -print0 | sudo xargs -0r rm
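When several masters or a reinstalled master are involved, a quick check is whether the CA certificate cached on the agent is the same one the master signs with. The sketch compares SHA-256 fingerprints with the openssl command-line tool. The function names are hypothetical, and the default locations (agent: /var/lib/puppet/ssl/certs/ca.pem, master: /var/lib/puppet/ssl/ca/ca_crt.pem) are assumptions to confirm with puppet agent --configprint localcacert and puppet master --configprint cacert.

```shell
#!/bin/sh
# Hypothetical helpers: compare two PEM certificates by SHA-256 fingerprint,
# e.g. the CA certificate cached on the agent and the master's CA certificate.
# Requires the openssl command-line tool.

# Print only the fingerprint, dropping the "SHA256 Fingerprint=" label
# (the label's case differs between OpenSSL versions).
cert_fingerprint() {
  openssl x509 -noout -fingerprint -sha256 -in "$1" 2>/dev/null | sed 's/^[^=]*=//'
}

# Succeed only if both files are readable certificates with the same fingerprint.
same_cert() {
  fp_a=$(cert_fingerprint "$1")
  fp_b=$(cert_fingerprint "$2")
  [ -n "$fp_a" ] && [ "$fp_a" = "$fp_b" ]
}
```

If same_cert fails for the agent's ca.pem and a copy of the master's ca_crt.pem, the agent trusts a different CA, and deleting its certificate files as described above is the appropriate fix.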

Error "no certificate found and waitforcert is disabled"

Error message:
warning: peer certificate won't be verified in this SSL session
Exiting; no certificate found and waitforcert is disabled

: The Puppet master has not authorized this machine, that is, it has not signed its certificate.

```shell
sudo puppet cert list                    # list machines whose certificates are not signed yet
sudo puppet cert sign node1.example.com  # sign this machine's certificate
```

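Error "Could not retrieve catalog from remote server"

Error message: err: Could not retrieve catalog from remote server: No such file or directory - /var/lib/puppet/client_yaml/catalog

: The YAML cache directory was never created, so the agent cannot write its cached catalog there. This is fairly common when the YAML cache is in use.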

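Error "Run of Puppet configuration client already in progress"

Error message: notice: Run of Puppet configuration client already in progress; skipping

: A puppet agent process is already running in the background, and running puppet agent --test at the same time produces this message. Use ps axf to look for the puppet process, then check whether the lock file /var/lib/puppet/state/puppetdlock exists. Puppet 3.x improved this behavior and deprecated the puppet kick feature, because a kick could arrive while the agent was already running and cause large numbers of failed runs.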
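The lock-file check above can be scripted. This sketch assumes the lock file contains the PID of the agent run that holds it, and it uses /proc to test whether that process is alive, so it is Linux-only; lock_status is a hypothetical name. In Puppet 2.x an empty puppetdlock may mean the agent was disabled (puppet agent --disable) rather than running.

```shell
#!/bin/sh
# Hypothetical helper: report what a Puppet agent lock file says.
# Assumes the file holds the PID of the process that owns the lock and
# checks /proc/<pid> to see whether that process still exists (Linux only).
lock_status() {
  lockfile=${1:-/var/lib/puppet/state/puppetdlock}
  if [ ! -e "$lockfile" ]; then
    echo "none"
    return 0
  fi
  pid=$(tr -cd '0-9' < "$lockfile")   # keep only the digits
  if [ -z "$pid" ]; then
    echo "empty"                      # no PID recorded in the file
  elif [ -d "/proc/$pid" ]; then
    echo "running $pid"               # a run really is in progress
  else
    echo "stale $pid"                 # process is gone, lock left behind
  fi
}
```

A "stale" result means the process that wrote the lock has exited, so removing the file as root lets the next run proceed; a "running" result means you should wait for the background run to finish instead.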

Error "Cannot override local resource on node"

Error message: err: Could not retrieve catalog from remote server: Error 400 on SERVER: Exported resource Opsviewmonitored[foo] cannot override local resource on node bar.example.com

: This error occurs with exported (virtual) resources, for example when the same resource is defined more than once. The suggested fix is to run puppet node clean {node certname}, then query the storeconfigs database to find which host holds the conflicting resource: select hosts.name from hosts, resources where restype='Opsviewmonitored' and title='foo' and hosts.id = resources.host_id;

Error "Error 400 on SERVER: No support for http method POST"

Error message: err: Could not retrieve catalog from remote server: Error 400 on SERVER: No support for http method POST

: A version mismatch. The master must run the same or a newer Puppet version than its agents.
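A quick way to check this rule is to compare the output of puppet --version on the master and on an agent. The sketch relies on GNU sort -V for version ordering; version_ge is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2. `sort -V` orders version strings numerically per component,
# so the smaller of the two comes first; $1 >= $2 exactly when that is $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, version_ge "$MASTER_VERSION" "$(puppet --version)" || echo "upgrade the master first", where MASTER_VERSION holds the puppet --version output from the master.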


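Syntax

Skipped here; use puppet-lint to find syntax problems.

Troubleshooting approach

Most errors come down to certificate authorization and to hosts-file or DNS resolution. Fixing them requires a basic understanding of how Puppet works; after that, act on what the error message actually says. Many people receive an error and neither read it nor think about it, and that habit needs to change.

I'll write up some architecture-level cases when I have time.

Reference: official Puppet troubleshooting documentation

Reference: Puppet errors explained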
