From GarrettHoneycutt




This page describes how to monitor Puppet and related services. Monitoring here means alerting someone based on a metric. We will also want to collect data for trending, so that we can see it over time without alerting anyone. Trending is mentioned in this document, but this is not a list of what to trend.

Puppet Master

Remote Checks

Puppet Master Service

   curl -k -H 'Accept: pson' https://puppet1.domain.tld:8140/production/status/no_key

Must return



This requires the following in /etc/puppet/auth.conf just above the last section.

# allow anyone to see if a puppet master is alive.
# used for monitoring
path /status/no_key
method find
auth any
allow *

Local Checks


CPU Usage

This is a CPU-constrained service, so gather the data for trending but do not send alerts.


Lame. Stop doing this.


SSH

Check that SSH is available.

Disk Usage

We will definitely want to trend this data. Warn if any mount hits 75% and go Critical at 90%.
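The thresholds above can be sketched as a small Ruby helper. This is only an illustration: the function names `parse_df` and `disk_status` are hypothetical, and it assumes the input is the output of `df -P` (POSIX format, one line per mount, no spaces in the device name).

```ruby
# Parse `df -P` output into { mount_point => percent_used }.
# Assumes POSIX output: Filesystem, blocks, Used, Available, Capacity, Mounted on.
def parse_df(df_output)
  df_output.lines.drop(1).each_with_object({}) do |line, usage|
    fields = line.split
    next if fields.size < 6
    mount = fields[5..-1].join(' ')   # keep mount points that contain spaces
    usage[mount] = fields[4].to_i     # "42%".to_i => 42
  end
end

# Thresholds from the text: warning at 75%, critical at 90%.
def disk_status(percent_used, warn: 75, crit: 90)
  return :critical if percent_used >= crit
  return :warning if percent_used >= warn
  :ok
end
```

A check could then call `parse_df` on the output of `df -P` and report the worst `disk_status` across all mounts.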


Load

This will need to be tuned per host. Normal load is the number of processors + 1. For monitoring, we should look at the 15-minute load average.
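As a rough sketch of the rule above, the following Ruby helpers read the 15-minute value from the contents of /proc/loadavg and compare it with "processors + 1". The function names are hypothetical, and treating anything above that baseline as worth flagging is an assumption that should be tuned per host.

```ruby
# Extract the 15-minute load average from the contents of /proc/loadavg,
# which look like "0.42 0.36 0.30 1/123 4567" (third field is 15 minutes).
def fifteen_minute_load(loadavg_text)
  Float(loadavg_text.split[2])
end

# "Normal" load per the text: number of processors + 1.
def normal_load(processors)
  processors + 1
end

# True when the 15-minute load is above the normal baseline.
# Assumption: the baseline doubles as the alert threshold; tune per host.
def load_above_normal?(loadavg_text, processors)
  fifteen_minute_load(loadavg_text) > normal_load(processors)
end
```

On Linux the inputs could come from `File.read('/proc/loadavg')` and `Etc.nprocessors` (from Ruby's `etc` library).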

Memory Usage

Warn at 90% used actual memory (not including buffers and cache) and Critical if 10% of swap is being used.
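One way to compute these numbers on Linux is from /proc/meminfo. The sketch below is illustrative only: the function names are hypothetical, and it assumes "actual" used memory means MemTotal minus MemFree, Buffers, and Cached.

```ruby
# Parse /proc/meminfo text into { "MemTotal" => kB, ... }.
# Lines whose key contains parentheses, e.g. "Active(anon)", are skipped.
def parse_meminfo(text)
  text.scan(/^(\w+):\s+(\d+)/).each_with_object({}) do |(key, value), info|
    info[key] = value.to_i
  end
end

# Thresholds from the text: warning at 90% actual memory used,
# critical when 10% of swap is in use.
def memory_status(info)
  used = info.fetch('MemTotal') - info.fetch('MemFree') -
         info.fetch('Buffers', 0) - info.fetch('Cached', 0)
  mem_pct = 100.0 * used / info.fetch('MemTotal')

  swap_total = info.fetch('SwapTotal', 0)
  swap_used = swap_total - info.fetch('SwapFree', 0)
  swap_pct = swap_total.zero? ? 0.0 : 100.0 * swap_used / swap_total

  return :critical if swap_pct >= 10
  return :warning if mem_pct >= 90
  :ok
end
```

A check would call `memory_status(parse_meminfo(File.read('/proc/meminfo')))`.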


Zombie Processes

Ensure there are no zombie processes.
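A minimal sketch of such a check, assuming the input is the output of `ps -eo stat` on Linux (one header line, then one process state per line, where zombie states begin with `Z`). The function name is hypothetical.

```ruby
# Count zombie processes in the output of `ps -eo stat`.
# The first line is the "STAT" header; zombie states start with "Z".
def zombie_count(ps_stat_output)
  ps_stat_output.lines.drop(1).count { |line| line.strip.start_with?('Z') }
end
```

The check passes when `zombie_count` returns zero.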


PuppetDB

Remote Checks

Test that the service is working by querying the list of nodes. The PuppetDB node itself should be present.

curl -H 'Accept: application/json'

Check the output for the FQDN of the PuppetDB server, such as

"name" : ""
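The presence test could be sketched in Ruby as follows. This assumes the response body is a JSON array of objects that each carry a "name" key, as the fragment above suggests; the function name `node_listed?` is hypothetical.

```ruby
require 'json'

# True when the node list returned by PuppetDB contains the given FQDN.
# Assumes a JSON array of objects, each with a "name" key.
def node_listed?(nodes_json, fqdn)
  JSON.parse(nodes_json).any? { |node| node['name'] == fqdn }
end
```

The body returned by the curl command would be passed in as `nodes_json`, along with the FQDN of the PuppetDB server.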

JVM heap bytes (HeapMemoryUsage) should not exceed 85% of the allocated heap.

curl -H "Accept: application/json" http://puppetdb.domain.tld:8080/v2/metrics/mbean/java.lang:type=Memory
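A hedged sketch of evaluating that response in Ruby. It assumes the JSON contains a `HeapMemoryUsage` object with `used` and `max` keys (the standard JMX memory usage fields) and that "allocated heap" means `max`; if the committed size is intended, swap in `committed`. The function names are hypothetical.

```ruby
require 'json'

# Percentage of the maximum heap currently in use.
# Assumes {"HeapMemoryUsage": {"used": ..., "max": ...}, ...}.
def heap_percent_used(memory_mbean_json)
  heap = JSON.parse(memory_mbean_json).fetch('HeapMemoryUsage')
  100.0 * heap.fetch('used') / heap.fetch('max')
end

# Threshold from the text: heap use should not go past 85%.
def heap_ok?(memory_mbean_json, limit: 85)
  heap_percent_used(memory_mbean_json) <= limit
end
```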

Command queue depth (QueueSize) should be less than 10.

curl -H "Accept: application/json" http://puppetdb.domain.tld:8080/v2/metrics/mbean/org.apache.activemq:BrokerName=localhost,Type=Queue,Destination=com.puppetlabs.puppetdb.commands
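The queue depth rule could be checked along these lines. The sketch assumes the response is a JSON object with a top-level `QueueSize` value; the function names are hypothetical.

```ruby
require 'json'

# Read the command queue depth from the queue MBean response.
# Assumes a JSON object with a top-level "QueueSize" key.
def queue_depth(queue_mbean_json)
  Integer(JSON.parse(queue_mbean_json).fetch('QueueSize'))
end

# Threshold from the text: depth should be less than 10.
def queue_ok?(queue_mbean_json, limit: 10)
  queue_depth(queue_mbean_json) < limit
end
```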

Local Checks

None needed.

Puppet Dashboard

Remote Checks

Login to service

Test that the service is responding by connecting via HTTP with a valid username and password and checking that return code 200 is received.

curl --user monitoring:hashedpassword http://puppetdashboard.domain.tld:3000

Check that classes are being returned

The Dashboard runs on puppet1, so we should be able to query puppet1 for a parameter that has been added to that node.

curl -H 'Accept: text/yaml' --user monitoring:hashedpassword http://puppetdashboard.domain.tld:3000/nodes/puppet1.domain.tld

We should receive a 200 response containing valid YAML with the parameter monitoring set to working. Note: we should write a Ruby script for this.

  monitoring: working
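A possible starting point for that Ruby script is sketched below. It only covers the YAML validation and assumes the node document nests the value under a top-level `parameters` key, which the indentation above suggests but does not confirm. The function name is hypothetical.

```ruby
require 'yaml'

# True when the node YAML has parameters -> monitoring set to "working".
# Assumes the parameter sits under a top-level "parameters" mapping.
def monitoring_working?(node_yaml)
  node = YAML.safe_load(node_yaml)
  return false unless node.is_a?(Hash)
  params = node['parameters']
  params.is_a?(Hash) && params['monitoring'] == 'working'
end
```

The body returned by the curl command would be passed in as `node_yaml`; the HTTP status code still needs to be checked separately.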

Local Checks


Database schema version

We can check the database schema version with

cd /usr/share/puppet-dashboard && rake RAILS_ENV=production db:version

which should match the following regular expression

/^Current version: (\d){14}$/
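Applied in Ruby, the match might look like the sketch below; the constant and function names are hypothetical. In Ruby, `^` and `$` anchor at line boundaries, so extra lines printed by rake before the version line do not prevent a match.

```ruby
# Pattern from the text: a 14-digit schema version on its own line.
VERSION_PATTERN = /^Current version: (\d){14}$/

# True when the rake db:version output contains a valid version line.
def schema_version_ok?(rake_output)
  !(rake_output =~ VERSION_PATTERN).nil?
end
```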

Delayed workers

sudo -u puppet-dashboard env RAILS_ENV=production /usr/share/puppet-dashboard/script/delayed_job status 2>/dev/null

This should return two lines with PIDs, such as

delayed_job: running [pid 30391]
delayed_job: running [pid 30385]

The output should match the following regex at least twice

/delayed_job: running \[pid (\d)+\]/
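Counting the matches could be sketched in Ruby like this; the function names are hypothetical, and the minimum of two workers comes from the text above.

```ruby
# Count "running" lines in the output of `delayed_job status`.
def running_workers(status_output)
  status_output.scan(/delayed_job: running \[pid (\d)+\]/).size
end

# The text expects at least two running workers.
def workers_ok?(status_output, minimum: 2)
  running_workers(status_output) >= minimum
end
```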