PuppetMonitoring
From GarrettHoneycutt
Current revision by Gh as of 18:08, 28 February 2014
About
How to monitor Puppet and related services. Monitoring here means alerting someone when a metric crosses a threshold. We also want data for trending, so that we can see how metrics change over time without alerting anyone. Trending is mentioned where relevant, but this is not a list of trending metrics.
Puppet Master
Remote Checks
Puppet Master Service
curl -k -H 'Accept: pson' https://puppet1.domain.tld:8140/production/status/no_key
The response must be
{"is_alive":true}
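A minimal Ruby sketch of the same check, using only the standard library. `master_alive?` and `fetch_master_status` are hypothetical names. Certificate verification is turned off to mirror `curl -k`, and the PSON reply is parsed as JSON, which is sufficient for this one-key payload.

```ruby
require "json"
require "net/http"
require "openssl"

# True only if the body parses as JSON (PSON) and contains "is_alive": true.
def master_alive?(body)
  status = JSON.parse(body)
  status.is_a?(Hash) && status["is_alive"] == true
rescue JSON::ParserError, TypeError
  false
end

# Fetch the raw status body from a Puppet master.
# Verification is disabled to mirror `curl -k`.
def fetch_master_status(host, port: 8140, environment: "production")
  http = Net::HTTP.new(host, port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  request = Net::HTTP::Get.new("/#{environment}/status/no_key")
  request["Accept"] = "pson"
  http.request(request).body
end

# Example: master_alive?(fetch_master_status("puppet1.domain.tld"))
```

Keeping the parsing separate from the HTTP call lets the pass/fail logic be tested without a running master.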
Setup
This requires the following rule in /etc/puppet/auth.conf, just above the last section. Rules are matched top to bottom, so it must come before the final catch-all.
# allow anyone to see if a puppet master is alive.
# used for monitoring
path /status/no_key
method find
auth any
allow *
Local Checks
CPU
The Puppet master is CPU-bound, so gather CPU data for trending but do not send alerts on it.
PING
Not a useful check; the service and SSH checks already show whether the host is reachable. Stop doing this.
SSH
Check that SSH is available.
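One way to check SSH without logging in is to open a TCP connection and look for the server's identification line, which RFC 4253 requires to begin with `SSH-`. A sketch in Ruby; `ssh_available?` is a hypothetical name, and the five-line limit and timeout values are arbitrary choices.

```ruby
require "socket"
require "timeout"

# True if something on host:port sends an SSH identification line.
# RFC 4253 lets servers send other lines first, so read up to five.
def ssh_available?(host, port = 22, timeout: 5)
  Timeout.timeout(timeout) do
    Socket.tcp(host, port, connect_timeout: timeout) do |sock|
      5.times.any? do
        line = sock.gets
        break false if line.nil? # connection closed without a banner
        line.start_with?("SSH-")
      end
    end
  end
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Example: ssh_available?("puppet1.domain.tld")
```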
Disk Usage
Trend this data. Warn if any mount reaches 75% used, and go critical at 90%.
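These thresholds can be applied to the output of `df -P`, whose POSIX format keeps each filesystem on one line with the used percentage in the fifth column. A minimal sketch; `disk_status` is a hypothetical name, and the `:ok`/`:warning`/`:critical` symbols are not tied to any particular monitoring system.

```ruby
# Return the worst status across all mounts in `df -P` output:
# :critical at or above crit%, :warning at or above warn%, else :ok.
def disk_status(df_output, warn: 75, crit: 90)
  worst = :ok
  df_output.each_line.drop(1).each do |line| # skip the header line
    fields = line.split
    next if fields.size < 6
    used_pct = fields[4].delete("%").to_i # "Capacity" column, e.g. "82%"
    return :critical if used_pct >= crit
    worst = :warning if used_pct >= warn
  end
  worst
end

# Example: disk_status(`df -P`)
```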
Load
Thresholds will need to be tuned per host. As a baseline, normal load is the number of processors + 1. For monitoring, look at the 15 minute load average.
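A sketch of the load check for Linux hosts. It reads `/proc/loadavg`, whose third field is the 15 minute average, and treats processors + 1 as the warning line. The critical multiplier of 2 is an assumption for illustration only; per the note above, real thresholds should be tuned per host. Both function names are hypothetical.

```ruby
require "etc"

# 15 minute load average from /proc/loadavg (Linux only).
# The file looks like "0.20 0.18 0.12 1/80 11206"; field 3 is the 15 min value.
def load15(loadavg_text = File.read("/proc/loadavg"))
  loadavg_text.split[2].to_f
end

# Warn above processors + 1; go critical above crit_factor times that.
# crit_factor = 2 is an assumed default, not taken from this page.
def load_status(load, nprocs = Etc.nprocessors, crit_factor: 2)
  baseline = nprocs + 1
  if load > baseline * crit_factor
    :critical
  elsif load > baseline
    :warning
  else
    :ok
  end
end

# Example: load_status(load15)
```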
Memory Usage
Warn when 90% of physical memory is in use (not counting buffers and cache), and go critical when 10% of swap is in use.
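A Linux sketch of this rule, reading `/proc/meminfo`. Used memory is computed as MemTotal minus MemFree, Buffers and Cached, which matches "not counting buffers and cache". `meminfo` and `memory_status` are hypothetical names.

```ruby
# Parse /proc/meminfo into { "MemTotal" => kB, ... }.
def meminfo(text = File.read("/proc/meminfo"))
  text.each_line.each_with_object({}) do |line, info|
    key, value = line.split(":", 2)
    info[key] = value.to_i if value # "   16318480 kB".to_i => 16318480
  end
end

# :critical if swap use reaches swap_crit_pct, :warning if memory use
# (excluding buffers and cache) reaches warn_pct, otherwise :ok.
def memory_status(info, warn_pct: 90, swap_crit_pct: 10)
  used = info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]
  mem_pct = 100.0 * used / info["MemTotal"]
  swap_total = info["SwapTotal"]
  swap_pct = swap_total.zero? ? 0.0 : 100.0 * (swap_total - info["SwapFree"]) / swap_total
  return :critical if swap_pct >= swap_crit_pct
  return :warning if mem_pct >= warn_pct
  :ok
end

# Example: memory_status(meminfo)
```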
Processes
Ensure there are no zombie processes.
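Zombie (defunct) processes show a `Z` at the start of the `ps` STAT column. A minimal sketch that counts them from `ps -eo stat=` output, where the trailing `=` suppresses the header line; alert when the count is above zero. `zombie_count` is a hypothetical name.

```ruby
# Count lines of `ps -eo stat=` output whose state starts with "Z".
def zombie_count(ps_output)
  ps_output.each_line.count { |line| line.strip.start_with?("Z") }
end

# Example: zombie_count(`ps -eo stat=`).zero?
```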
PuppetDB
Remote Checks
Test that the service is working by querying the list of nodes. The PuppetDB node itself should be present.
curl -H 'Accept: application/json' http://puppetdb.domain.tld:8080/v2/nodes
Check the output for the FQDN of the PuppetDB server, for example
"name" : "puppet1.example.com"
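A Ruby sketch of this check. It assumes the `/v2/nodes` response is a JSON array of objects with a `name` key, as in the sample line above; `fetch_json_body` and `node_listed?` are hypothetical names, and the hostnames are placeholders.

```ruby
require "json"
require "net/http"
require "uri"

# GET a URL with the same Accept header as the curl command.
def fetch_json_body(url)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/json"
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).body }
end

# True if the node list contains an entry whose "name" equals fqdn.
def node_listed?(body, fqdn)
  nodes = JSON.parse(body)
  nodes.is_a?(Array) && nodes.any? { |n| n.is_a?(Hash) && n["name"] == fqdn }
rescue JSON::ParserError
  false
end

# Example:
#   body = fetch_json_body("http://puppetdb.domain.tld:8080/v2/nodes")
#   node_listed?(body, "puppet1.example.com")
```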
JVM heap usage (HeapMemoryUsage) should not exceed 85% of the allocated heap.
curl -H "Accept: application/json" http://puppetdb.domain.tld:8080/v2/metrics/mbean/java.lang:type=Memory
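A sketch of the heap threshold in Ruby. It assumes the mbean response contains a `HeapMemoryUsage` object with the JVM `MemoryUsage` fields `used` and `max`, and it treats `max` (the `-Xmx` ceiling) as "allocated heap"; if you mean the committed heap instead, swap in `committed`. Function names are hypothetical.

```ruby
require "json"

# Percentage of the maximum heap currently used.
# MemoryUsage reports max as -1 when it is undefined, so reject that.
def heap_used_pct(body)
  heap = JSON.parse(body).fetch("HeapMemoryUsage")
  max = heap.fetch("max")
  raise ArgumentError, "heap maximum is undefined" unless max.positive?
  100.0 * heap.fetch("used") / max
end

# True while heap use stays at or below limit_pct.
def heap_ok?(body, limit_pct: 85)
  heap_used_pct(body) <= limit_pct
end
```

The body can come from the curl command above or from any HTTP client that sends `Accept: application/json`.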
Command queue depth (QueueSize) should be less than 10.
curl -H "Accept: application/json" http://puppetdb.domain.tld:8080/v2/metrics/mbean/org.apache.activemq:BrokerName=localhost,Type=Queue,Destination=com.puppetlabs.puppetdb.commands
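The queue check reduces to one comparison. A sketch assuming the ActiveMQ mbean response includes a numeric `QueueSize` key; `queue_ok?` is a hypothetical name.

```ruby
require "json"

# True while the PuppetDB command queue holds fewer than `limit` messages.
# Raises KeyError if the response has no QueueSize, so a bad reply is not
# silently treated as healthy.
def queue_ok?(body, limit: 10)
  JSON.parse(body).fetch("QueueSize") < limit
end
```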
Local Checks
None needed.
Puppet Dashboard
Remote Checks
Log in to the service
Test that the service is responding by connecting over HTTP with a valid username and password and checking that an HTTP 200 response is returned. This prints only the response code:
curl -s -o /dev/null -w '%{http_code}\n' --user monitoring:hashedpassword http://puppetdashboard.domain.tld:3000
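The same login check as a Ruby sketch, using HTTP basic authentication as `curl --user` does. `dashboard_login_ok?` is a hypothetical name, the `monitoring` account and its password are placeholders, and the timeouts are arbitrary.

```ruby
require "net/http"
require "uri"

# True if an authenticated GET returns HTTP 200; network errors count as failure.
def dashboard_login_ok?(url, user, password)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 10) do |http|
    http.request(request)
  end
  response.code == "200"
rescue SystemCallError, SocketError, IOError, Timeout::Error
  false
end

# Example:
#   dashboard_login_ok?("http://puppetdashboard.domain.tld:3000/", "monitoring", "secret")
```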
Check that classes are being returned
The Dashboard runs on puppet1, so we should be able to query puppet1's node entry for a parameter that has been added to that node.
curl -H 'Accept: text/yaml' --user monitoring:hashedpassword http://puppetdashboard.domain.tld:3000/nodes/puppet1.domain.tld
This should return a 200 response with valid YAML in which the parameter monitoring is set to working. Note: this should be done with a Ruby script.
---
parameters:
  monitoring: working
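A sketch of such a Ruby script, assuming only the standard library. `fetch_node_yaml` and `dashboard_node_ok?` are hypothetical names; the URL and credentials are the placeholders from the curl command above.

```ruby
require "net/http"
require "uri"
require "yaml"

# GET the node entry as YAML; returns the body only on HTTP 200, else nil.
def fetch_node_yaml(url, user, password)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "text/yaml"
  request.basic_auth(user, password)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.code == "200" ? response.body : nil
end

# True if the YAML parses and sets parameters -> monitoring to "working".
def dashboard_node_ok?(yaml_body)
  node = YAML.safe_load(yaml_body)
  params = node.is_a?(Hash) ? node["parameters"] : nil
  params.is_a?(Hash) && params["monitoring"] == "working"
rescue Psych::SyntaxError
  false
end

# Example:
#   body = fetch_node_yaml("http://puppetdashboard.domain.tld:3000/nodes/puppet1.domain.tld",
#                          "monitoring", "hashedpassword")
#   healthy = !body.nil? && dashboard_node_ok?(body)
```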
Local Checks
Database
We can check the database connection and schema version with
cd /usr/share/puppet-dashboard && rake RAILS_ENV=production db:version
The output should match the following regular expression:
/^Current version: \d{14}$/
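A Ruby sketch of this check. It runs the rake task from the Dashboard directory with `RAILS_ENV=production` and applies the pattern above; in Ruby, `^` and `$` match at line boundaries, so extra lines from rake do not cause a false failure. `db_version_ok?` and `check_db_version` are hypothetical names.

```ruby
require "open3"

DB_VERSION_RE = /^Current version: \d{14}$/

# True if the rake output contains a 14-digit schema version line.
def db_version_ok?(output)
  DB_VERSION_RE.match?(output)
end

# Run `rake db:version` in the Dashboard directory and check its output.
def check_db_version(dir = "/usr/share/puppet-dashboard")
  output, status = Open3.capture2e({ "RAILS_ENV" => "production" },
                                   "rake", "db:version", chdir: dir)
  status.success? && db_version_ok?(output)
end
```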
Delayed workers
sudo -u puppet-dashboard env RAILS_ENV=production /usr/share/puppet-dashboard/script/delayed_job status 2>/dev/null
It should return two lines with PIDs, such as
delayed_job: running [pid 30391]
delayed_job: running [pid 30385]
The output should match the following regular expression at least twice:
/delayed_job: running \[pid (\d+)\]/
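A Ruby sketch that runs the status command and counts matches. Standard error is discarded to mirror `2>/dev/null`. `delayed_job_status`, `running_worker_pids` and `workers_ok?` are hypothetical names, and the minimum of two workers comes from the expected output above.

```ruby
require "open3"

WORKER_RE = /delayed_job: running \[pid (\d+)\]/

# Run the status command as the puppet-dashboard user; stderr is dropped.
def delayed_job_status
  out, _err, _status = Open3.capture3("sudo", "-u", "puppet-dashboard", "env",
                                      "RAILS_ENV=production",
                                      "/usr/share/puppet-dashboard/script/delayed_job", "status")
  out
end

# PIDs of all running workers listed in the output.
def running_worker_pids(output)
  output.scan(WORKER_RE).flatten.map(&:to_i)
end

# True if at least `minimum` workers are running.
def workers_ok?(output, minimum: 2)
  running_worker_pids(output).size >= minimum
end

# Example: workers_ok?(delayed_job_status)
```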