This article explains how to create an alert that will trigger if the host where the Scalyr Agent is running is shut down (or crashes).
This approach rests on several assumptions:
- The Scalyr Agent starts and shuts down with the host, which is the default behavior. In other words, the Scalyr Agent isn't started or stopped by a scheduled process (doing so isn't recommended in any case).
- If the host is shut down or crashes, the Scalyr Agent stops broadcasting metrics. Since our primary concern is identifying when the host becomes unavailable, we'll trigger the alert when a regularly updated metric (like proc.loadavg.1min) stops arriving.
- The Scalyr Agent broadcasts metrics once every 30 seconds (the default interval).
For this example, we chose the following trigger, where HOST stands in for the name of the host you want to monitor:
count:3m($source='tsdb' $serverHost == 'HOST' metric='proc.loadavg.1min') < 1
Video Explanation: https://youtu.be/dUIwVyrF33I?t=102
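This triggers the alert if no proc.loadavg.1min metrics are received within a 3-minute interval. Since 6 updates are expected within 3 minutes (one every 30 seconds), this has some built-in leniency against false alarms: the alert fires only when every update in the window is missing, not when one or two are delayed or dropped.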
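
The logic of the trigger can be sketched in a few lines of Python. This is only an illustration of the counting rule, not Scalyr's implementation: the function names, the sample timeline, and the exact handling of the window boundary are assumptions made for the example.

```python
from typing import Iterable

REPORT_INTERVAL_S = 30  # default agent reporting interval stated in this article
WINDOW_S = 3 * 60       # the 3m window used in the trigger


def expected_samples(window_s: int = WINDOW_S,
                     interval_s: int = REPORT_INTERVAL_S) -> int:
    """Number of metric samples a healthy host should report per window."""
    return window_s // interval_s


def alert_fires(sample_times: Iterable[float], now: float,
                window_s: int = WINDOW_S) -> bool:
    """Mimic `count:3m(...) < 1`: fire when no sample falls in the trailing window.

    Assumption: the window is treated as (now - window_s, now].
    """
    count = sum(1 for t in sample_times if now - window_s < t <= now)
    return count < 1


# Hypothetical timeline (seconds): the host reports every 30 s and stops after t=600.
samples = list(range(0, 601, 30))

healthy_count = expected_samples()               # 6 samples per 3-minute window
shortly_after = alert_fires(samples, now=700)    # last sample is 100 s old
well_after = alert_fires(samples, now=781)       # last sample is 181 s old
```

In this sketch `shortly_after` is `False` because three samples still fall inside the trailing window, while `well_after` is `True` because the window is empty. A host that misses a few reports but delivers at least one in 3 minutes would not raise the alert.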