Monitoring Resources
In-depth guides to running reliable scheduled jobs: monitoring models, on-call practices, and incident response, plus ready-to-use templates.
Concepts & Comparisons
Mental models for monitoring: what to watch and why it matters.
Heartbeat Monitoring vs Uptime Monitoring
Uptime monitoring probes your URLs from the outside. Heartbeat monitoring listens for a ping from your job. Learn when each model fits and why they are complementary.
Monitor AI Agents and LLM Pipelines
AI agent jobs often fail silently. Learn how heartbeat monitoring catches missed LLM runs, stuck embedding jobs, and n8n workflows that quietly stop producing results.
On-Call & Incidents
Run a humane on-call rotation and respond to failures with a plan.
On-Call Rotation Best Practices for Dev Teams
A practical guide to on-call rotation models, fair scheduling, handoffs, runbooks, and burnout prevention for small and medium engineering teams.
Escalation Policies for Cron Job Alerting
Learn how escalation policies work, why you need alert tiers, and how to design a policy for cron job failures with PagerDuty and Opsgenie.
Templates
Free, copy-ready templates you can adapt to your team today.