Monitor systemd Timers and Catch Silent Failures

systemd timers are the modern Linux replacement for cron, but they share cron's most dangerous trait: when a scheduled job stops running, nothing tells you. A timer can be disabled during a system update, a service unit can exit non-zero and land in a failed state, or an OnCalendar expression can be dropped because of a parse error, and in every case the scheduled work does not get done and no alert is sent. An external heartbeat monitor is the most reliable way to detect these silent failures before they cause real damage, because it does not depend on the host reporting its own problems.

Why systemd Timers Fail Without Warning

systemd separates the scheduler (the .timer unit) from the worker (the .service unit), and both can fail independently. A timer can sit in the active (waiting) state while the service it activates failed on its last run. systemd does not block the next trigger in that case: starting a failed unit is allowed, so the timer runs the job again at the next elapse, and if the cause has not been fixed it fails again just as quietly. (The exception is a unit that has hit its start rate limit, which refuses to start until systemctl reset-failed clears the counter.) OnCalendar expressions that contain a syntax error are also easy to miss: systemd logs a warning and ignores the malformed directive. If it was the timer's only trigger, the unit is refused at load time; if other triggers remain, the timer stays armed on a schedule you did not intend. The journal captures all of this, but only if you actively look. Out of the box, nothing sends an alert when a timer-triggered service fails, when a timer is disabled, or when a run is missed because the machine was off and Persistent=true was not set.

  • The .service unit enters the failed state after a non-zero exit and stays there until its next start. The paired .timer keeps waiting and starts the service again at the next elapse, so a persistent fault produces one quiet failure per scheduled run. systemctl reset-failed only clears the recorded state and the start rate limit counter; it is not required before the next run.
  • A typo in OnCalendar (for example, a misspelled weekday or an out-of-range time such as 25:00) causes systemd to log "Failed to parse calendar specification, ignoring" and drop the trigger. If other triggers remain, the timer still shows as loaded and active but follows the wrong schedule; if none remain, the unit fails to load and never fires.
  • Running systemctl disable mytimer.timer during a package upgrade or Ansible run removes the timer from timers.target, so it is no longer started at boot. With --now, or after the next reboot, the schedule is gone entirely. The LAST column in systemctl list-timers --all stops updating, but no notification is sent.
  • When the machine is off or suspended at the scheduled time and Persistent=true is absent, the run is simply missed. There is no catch-up and no alert.
  • The service ExecStart command can fail due to a missing binary, a broken PATH (systemd services run with a fixed default environment, not the interactive shell PATH), or a changed working directory. The result is a failed state that is visible only to someone who checks systemctl status or the journal.
  • Log rotation, journal size limits, or a non-persistent journal can cause the failure record to disappear before anyone checks, making post-hoc debugging much harder.
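
The internal states listed above can at least be inspected from a script. The following is a minimal sketch, assuming placeholder unit names; timer_health is a hypothetical helper, not part of systemd. It only reports what systemctl already knows on a healthy host, so it complements an external monitor rather than replacing one.

```shell
#!/usr/bin/env bash
# Sketch: report the internal state of a timer and the service it activates.
# timer_health is a hypothetical helper; the unit names are placeholders.

timer_health() {
    local timer="$1" service="$2" rc=0

    # is-enabled: will the timer be started at boot?
    if ! systemctl is-enabled --quiet "$timer"; then
        echo "WARN: $timer is not enabled"
        rc=1
    fi
    # is-active: is the timer currently armed?
    if ! systemctl is-active --quiet "$timer"; then
        echo "WARN: $timer is not active"
        rc=1
    fi
    # is-failed exits 0 when the unit IS in the failed state.
    if systemctl is-failed --quiet "$service"; then
        echo "WARN: $service is in the failed state"
        rc=1
    fi
    return "$rc"
}

# Example: timer_health mybackup.timer mybackup.service
```

The function returns non-zero when any check raises a warning, so it can be chained with other checks in a script.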

Use an External Heartbeat Monitor to Catch What journalctl Misses

The fundamental problem with relying on journalctl, systemctl list-timers, or OnFailure email hooks is that they all depend on the host being healthy, and the first two also depend on someone actively checking them. If the timer is disabled, the machine is rebooted into a bad state, or the machine never comes back up at all, none of those internal signals reach you. A heartbeat monitor (also called a dead-man's switch) inverts the logic. You configure an expected ping interval, and your job calls a unique URL when it completes successfully. If that ping does not arrive within the period plus a configurable grace window, the monitoring service sends an alert. This approach catches the failure modes described above, including the ones where the job never starts at all, because silence itself becomes the alarm.

CronJobPro provides this pattern with a unique ping URL in the form https://cronjobpro.com/ping/<token>. When your service unit calls this URL on success, CronJobPro resets the countdown. If the countdown expires without a ping, CronJobPro alerts you via email, Slack, Discord, Microsoft Teams, PagerDuty, Opsgenie, or a webhook of your choice. There are also companion endpoints for explicit failure reporting: https://cronjobpro.com/ping/<token>/fail to signal that the job ran but encountered an error, and https://cronjobpro.com/ping/<token>/exitcode/<n> to forward the numeric exit code directly.

In systemd, ExecStartPost in the .service unit is the idiomatic place for the success ping, because for a Type=oneshot service it runs only after ExecStart has exited successfully. Failure reporting belongs in ExecStopPost (or in a wrapper script), because ExecStopPost runs whether the service succeeded or failed and receives the outcome through the $SERVICE_RESULT, $EXIT_CODE, and $EXIT_STATUS environment variables.
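
The deadline rule behind a heartbeat monitor can be illustrated in a few lines. This is a sketch of the general dead-man's-switch logic described above, not CronJobPro's actual implementation; heartbeat_overdue and its arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the dead-man's-switch rule: a monitor is overdue when the current
# time is past last ping + expected period + grace window.
# All values are in seconds (epoch seconds for the two timestamps).

heartbeat_overdue() {
    local last_ping="$1" period="$2" grace="$3" now="$4"
    [ "$now" -gt $(( last_ping + period + grace )) ]
}

# Example: a daily job (86400 s) with a one-hour grace window (3600 s).
# if heartbeat_overdue "$last_ping" 86400 3600 "$(date +%s)"; then
#     echo "no ping received in time, raise an alert"
# fi
```

Note that nothing in this rule depends on the monitored host: a disabled timer, a powered-off machine, and a crashing job all look the same to the monitor, which is exactly why it catches them.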

Add a heartbeat to systemd timers

  1. Create a heartbeat monitor in CronJobPro

    Log into CronJobPro, create a new Heartbeat monitor, and set the expected period to match your timer's schedule (for example, 24 hours for a daily job) plus a grace window that accounts for legitimate runtime variance. Copy the unique ping URL you are given, which takes the form https://cronjobpro.com/ping/<token>.

  2. Add ExecStartPost to your .service unit

    Open your service unit file (typically in /etc/systemd/system/). Add an ExecStartPost line that calls curl with the ping URL. For a Type=oneshot service, ExecStartPost runs only after ExecStart has exited successfully, so this ping signals success. Prefix the command with - (as in ExecStartPost=-/usr/bin/curl ...) so that a curl failure does not cause the service itself to be marked as failed. The curl flags -fsS --max-time 10 --retry 3 treat HTTP errors as failures, silence progress output while still printing errors, enforce a timeout, and retry transient network errors.

  3. Handle non-zero exits with a wrapper script

    For jobs where you also want to report failures, replace ExecStart with a small wrapper script. The script runs your actual command, captures the exit code, and then calls either the /ping/<token> URL on exit code 0 or the /ping/<token>/fail URL on any other exit code. ExecStartPost alone cannot report failures, because systemd does not run ExecStartPost when ExecStart returns non-zero. The alternative to a wrapper is a separate ExecStopPost command, which runs whether the service succeeded or failed.

  4. Reload systemd and verify the timer is armed

    Run systemctl daemon-reload to pick up the changed unit file, then systemctl restart mytimer.timer. Confirm the schedule with systemctl list-timers --all | grep mytimer and check that the NEXT column shows the expected upcoming time and ACTIVATES names your service unit. Run systemd-analyze verify /etc/systemd/system/mytimer.timer to catch any OnCalendar syntax errors before they are silently dropped.

  5. Validate end-to-end by triggering the job manually

    Run systemctl start myservice.service to execute the job immediately without waiting for the timer. Open the CronJobPro dashboard and confirm the ping was received. Then temporarily set a very short OnCalendar interval (every 2 minutes), wait one cycle, and verify the heartbeat resets on schedule. Restore the correct interval and daemon-reload before leaving the system.
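
The wrapper described in step 3 could look roughly like the following. This is a minimal sketch under stated assumptions: the script name run-with-heartbeat.sh and the function names are hypothetical, and the ping URL is passed as the first argument rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, for example /usr/local/bin/run-with-heartbeat.sh
# Usage: run-with-heartbeat.sh PING_URL COMMAND [ARGS...]

# Choose the endpoint for an exit code: base URL on 0, base URL + /fail otherwise.
ping_url_for() {
    local base="$1" code="$2"
    if [ "$code" -eq 0 ]; then
        printf '%s\n' "$base"
    else
        printf '%s\n' "$base/fail"
    fi
}

# Run the job, report the outcome, and hand the job's exit code back to systemd.
run_with_heartbeat() {
    local base="$1"
    shift
    local code=0
    "$@" || code=$?
    # A failed ping must not mask the job's own exit code.
    curl -fsS --max-time 10 --retry 3 "$(ping_url_for "$base" "$code")" >/dev/null || true
    return "$code"
}

# Only act when called with a URL and a command.
if [ "$#" -ge 2 ]; then
    run_with_heartbeat "$@"
    exit $?
fi
```

With a wrapper like this, the unit would use a single line such as ExecStart=/usr/local/bin/run-with-heartbeat.sh https://cronjobpro.com/ping/YOUR_TOKEN_HERE /usr/local/bin/run-backup.sh and no ExecStartPost. Because the wrapper returns the job's own exit code, systemd still records failures in the usual way.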

ini

# /etc/systemd/system/mybackup.service
[Unit]
Description=Daily database backup
# The heartbeat ping needs a working network connection.
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=backup
Environment="PATH=/usr/local/bin:/usr/bin:/bin"

# Run the actual job
ExecStart=/usr/local/bin/run-backup.sh

# On success, ping CronJobPro heartbeat URL
# The leading - tells systemd to ignore curl failure so it does not
# affect the service's own exit status.
ExecStartPost=-/usr/bin/curl -fsS --max-time 10 --retry 3 \
    https://cronjobpro.com/ping/YOUR_TOKEN_HERE

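# On failure, call the /fail endpoint via ExecStopPost, which runs whether
# the service succeeded or failed. systemd sets $SERVICE_RESULT, $EXIT_CODE
# and $EXIT_STATUS in the environment of ExecStopPost commands.
# The shell condition makes the /fail ping fire only on an unclean exit.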
ExecStopPost=/bin/sh -c \
    'if [ "$EXIT_CODE" != "exited" ] || [ "$EXIT_STATUS" != "0" ]; then \
       curl -fsS --max-time 10 --retry 3 \
         https://cronjobpro.com/ping/YOUR_TOKEN_HERE/fail; \
     fi'

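# No [Install] section here: the service is started by mybackup.timer.
# Enabling it under multi-user.target would also run the backup at every boot.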


# /etc/systemd/system/mybackup.timer
[Unit]
Description=Run mybackup daily at 02:30

[Timer]
# Verify this with: systemd-analyze calendar "Mon..Sun 02:30:00"
OnCalendar=Mon..Sun 02:30:00
# Catch up if the machine was off at the scheduled time
Persistent=true

[Install]
WantedBy=timers.target


# After editing, reload and enable:
# systemctl daemon-reload
# systemctl enable --now mybackup.timer
# systemctl list-timers --all | grep mybackup
# systemd-analyze verify /etc/systemd/system/mybackup.timer

Frequently asked questions

Does systemd send me an email when a timer-triggered service fails?

Not by default. systemd records the failure in the journal and changes the service state to failed, but it does not send any notification on its own. You can configure an OnFailure= directive in the [Unit] section that calls a separate notify service, but this only catches service-level failures, not cases where the timer itself was disabled or never fired. An external heartbeat monitor catches all failure modes including the ones where the service never starts.

What is the difference between a disabled timer and a failed service?

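A disabled timer is one that is no longer pulled in by timers.target at boot. Once it has also been stopped (by systemctl disable --now, systemctl stop, or a reboot), it does not fire and the service it would have activated never runs. A failed service means the .timer is running normally and starts the .service on schedule, but the job exits with an error and the unit ends up in the failed state. The timer starts it again at the next elapse, so the job keeps failing until the underlying problem is fixed. Both look like a missed job from the outside, but they require different fixes: re-enable and start the timer in the first case, and read the service's journal (journalctl -u myservice.service) in the second.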

How do I check when a systemd timer last ran and when it will run next?

Run systemctl list-timers --all. The LAST and PASSED columns show the most recent activation time and how long ago it was. The NEXT and LEFT columns show the upcoming scheduled time and how long until then. If NEXT shows no time (n/a or -, depending on the systemd version), the timer has no upcoming elapse, most often because it is inactive or has no valid trigger left after a parse error. Run systemd-analyze calendar "your-expression" to validate the expression before putting it in the unit file.
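
For scripted checks, the same information is available as unit properties. The sketch below assumes the timer properties LastTriggerUSec and NextElapseUSecRealtime as printed by systemctl show; timer_schedule is a hypothetical helper name, and how an unset timestamp is printed (empty or n/a) is treated as version-dependent.

```shell
#!/usr/bin/env bash
# Sketch: print a timer's last trigger and next calendar elapse.
# Returns non-zero when no calendar elapse is scheduled.

timer_schedule() {
    local timer="$1" last next
    last="$(systemctl show -p LastTriggerUSec --value "$timer")"
    next="$(systemctl show -p NextElapseUSecRealtime --value "$timer")"
    printf 'last=%s\nnext=%s\n' "${last:-n/a}" "${next:-n/a}"
    # Assumption: an unset timestamp prints as empty or as n/a,
    # depending on the systemd version.
    case "$next" in
        ''|n/a) return 1 ;;
    esac
    return 0
}

# Example: timer_schedule mybackup.timer || echo "mybackup.timer has no upcoming run"
```

NextElapseUSecRealtime covers OnCalendar triggers only; a timer driven purely by monotonic triggers such as OnBootSec would need NextElapseUSecMonotonic instead.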

Does Persistent=true in a timer prevent missed runs?

Persistent=true makes systemd store the last trigger time on disk and activate the service immediately, as soon as the timer is started again, if a scheduled run was missed while the timer was inactive (for example, while the machine was off). It applies only to OnCalendar= triggers. It helps with catch-up but does not help if the timer itself was disabled, if the job fails, or if the job ran but produced wrong results. You still need an external heartbeat to confirm that the job both ran and succeeded.

Why does my systemd service fail with 'command not found' when it works fine in the terminal?

systemd services run with a fixed, sanitized environment rather than your login shell's. The default PATH does not include user-specific directories such as ~/bin or ~/.local/bin, or anything added by shell profile scripts, version managers, or language toolchains. Add an explicit Environment=PATH=/usr/local/bin:/usr/bin:/bin line in the [Service] section, or use the full absolute path in ExecStart. You can inspect the variables configured for a service by running systemctl show -p Environment myservice.service.
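
One way to reproduce this kind of failure from a terminal is to resolve the command in an emptied environment. The following is a rough sketch; resolves_in_path is a hypothetical helper, and the PATH value passed to it is whatever you intend to set in the unit.

```shell
#!/usr/bin/env bash
# Sketch: check whether a command can be found with a given PATH and nothing else.
# env -i starts from an empty environment, similar in spirit to a service
# that does not inherit your interactive shell's variables.

resolves_in_path() {
    local cmd="$1" path="$2"
    env -i PATH="$path" /bin/sh -c 'command -v "$1" >/dev/null 2>&1' sh "$cmd"
}

# Example:
# resolves_in_path pg_dump /usr/local/bin:/usr/bin:/bin \
#     || echo "pg_dump would not be found by the service"
```

This only approximates the service environment (it ignores User=, WorkingDirectory=, and sandboxing options), so treat a passing check as necessary rather than sufficient.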
