99.95% Uptime

99.95% availability allows about 4 hr 22 min 59 sec of downtime per year. Here is the full breakdown — plus what it means for scheduled jobs.

Allowed downtime at 99.95%

Period | Allowed downtime | In seconds
Per day | 43 sec | 43
Per week | 5 min 2 sec | 302
Per month | 21 min 55 sec | 1,315
Per quarter | 1 hr 5 min 45 sec | 3,945
Per year | 4 hr 22 min 59 sec | 15,779
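The figures in this table can be reproduced with a few lines of arithmetic. The sketch below assumes an average year of 365.25 days, with a month taken as one twelfth and a quarter as one fourth of that year; these period lengths are an assumption inferred from the numbers, and the function names are illustrative only.

```python
# Assumption: an average year of 365.25 days; month = year / 12, quarter = year / 4.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

PERIOD_SECONDS = {
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
    "month": SECONDS_PER_YEAR / 12,
    "quarter": SECONDS_PER_YEAR / 4,
    "year": SECONDS_PER_YEAR,
}


def allowed_downtime_seconds(uptime_percent: float, period: str) -> int:
    """Return the downtime allowance for a period, rounded to whole seconds."""
    downtime_fraction = (100 - uptime_percent) / 100
    return round(PERIOD_SECONDS[period] * downtime_fraction)


def format_duration(total_seconds: int) -> str:
    """Render seconds as 'H hr M min S sec', omitting zero-valued parts."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if minutes:
        parts.append(f"{minutes} min")
    if seconds or not parts:
        parts.append(f"{seconds} sec")
    return " ".join(parts)


for period in PERIOD_SECONDS:
    seconds = allowed_downtime_seconds(99.95, period)
    print(f"Per {period}: {format_duration(seconds)} ({seconds:,} s)")
```

Changing the first argument to another target (for example 99.9 or 99.99) gives the equivalent allowance for that level.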

Missed scheduled runs at 99.95%

Downtime is not just lost availability — for scheduled jobs it means runs that never happen. At 99.95% uptime, here is roughly how many executions a cron job loses per year at common frequencies:

Cron frequency | Missed runs / year
Every minute | 263
Every 5 minutes | 53
Every 15 minutes | 18
Hourly | 4
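These counts are expected values: the number of runs scheduled in a year multiplied by the 0.05% chance that any given run falls inside downtime. The sketch below assumes a 365.25-day year and downtime spread evenly over time; real outages tend to cluster, so a single long incident can wipe out many consecutive runs. The function name is illustrative.

```python
# Assumption: an average year of 365.25 days (525,960 minutes).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def missed_runs_per_year(uptime_percent: float, interval_minutes: int) -> int:
    """Expected number of scheduled runs per year that land in downtime."""
    scheduled_runs = MINUTES_PER_YEAR / interval_minutes
    downtime_fraction = (100 - uptime_percent) / 100
    return round(scheduled_runs * downtime_fraction)


for label, interval in [("Every minute", 1), ("Every 5 minutes", 5),
                        ("Every 15 minutes", 15), ("Hourly", 60)]:
    print(f"{label}: {missed_runs_per_year(99.95, interval)} missed runs / year")
```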

What uptime percentages mean

Uptime percentage expresses the fraction of time a system is operational and reachable over a given period, typically measured monthly or annually. The informal "nines" naming convention counts the nines in the figure: one nine is 90%, two nines is 99%, three nines is 99.9%, four nines is 99.99%, and five nines is 99.999%. A critical gotcha for architects is composite availability: when a request must pass through three independent services that each run at 99.9% uptime, the combined availability is roughly 99.7% (0.999 × 0.999 × 0.999 ≈ 0.997), because the availabilities multiply and the chain is only up when every link is up.
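The composite-availability arithmetic can be checked directly. This is a minimal sketch that assumes the services fail independently and that every request needs all of them; `serial_availability` is an illustrative name.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every service must be up.

    Assumes failures are independent, so the individual availabilities multiply.
    """
    return prod(availabilities)


combined = serial_availability([0.999, 0.999, 0.999])
print(f"{combined:.4%}")  # 99.7003%
```

The same function shows how quickly a long dependency chain erodes a target: ten services at 99.95% each combine to about 99.5%.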

Error budget

An error budget is the complement of your SLA target: a service promising 99.9% availability has a 0.1% error budget, which works out to roughly 43 minutes of allowable downtime per month. Teams treat this budget as a finite resource — every confirmed outage, degraded-performance window, and failed deployment draws it down, and when it is exhausted before the period ends, further risky releases are typically paused. Tracking the remaining budget in real time aligns engineering and product priorities by making the cost of unreliability concrete and visible.
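As a rough illustration of treating the budget as a finite resource, the sketch below computes the budget for a period and subtracts recorded outage minutes from it. It assumes a 30-day period and that every incident is logged as a number of minutes; the function names and the pause rule are illustrative, not a prescribed policy.

```python
def error_budget_minutes(target_percent: float, period_days: int = 30) -> float:
    """Total allowable downtime in minutes for the period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100 - target_percent) / 100


def budget_remaining(target_percent: float, outage_minutes: list[float],
                     period_days: int = 30) -> float:
    """Minutes of budget left after the recorded outages (negative if overspent)."""
    return error_budget_minutes(target_percent, period_days) - sum(outage_minutes)


def should_pause_risky_releases(target_percent: float,
                                outage_minutes: list[float]) -> bool:
    """Illustrative rule: pause risky releases once the budget is exhausted."""
    return budget_remaining(target_percent, outage_minutes) <= 0


incidents = [12.0, 20.5]  # hypothetical outage durations this month, in minutes
print(f"Budget: {error_budget_minutes(99.9):.1f} min")
print(f"Remaining: {budget_remaining(99.9, incidents):.1f} min")
print("Pause releases:", should_pause_risky_releases(99.9, incidents))
```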

SLI vs. SLO vs. SLA

A Service Level Indicator (SLI) is the raw measurement — for example, the percentage of HTTP requests that return a successful response within a defined latency threshold over a rolling window. A Service Level Objective (SLO) is the internal target your team sets for that indicator, such as keeping successful-request rate above 99.9%; breaching an SLO triggers an internal response but carries no external consequence. A Service Level Agreement (SLA) is the contractual commitment made to customers, backed by defined remedies such as service credits or refunds when the agreed threshold is not met.
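To make the distinction concrete, the sketch below computes a request-based SLI from a batch of request records and compares it against an SLO. The `Request` type, the 300 ms latency threshold, and the choice to count any non-5xx status as successful are all assumptions made for the example; a real SLI definition should state these choices explicitly.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # time to respond, in milliseconds


def success_rate_sli(requests: list[Request],
                     latency_threshold_ms: float = 300.0) -> float:
    """SLI: percentage of requests that succeeded within the latency threshold.

    Assumption: any status below 500 counts as a success.
    """
    if not requests:
        raise ValueError("cannot compute an SLI over zero requests")
    good = sum(
        1 for r in requests
        if r.status < 500 and r.latency_ms <= latency_threshold_ms
    )
    return 100 * good / len(requests)


def meets_slo(sli_percent: float, slo_percent: float = 99.9) -> bool:
    """SLO check: is the measured indicator at or above the internal target?"""
    return sli_percent >= slo_percent


window = [Request(200, 120), Request(200, 450), Request(503, 80), Request(404, 50)]
sli = success_rate_sli(window)
print(f"SLI = {sli:.1f}%, SLO met: {meets_slo(sli)}")
```

The SLA does not appear in the code because it is a contract rather than a measurement: it names the threshold and the remedy that applies when the measured figure falls below it.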

How to achieve 99.95% uptime

Target architecture: Advanced multi-AZ.

  • Spread infrastructure across multiple availability zones within a region so a data-center-level failure does not affect all instances simultaneously.
  • Implement health-check-based traffic routing so the load balancer removes unhealthy instances from the pool automatically within seconds.
  • Use blue-green or canary deployments to eliminate deployment windows as a source of downtime.
  • Wrap calls to every dependency (databases, caches, queues, and third-party APIs) in circuit breakers with fallback paths.
  • Run regular chaos engineering exercises or game days to surface hidden single points of failure before they cause customer-facing incidents.
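Of the practices listed above, the circuit breaker is the easiest to show in a few lines. The sketch below is a simplified, single-threaded version under stated assumptions: the class name, the thresholds, and the injectable clock are all illustrative, and production code would normally use an established resilience library rather than a hand-rolled breaker.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency and serves a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock          # injectable so the breaker can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # After the timeout, let one trial call through (half-open state).
        return self._clock() - self._opened_at < self.reset_timeout_s

    def call(self, func, fallback):
        if self.is_open():
            return fallback()        # skip the dependency entirely
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each dependency call, for example `breaker.call(fetch_from_api, read_from_cache)`, where both function names are hypothetical.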

Monitoring your SLA — including silent failures

Traditional HTTP uptime checks confirm that a URL is reachable and returning expected responses, but they are blind to an entire class of failure: a cron job that simply never runs. If a scheduled task silently skips its execution window (because of a dead worker process, a misconfigured schedule, or a deployment that removed the job), no HTTP probe will fire an alert, yet your users are affected and the lost time counts against your error budget without anyone noticing. The correct pattern for job-based systems is a heartbeat or dead-man's-switch check: the job itself pings a monitoring endpoint on each successful run, and an alert fires when no ping arrives within the expected interval. CronJobPro combines both approaches, external HTTP polling and built-in heartbeat monitoring, so neither web-endpoint failures nor silent missed runs go undetected.
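The dead-man's-switch logic itself is small. The sketch below models only the monitoring side in memory: it records the time of the last ping and reports the job as overdue once the expected interval plus a grace period has passed. The class, its parameters, and the grace period are hypothetical and do not represent any particular product's API.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Flags a job as overdue when its heartbeat stops arriving."""

    def __init__(self, expected_interval: timedelta, grace: timedelta,
                 started_at: datetime):
        self.expected_interval = expected_interval
        self.grace = grace
        # Until the first ping arrives, measure silence from the start time.
        self.last_seen = started_at

    def record_ping(self, at: datetime) -> None:
        """Called when the job reports a successful run."""
        self.last_seen = at

    def is_overdue(self, now: datetime) -> bool:
        """True once the silence exceeds the interval plus the grace period."""
        return now - self.last_seen > self.expected_interval + self.grace


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(timedelta(minutes=5), timedelta(minutes=1), start)
monitor.record_ping(start + timedelta(minutes=5))
print(monitor.is_overdue(start + timedelta(minutes=10)))  # False
print(monitor.is_overdue(start + timedelta(minutes=12)))  # True
```

The important design point is that the alert is driven by the absence of a signal, so a job that never starts is caught the same way as a job that crashes partway through.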

Frequently asked questions

How much downtime does 99.95% uptime allow?

99.95% uptime allows about 4 hr 22 min 59 sec of downtime per year, which is roughly 21 min 55 sec per month and 43 sec per day.

Does planned maintenance count against my SLA?

It depends on how the SLA is written. Many commercial SLAs explicitly exclude downtime that occurs during pre-announced maintenance windows, provided the vendor gives customers adequate advance notice — commonly 48 to 72 hours. However, some enterprise agreements treat all unavailability equally regardless of cause. Always read the exclusions section of any SLA carefully, and if you are drafting your own, define the maintenance window policy explicitly to avoid disputes.

How is an error budget calculated?

An error budget starts from your SLA target: subtract the target percentage from 100% to get the permitted failure percentage, then apply it to the time window you are measuring. For example, a 99.9% monthly SLA on a 30-day month yields 0.1% of 43,200 minutes, which is approximately 43 minutes of allowable downtime. Tracking cumulative downtime against that budget throughout the month lets teams make data-driven decisions about when to slow down risky changes.

What is a good uptime SLA for a SaaS product?

There is no universally correct answer because the right target depends on your customers' tolerance for downtime, your infrastructure investment, and your product's criticality. Consumer-facing SaaS products commonly commit to 99.9% (around 43 minutes of monthly downtime), while business-critical or enterprise tools often target 99.95% or 99.99%. Committing to a number higher than your current measured baseline is a liability, so it is better to set a target you can reliably exceed and raise it as your infrastructure matures.

How frequently should I run uptime checks to meet my SLA?

Check frequency determines how quickly an outage is detected, which directly affects your mean time to alert and the total undetected downtime that erodes your error budget. As a rule of thumb, your check interval should be significantly shorter than the downtime allowance for a single incident. A 99.9% monthly SLA permits roughly 43 minutes total, so 5 minutes is about the longest check interval that makes sense; a 99.99% SLA permits only about 4 minutes per month, making 1-minute or sub-minute checks necessary. Always combine polling intervals with alerting thresholds that account for transient failures (for example, requiring two consecutive failed checks) before paging on-call staff.
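One way to reason about this trade-off is to estimate the worst-case time before an alert fires and compare it with the monthly budget. The sketch below assumes a 30-day month, ignores check timeouts and notification delays, and treats the worst case as an outage that begins just after a passing check; the function names are illustrative.

```python
def monthly_budget_seconds(target_percent: float, period_days: int = 30) -> float:
    """Allowable downtime in seconds for the period."""
    return period_days * 24 * 60 * 60 * (100 - target_percent) / 100


def worst_case_detection_seconds(check_interval_s: float,
                                 failures_before_alert: int) -> float:
    """Longest an outage can run before alerting.

    Worst case: the outage starts right after a passing check, and the alert
    needs `failures_before_alert` consecutive failed checks.
    """
    return check_interval_s * failures_before_alert


def budget_share_per_incident(target_percent: float, check_interval_s: float,
                              failures_before_alert: int) -> float:
    """Fraction of the monthly budget spent before anyone is even alerted."""
    delay = worst_case_detection_seconds(check_interval_s, failures_before_alert)
    return delay / monthly_budget_seconds(target_percent)


for target, interval in [(99.9, 300), (99.99, 300), (99.99, 60)]:
    share = budget_share_per_incident(target, interval, failures_before_alert=2)
    print(f"{target}% with {interval}s checks: {share:.0%} of budget before alert")
```

A share above 100% means a single incident can exhaust the month's budget before the first alert, which is why tighter targets need shorter check intervals.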

What is the difference between availability and reliability?

Availability measures the proportion of time a system is in a working state and reachable by users — it is expressed as a percentage and is the basis for SLA calculations. Reliability is a broader concept that encompasses whether the system produces correct results consistently over time, including under load or adverse conditions. A system can be highly available but unreliable if it responds to requests but returns wrong data; conversely, a system with planned maintenance windows has lower availability but may be highly reliable during the time it is running. Good SLA design considers both dimensions rather than treating uptime percentage as the sole quality indicator.
