What is Uptime Monitoring?

· Testomato

In June 2025, the cloud platform-as-a-service provider Heroku experienced a major system outage lasting nearly 24 hours. Thousands of sites and applications that depended on the hosting service went offline, with almost no information provided about the disruption.

It took their team about 8 hours to find the root cause of the problem. After restoring service and analyzing the events and systems involved, Heroku published a summary of the outage to explain the incident.

Among the critical issues they discovered, a particularly interesting one was a weakness in their infrastructure design. According to their summary, their “internal tools and the Heroku Status Page were running on this same affected infrastructure. This meant that as your applications failed, our ability to respond and communicate with you was also severely impaired.”

More specifically, their monitoring and incident tools ran on the same infrastructure as the production hosts for client applications. This coupling slowed their response to the incident, making it harder to detect, investigate, and resolve. It also damaged trust in their brand, since they were unable to communicate with affected customers in a timely way.

This incident had a widespread impact on many other organizations because of the scale and nature of Heroku’s business as a hosting platform. But a service outage can affect any site or application. Without monitoring, you won’t know about it until you happen to check manually or receive complaints from your users or customers.

A lot of time, money, and trust can be lost between the moment the outage begins and when your team knows about it. Uptime monitoring helps to close this gap.

Uptime and Downtime: What Do They Actually Mean?

Uptime: The percentage of time a service is available and responding correctly to requests. All systems are green.

Downtime: Any period when a service is unavailable or failing to respond as expected. This includes a spectrum of states:

State | What it means
Full outage | Service is completely unavailable: HTTP 500, timeout, or no response
Partial outage | Some endpoints or features fail while others work: homepage loads but sign-up returns 500
Degraded performance | Service responds but is degraded: slow response times, stale data, or incomplete functionality

Different organizations may categorize downtime statuses differently, but most incidents fall somewhere within this range of states.

A site can technically be “up” (the server responds with 200) while core functionality is broken. This is why uptime monitoring has limits, which we explore further in Beyond Uptime: When the Site is Up but Something is Wrong.

What is Uptime Monitoring?

Uptime monitoring is an automated system that continuously sends requests to your URLs, verifies that they respond correctly, and alerts you when they don’t. Uptime monitoring answers one crucial question on your behalf, 24/7: “Is this URL responding the way it should?”

Having an automated system monitor the status of your site replaces the infeasible task of manually checking whether your site is up. Not only is it an issue of scale - the number of pages and features to keep track of can be overwhelming, even for a seemingly small and simple site - but it is also one of time and location. Your site could go down at 4am in a specific region but be running normally in another.

Testomato’s uptime monitoring continuously checks your site’s availability and alerts you instantly the moment downtime is detected, allowing you to respond before users or customers even notice there was a problem.

How Does Uptime Monitoring Work?

Uptime monitoring works by sending HTTP requests to specified URLs (like a browser visiting a page, but automated). The monitoring tool checks the response to each request against what is expected: 200 means OK, 4xx means a client-side error such as not found or forbidden, 5xx means a server error, and a timeout means no response at all.
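As a rough illustration of that mapping, here is a minimal Python sketch. The function name `classify_response` and the result labels are assumptions made for this example and do not describe how any particular monitoring tool is implemented. A timeout is represented as `None` because no status code ever arrives.

```python
from typing import Optional


def classify_response(status: Optional[int]) -> str:
    """Map an HTTP status code (or None for a timeout) to a coarse check result."""
    if status is None:
        return "timeout"        # no response arrived within the time limit
    if 200 <= status < 300:
        return "ok"             # success, e.g. 200
    if 300 <= status < 400:
        return "redirect"       # acceptable or not, depending on what you expect
    if 400 <= status < 500:
        return "client_error"   # e.g. 404 not found, 403 forbidden
    if 500 <= status < 600:
        return "server_error"   # e.g. 500, 503
    return "unknown"            # outside the standard status code ranges
```

A real check would compare the result against what that specific URL is expected to return, since a redirect or even a 404 can be the correct answer for some endpoints.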

Automated uptime monitoring software can also record the response time, i.e., how long the server took to respond to the request. This information can help to surface degraded performance that might not trigger a status code alert, because the page is technically reachable, but is still negatively impacting users.
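To make the idea concrete, the sketch below times a single request with Python's standard library and flags a successful but slow response. The names `timed_check` and `is_degraded`, the 10-second timeout, and the 2-second slowness threshold are all assumptions chosen for illustration. The timer stops once the response headers arrive, so it does not include downloading the full page body.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def timed_check(url: str, timeout: float = 10.0) -> Tuple[Optional[int], float]:
    """Send one GET request and return (status_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code   # urllib raises 4xx/5xx responses as HTTPError
    except OSError:
        status = None       # timeout, DNS failure, or refused connection
    return status, time.monotonic() - start


def is_degraded(status: Optional[int], elapsed_seconds: float,
                slow_after: float = 2.0) -> bool:
    """True when the page answered successfully but slower than the threshold."""
    return status is not None and 200 <= status < 300 and elapsed_seconds > slow_after
```

Calling `is_degraded(*timed_check(url))` for a URL of your own would then report whether a reachable page took longer than the threshold to answer.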

When monitoring your site’s uptime, the frequency of checks is one of the most significant benefits of using an automated tool. The monitoring interval is also one of the factors that can help you select the right tool for your needs. Most tools check uptime once per minute. Testomato’s uptime monitoring interval is 15 seconds, four times as frequent. If your tool checks every minute and your site goes down 1 second after a check, you won’t know for up to 59 seconds. At $5,600–$9,000 per minute in average downtime costs, the math speaks for itself.
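The arithmetic behind that claim can be sketched in a few lines of Python. The function name is made up for this example, and it treats the worst case as one full check interval, which is a slight upper bound on the 59-second scenario described above.

```python
def worst_case_undetected_cost(interval_seconds: float, cost_per_minute: float) -> float:
    """Cost accrued if an outage starts right after one check and is caught by the next."""
    return interval_seconds / 60 * cost_per_minute


# Using the lower average figure quoted above ($5,600 per minute):
one_minute_interval = worst_case_undetected_cost(60, 5600)   # 5600.0
fifteen_second_interval = worst_case_undetected_cost(15, 5600)  # 1400.0
```

This only covers the time before detection. The total cost of an incident also depends on how long it takes to respond and recover once the alert arrives.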

You can observe and protect your uptime even further if your monitoring service offers multi-location monitoring.

Each Testomato check runs from a primary monitoring location, with a nearby backup location. If the primary detects a potential outage, the backup automatically retests before any alert is sent. You’re only notified if both locations confirm the failure, which filters out false alarms from localized network disruptions, ISP issues, or temporary routing problems that affect only one region.
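The confirmation logic described above could look something like the following Python sketch. This is an illustration of the general pattern, not Testomato's actual implementation. The two callables stand in for whatever performs the check from each location, and each returns `True` when the site looks healthy from there.

```python
from typing import Callable


def confirmed_down(primary_check: Callable[[], bool],
                   backup_check: Callable[[], bool]) -> bool:
    """Return True only when both locations see the failure."""
    if primary_check():
        return False            # primary sees the site as up: nothing to confirm
    return not backup_check()   # backup retests only after a primary failure
```

Because the backup runs only after the primary fails, the common healthy case costs a single request, while a failure seen from one location alone never produces an alert.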

The Cost of Downtime

The most significant impacts of downtime include financial losses, erosion of user trust, and drops in SEO ranking. The potential damage is not limited to sales and brand equity. It can extend into hard-to-quantify areas, such as decreased productivity within your organization and additional operational costs: overtime, fixing and improving the systems that failed, and any long-tail recovery and remedies that your organization may need to undertake.

Uptime monitoring doesn’t necessarily prevent incidents directly, but it does help you to be faster and more responsive when an incident occurs. Keeping track of long-term patterns in your uptime and downtime through uptime monitoring also helps you to find vulnerabilities and opportunities to improve your existing systems, reducing the frequency and severity of costly downtime incidents. After the fact, your post-mortem processes benefit from the data collected through your uptime monitoring so that you can improve your own systems and be even faster and more resilient to incidents in the future.

Financial Impact

According to Atlassian, downtime costs average $5,600–$9,000 per minute for medium and large businesses, depending on size, industry, and business model. Small businesses lose between $137 and $427 per minute. Business disruption, which can include things like damage to reputation, constitutes the largest share of downtime costs, while revenue and user productivity losses make up the next two major costs of downtime incidents.

Since downtime exists on a spectrum ranging from total outage to partial failures or degraded performance, it is also worth noting how slower response times can contribute to downtime costs. A site that loads in 1 second has an e-commerce conversion rate 2.5x higher than one that loads in 5 seconds. Downtime isn’t binary, so slow response times or other delays caused by degraded service cost conversions as well.

The more your business or organization relies on continual uptime, the more it will be impacted by downtime incidents. Data center failures are becoming less frequent and less severe relative to the rapid growth of digital infrastructure, but when they do occur, the financial and reputational cost per incident is higher than ever.

Although you cannot control the downtime of platforms that your organization may rely on, uptime monitoring can arm you with data to be better prepared and to communicate with your users during an incident. Uptime monitoring can also enable you to recover losses from those providers if downtime has violated your SLA (Service Level Agreement) with them.

User Trust

Downtime is immediately harmful for first-time visitors because there is no goodwill buffer like you may have for existing and loyal users. If your service is unavailable for too long, they will likely leave and never return. Nevertheless, even existing users expect continual uptime, so their trust can erode when downtime incidents interrupt their use of your site. In a survey of consumers about online trust, Queue-it found that 64% of consumers are less likely to trust a business after experiencing a website crash.

As the Heroku incident demonstrates, the speed of response and the transparency that you offer during an incident can soften the blow of a downtime incident. Your users and customers want to know what is happening and are more willing to extend goodwill when you are able to clearly and promptly communicate any issues to them.

SEO Rankings

When Googlebot repeatedly encounters errors on a site or cannot reach it at all, it reduces crawl frequency and may deindex those pages. Brief outages generally have minimal impact on the SEO ranking, but lengthy and repeated downtime can have real ranking consequences. See how HTTP errors affect Google Search for more detail.

No organization is immune to downtime incidents, from the largest cloud providers to the mom-and-pop online loose-leaf tea shop. But the magnitude of the damage to your business or organization can vary widely depending on how quickly you learn about an incident and whether you learn about it from your own monitoring tools or from user complaints. Uptime monitoring tools like Testomato cannot completely prevent downtime, but they can significantly reduce how much damage an incident can do. Uptime monitoring is the foundational layer of protection for your online operations.

What Do Uptime Percentages Actually Mean?

Uptime is expressed as a percentage of time that a service is available. This percentage can be defined within a Service Level Agreement (SLA) — a formal contract between a service provider and a customer that defines the expected levels of service, including the guaranteed uptime percentages, response times, and the remedies owed if the guarantees are not met.

SLAs are often described by the number of nines in the percentage, from the seemingly reassuring two nines of 99% to the extremely strict five nines of 99.999%. Despite the apparently small differences of tenths, hundredths, or thousandths of a percent, the difference between one number of nines and the next is enormous when translated into real time.

Uptime | Downtime/year | Downtime/month
99% | ~87 hours | ~7.3 hours
99.9% | ~8.75 hours | ~43 minutes
99.99% | ~52 minutes | ~4.4 minutes
99.999% | ~5 minutes | ~26 seconds

Keep in mind that these numbers of downtime per month are the maximum allowed according to the SLA. With data from your uptime monitoring tool, you can check exactly how much downtime your site is experiencing every month.
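As a sketch of how an SLA allowance and a measured figure could be compared, the Python below computes the downtime an uptime percentage permits over a period, and an uptime percentage from a list of check results. Both function names are assumptions for this example. The measured figure is only an approximation of time-based uptime, and it assumes checks run at an even interval.

```python
from typing import Sequence


def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Maximum downtime, in minutes, that an uptime percentage permits over a period."""
    return (1 - uptime_percent / 100) * period_days * 24 * 60


def measured_uptime_percent(check_results: Sequence[bool]) -> float:
    """Share of checks that passed, as a percentage (True means the check passed)."""
    if not check_results:
        raise ValueError("no check results to measure")
    return 100 * sum(check_results) / len(check_results)
```

For example, `allowed_downtime_minutes(99.9, 365 / 12)` comes to about 43.8 minutes for an average month, in line with the table above.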

The difference in downtime between 99.9% and 99.99% is the difference between 43 minutes of downtime per month and 4 minutes. You can use these figures to calculate the potential cost of downtime for your organization.

For example, if we take the upper end of Atlassian’s small-business estimate, $427 per minute of downtime, a small business with 99.9% uptime (up to 43 minutes of downtime per month) could lose about $18,361 per month. For a larger organization, at $9,000 per minute, the same 43 minutes could cost as much as $387,000 each month.
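The same calculation written out in Python, in case you want to substitute your own numbers. The function name is made up for this example, and the inputs are the figures quoted above: 43 minutes of monthly downtime at 99.9% uptime, and per-minute costs of $427 and $9,000.

```python
def downtime_cost(downtime_minutes: float, cost_per_minute: float) -> float:
    """Estimated cost of a given amount of downtime at a flat per-minute rate."""
    return downtime_minutes * cost_per_minute


small_business = downtime_cost(43, 427)    # 18361
larger_business = downtime_cost(43, 9000)  # 387000
```

A flat per-minute rate is a simplification, since an outage during peak hours usually costs more than one in the middle of the night.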

Tip: You can calculate how much downtime corresponds to your SLA with uptime.is

Alerts and Avoiding Alert Fatigue

When there is a downtime incident, your uptime monitoring tool should be set up to alert you. You can route certain alerts to different channels:

  • Email — reliable, async, good for individuals or small teams
  • Slack (and similar) — keeps the whole team informed in real time without switching tools
  • PagerDuty / on-call integrations — for critical infrastructure, where someone needs to be woken up
  • Pushover, Pushbullet — push notifications to mobile

Testomato integrates with all of these tools and also allows you to specify the incident duration threshold, helping you reduce notifications about short-lived incidents so you can prioritize more severe ones.

Your alerts should filter noisy, low-impact incidents from critical ones. Not all incidents warrant a 3am phone call.
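An incident duration threshold like the one mentioned above can be expressed as a small rule: stay quiet until the outage has lasted long enough to matter. The Python sketch below shows the general idea only. The function name, the use of plain timestamps in seconds, and the example threshold are assumptions, not a description of how Testomato implements the setting.

```python
from typing import Optional


def should_notify(down_since: Optional[float], now: float,
                  threshold_seconds: float) -> bool:
    """Notify only once an incident has lasted at least threshold_seconds."""
    if down_since is None:
        return False    # the site is currently up, so there is nothing to report
    return now - down_since >= threshold_seconds
```

With a 60-second threshold, an outage that started 30 seconds ago stays silent, while one that started 60 seconds ago triggers a notification. The trade-off is that a higher threshold means less noise but a later first alert.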

Alert Fatigue

Alert fatigue happens when too many notifications cause teams to start ignoring them. This defeats the purpose of monitoring. Some common causes include overly sensitive thresholds, alerting on every single drop or spike in latency, or inadequate severity distinction.

You can make sure you have a reliable alerting policy with a few simple practices:

  • Define severity levels: not everything should ping someone at 3am
  • Alerts should be actionable: include what failed, from where, and suggested next steps
  • Deduplication: combine repeated identical alerts rather than sending 20 identical emails
  • Review and tune thresholds regularly
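Two of the practices above, severity levels and deduplication, can be sketched together in Python. Everything here is hypothetical: the `AlertRouter` class, the severity names, and the channel names are placeholders for whatever your own alerting setup uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlertRouter:
    # Hypothetical mapping from severity level to notification channel.
    routes: dict = field(default_factory=lambda: {
        "critical": "pager", "warning": "chat", "info": "email"})
    # Alerts that have been sent and not yet resolved.
    open_alerts: set = field(default_factory=set)

    def handle(self, check_name: str, severity: str) -> Optional[str]:
        """Return the channel to notify, or None if this alert is a duplicate."""
        key = (check_name, severity)
        if key in self.open_alerts:
            return None                           # already notified for this incident
        self.open_alerts.add(key)
        return self.routes.get(severity, "email")  # unknown severities fall back to email

    def resolve(self, check_name: str, severity: str) -> None:
        """Mark the incident as over so a future failure alerts again."""
        self.open_alerts.discard((check_name, severity))
```

Only a critical failure reaches the pager here, and a check that keeps failing produces one notification per incident instead of one per check.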

Not all downtime is unexpected. Deployments, database migrations, and infrastructure work cause planned downtime that your team already knows about. Your uptime monitoring tool should know about these windows too, so that it can suppress alerts during them. Otherwise, it will fire alerts for scheduled downtime and add distracting notification noise for your team.

Planned downtime is also something that you can communicate to users in advance. Your uptime monitoring tool is meant for catching unexpected outages, not planned maintenance.
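Suppressing alerts during planned maintenance comes down to checking whether the current time falls inside a known window. The Python sketch below is one simple way to express that. The function name and the example window are assumptions, and it expects timezone-aware datetimes so that windows are not misread across time zones.

```python
from datetime import datetime, timezone
from typing import Iterable, Tuple


def should_suppress(now: datetime,
                    windows: Iterable[Tuple[datetime, datetime]]) -> bool:
    """True if `now` falls inside any planned maintenance window (start inclusive, end exclusive)."""
    return any(start <= now < end for start, end in windows)


# Hypothetical window: a one-hour database migration starting at 02:00 UTC.
maintenance = [(datetime(2025, 1, 10, 2, 0, tzinfo=timezone.utc),
                datetime(2025, 1, 10, 3, 0, tzinfo=timezone.utc))]
```

Checks can keep running during the window so that the data is still recorded. Only the notifications are held back.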

A good uptime monitoring tool paired with a thoughtful alerting configuration ensures that the right alert gets to the right person, with the right context, at the right time.

Beyond Uptime: When the Site is Up but Something is Wrong

Uptime monitoring answers one important question: is the server responding? However, it doesn’t verify whether the response is correct or whether the site is actually usable.

A site can return HTTP 200, passing the uptime check, while serving a blank page or a faulty checkout flow. If your site is consistently frustrating to use because of partial or degraded service, this can also cause users to navigate away.

Many tools, including Testomato, allow optional keyword checks to verify that specific text is present in the response body (e.g., “Add to cart”). These checks help to catch cases where the server responds with unexpected content (like incorrect text or missing images). This is what Testomato calls website monitoring. Although these cases do not cause downtime, they can still interfere with the user experience.
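A keyword check of this kind can be sketched in Python as follows. The function name and return shape are assumptions for illustration and do not reflect any specific tool's API. The check passes only when the status is 200 and every required phrase appears in the response body.

```python
from typing import List, Sequence, Tuple


def keyword_check(status: int, body: str,
                  required: Sequence[str]) -> Tuple[bool, List[str]]:
    """Return (passed, missing_phrases) for one response."""
    missing = [phrase for phrase in required if phrase not in body]
    passed = status == 200 and not missing
    return passed, missing
```

A blank page served with HTTP 200 fails this check because “Add to cart” is missing from the body, even though a plain status check would have passed. Returning the missing phrases also gives the alert something actionable to report.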

APIs and Third-Party Dependencies

The same HTTP check mechanism that monitors your site can also monitor API endpoints, including ones that you do not control. Many sites rely on third-party tools and resources - payment processors, authentication providers, email delivery, CDNs - to provide their full range of services and features. While your site may not be directly experiencing a downtime incident, one of your dependencies may be, and your own site could experience disruption as a result.

From a user perspective it doesn’t matter where the disruption originates; they just experience your product or website as broken. For this reason, it can be worth monitoring not only your own website and APIs, but also critical third-party integrations whose failure would break your user-facing functionality.

With Testomato, it is easy to monitor the uptime of other sites and services so you can keep an eye on the ones you rely on too. Their uptime impacts yours. Although monitoring the uptime of your site alone may catch the downstream effects of downtime in one of your third-party dependencies, monitoring them directly can help you respond faster to harmful incidents.
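One way to picture the benefit of monitoring dependencies directly is a rough heuristic for where an incident probably started. In the Python sketch below, the `Target` type, the example URLs, and the rule itself are all assumptions made for illustration: if any third-party target is failing, it is treated as the likely origin, even when your own pages are failing too.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Target:
    name: str
    url: str
    third_party: bool   # True for services you depend on but do not control


def likely_origin(targets: Iterable[Target], failed_names: Set[str]) -> str:
    """Guess where an incident started, given the names of the failing targets."""
    failed = [t for t in targets if t.name in failed_names]
    if not failed:
        return "healthy"
    if any(t.third_party for t in failed):
        return "dependency"   # heuristic: a failing dependency is the likely cause
    return "own service"


# Hypothetical monitored targets; the URLs are placeholders.
targets = [
    Target("homepage", "https://example.com/", third_party=False),
    Target("payments", "https://payments.example.net/health", third_party=True),
]
```

This is only a first guess to speed up triage. Your own service and a dependency can also fail independently at the same time.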

Conclusion

Uptime monitoring is the baseline for modern online operations. It is the minimum that you should have in place for any site or service that matters to your business or organization. Downtime will happen. Uptime monitoring determines how fast you’ll know when something goes down.

Start Uptime Monitoring for Free →
