What Caused Amazon’s AWS Outage, and Why Did So Many Major Apps Go Offline?

Published on October 21, 2025 by @mritxperts

On October 20, 2025, the internet experienced one of its most significant disruptions in recent years when Amazon Web Services (AWS) suffered a major outage that took down countless websites, applications, and services worldwide. From banking apps to gaming platforms, millions of users found themselves unable to access essential digital services. This comprehensive guide explains what happened, why it affected so many platforms, and what it reveals about our digital infrastructure.

Understanding Amazon Web Services: The Backbone of the Internet

Before diving into what went wrong, it’s crucial to understand what AWS actually does and why one company’s problems can cascade across the entire internet.

What is AWS?

Amazon Web Services is the world’s largest cloud computing platform, holding approximately 30% of the global cloud infrastructure market. Think of AWS as the invisible scaffolding supporting much of the internet. Rather than building and maintaining their own expensive data centers filled with servers and hardware, companies rent computing resources, storage, and database services from AWS.

This business model emerged from Amazon’s own needs. During the company’s early days, Amazon needed massive server capacity to handle holiday shopping rushes. Engineers realized that during quieter periods, they could rent out this excess capacity to other businesses. From this pragmatic solution, AWS was born, and it has since become one of Amazon’s most profitable divisions, generating $107 billion in revenue during the 2024 financial year—17% of Amazon’s total revenue.

Services That Power the Digital World

AWS offers numerous services that companies depend on daily:

  • DynamoDB: A database service that stores critical information for applications, including customer data, user preferences, and transaction records
  • EC2 (Elastic Compute Cloud): Virtual servers that companies use to build and run their online applications
  • DNS Services: The system that translates user-friendly web addresses into IP addresses that computers can understand
  • Storage Solutions: Cloud-based storage that eliminates the need for physical servers
  • Auto-scaling Capabilities: Automatic adjustment of computing resources to handle traffic fluctuations

The October 2025 Outage: A Timeline of Events

When It Started

The problems began in the early morning hours of October 20, 2025, at approximately 3:11 AM Eastern Time. AWS acknowledged “increased error rates and latencies for multiple AWS services” in the US-EAST-1 region, whose data centers sit in Northern Virginia, home to the largest cluster of data centers in the United States.

The Root Cause: DNS Resolution Problems

By 5:01 AM Eastern Time, AWS engineers identified the culprit: a DNS resolution issue affecting DynamoDB API endpoints. The Domain Name System (DNS) acts like a phone book for the internet, converting user-friendly addresses like “amazon.com” into numerical IP addresses that computers use to communicate.

According to Amazon’s official statement, they “identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints.” In simpler terms, the problem wasn’t with the data itself—Amazon still had all customer information safely stored. However, other services couldn’t locate or access that data because the “address book” had malfunctioned.

One cybersecurity expert aptly put it this way: “large portions of the internet suffered temporary amnesia.” Companies’ applications were suddenly unable to find the databases they needed to function, even though those databases were intact and waiting.

The Complication: A Software Update Gone Wrong

Early reports indicated that a software update applied to DynamoDB that morning contained an error. This seemingly small mistake triggered a chain reaction of service failures. When AWS attempted to fix the initial DNS problem, they encountered additional complications, prolonging the outage.

The underlying DNS issue was reportedly mitigated by 2:24 AM Pacific Time (5:24 AM Eastern Time), but problems persisted. Some customers continued experiencing increased error rates with AWS services due to difficulties launching new EC2 instances—the virtual servers many applications rely upon.

Resolution

Amazon announced that it had resolved the “increased error rates and latencies for AWS Services” by 6:53 PM Eastern Time on October 20. However, the path to recovery was bumpy, with services experiencing intermittent disruptions throughout the day as AWS worked through a backlog of issues.
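
The timestamps in this timeline can be cross-checked with a little date arithmetic. The sketch below is only an illustration: it assumes fixed daylight-saving offsets for October 20, 2025 (UTC-4 for Eastern, UTC-7 for Pacific) and uses the times quoted in this article, not any official AWS log.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets: both zones were on daylight time on October 20, 2025.
EASTERN = timezone(timedelta(hours=-4), "EDT")
PACIFIC = timezone(timedelta(hours=-7), "PDT")

# Times as quoted in this article.
first_report = datetime(2025, 10, 20, 3, 11, tzinfo=EASTERN)
dns_mitigated = datetime(2025, 10, 20, 2, 24, tzinfo=PACIFIC)
resolved = datetime(2025, 10, 20, 18, 53, tzinfo=EASTERN)

# Convert the Pacific timestamp to Eastern for comparison.
dns_mitigated_eastern = dns_mitigated.astimezone(EASTERN)

# Elapsed time from the first acknowledgement to each milestone.
time_to_dns_fix = dns_mitigated - first_report
total_duration = resolved - first_report

print(dns_mitigated_eastern.strftime("%H:%M %Z"))
print(time_to_dns_fix, total_duration)
```

Under these assumptions, the DNS mitigation came 2 hours 13 minutes after the first acknowledgement, and the announced resolution came 15 hours 42 minutes after it.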

The Widespread Impact: Which Services Went Down?

The scope of this outage was staggering, affecting diverse industries and millions of users worldwide. Here’s a breakdown of the major services that experienced disruptions:

Social Media and Communication

  • Snapchat: One of the hardest hit, with over 22,000 outage reports at its peak
  • Reddit: The platform went completely offline for users, with reports peaking at 12,000
  • Signal: Secure messaging services were disrupted
  • Tinder: Dating app users couldn’t access their accounts

Gaming Platforms

  • Fortnite: Epic Games’ flagship title became unplayable
  • Roblox: The popular gaming platform went offline
  • Clash of Clans and Clash Royale: Mobile gaming giants were affected
  • Pokémon Go: Players couldn’t access the augmented reality game
  • Dead by Daylight, VRChat, and Rainbow Six Siege: Several other games and platforms were also disrupted

Financial Services

  • Venmo: Mobile payment processing was disrupted
  • Coinbase: Cryptocurrency exchange experienced outages, though they assured customers “all funds are safe”
  • Chime and Robinhood: Banking and investment apps were affected
  • Major UK banks: Multiple financial institutions experienced service disruptions

Productivity and Education

  • Canva: The popular design platform went offline
  • Duolingo: Language learners couldn’t access lessons
  • Canvas by Instructure: Educational platforms faced disruptions
  • Slack: Workplace communication tools were affected

Entertainment and Media

  • Amazon Prime Video: Amazon’s own streaming service experienced problems
  • Amazon Music: Music streaming was disrupted
  • The New York Times: News website access was affected, putting Wordle streaks at risk
  • Disney+: Streaming services faced intermittent outages
  • Crunchyroll: Anime streaming was interrupted

Amazon’s Own Ecosystem

  • Amazon.com: The main shopping platform experienced significant issues
  • Alexa: Voice assistant functionality was impaired
  • Ring: Doorbell cameras and security devices stopped working properly
  • Amazon subsidiaries: Multiple Amazon-owned services were affected

Other Services

  • McDonald’s app: Mobile ordering was disrupted
  • Starbucks app: Coffee ordering faced problems
  • Airlines and booking sites: Travel services experienced issues
  • HMRC: The UK’s tax authority website went down
  • Vodafone, SFR, and Free: Telecommunications services in Europe were affected

Why Did One Problem Cause Such Widespread Chaos?

The October 2025 AWS outage perfectly illustrates a fundamental vulnerability in modern internet infrastructure: centralization and interdependence.

The Centralization Problem

When a single company controls 30% of the global cloud infrastructure market, problems at that company automatically affect a massive portion of the internet. AWS hosts thousands of companies’ data and applications, so when AWS experiences issues, those problems multiply across every service depending on it.

As one technology expert explained, AWS “sits in the middle of everything.” Companies choose AWS because building and maintaining their own data centers would be far more expensive and complex. However, this cost-effectiveness comes with a hidden trade-off: vulnerability to single points of failure.

The Cascade Effect

Modern web services are deeply interconnected. When DynamoDB—the database service—experienced DNS resolution problems, applications couldn’t access the data they needed to function. This created a domino effect:

  1. Applications lost connection to their databases
  2. Users received error messages when trying to access services
  3. Companies’ backup systems and failover mechanisms struggled to compensate
  4. Even after the initial DNS problem was fixed, new EC2 instances couldn’t launch properly, preventing services from fully recovering
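
The domino effect above can be pictured as a walk over a dependency graph. The following minimal sketch uses an entirely hypothetical graph (the service names and edges are illustrative assumptions, not AWS's real architecture) and finds everything that directly or transitively depends on a failed component.

```python
from collections import deque

# Hypothetical dependency map: each key depends on the services in its list.
# These names and edges are illustrative assumptions, not AWS's real topology.
DEPENDS_ON = {
    "dns": [],
    "database": ["dns"],
    "instance-launch": ["database"],
    "chat-app": ["database"],
    "game-backend": ["database", "instance-launch"],
    "static-site": [],
}


def affected_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(service)

    # Breadth-first search outward from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents[current]:
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


print(sorted(affected_by("dns", DEPENDS_ON)))
```

In this toy graph, a failure of the "dns" node reaches every service except "static-site", which has no dependency on it. The further down the stack a failure occurs, the more services it touches.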

Geographic Concentration

The outage primarily affected the US-EAST-1 region in Northern Virginia, where Amazon has invested more than $50 billion in data centers. Many companies host their primary operations in this region due to its reliability, cost-effectiveness, and proximity to major East Coast markets. When this region experienced problems, a disproportionate number of services were affected simultaneously.

How This Compares to Previous Outages

This wasn’t AWS’s first major disruption, nor was it the internet’s only recent large-scale failure.

AWS’s Track Record

AWS generally maintains robust uptime, which makes major outages like this one particularly notable. Previous significant AWS outages include:

  • 2021: A severe disruption affected websites and services globally, briefly bringing Amazon’s own delivery operations to a standstill
  • 2023: Another outage knocked many websites offline for several hours

Cybersecurity experts note that AWS’s performance is actually “on par with the other major cloud providers and, in fact, it’s amazing that they’re able to run at the scale they do without more frequent disruptions.” However, when disruptions do occur, their impact is massive due to AWS’s market dominance.

The CrowdStrike Comparison

In July 2024, the world witnessed an even more devastating IT failure when cybersecurity firm CrowdStrike pushed a faulty software update that crashed Microsoft Windows systems globally. This incident:

  • Caused approximately $5 billion in direct business losses
  • Grounded thousands of flights
  • Disrupted hospitals and banks worldwide
  • Affected government offices and essential services

The CrowdStrike outage and the AWS disruption share important similarities: both stemmed from software updates containing errors, and both revealed how problems with a single technology provider can have catastrophic global consequences.

AT&T Network Failures

In 2024, AT&T’s network experienced multiple outages, including an 11-hour meltdown that prevented many gig workers from performing their jobs. These incidents further highlighted the fragility of the digital infrastructure that modern life depends upon.

The Broader Implications: What This Means for Digital Infrastructure

Economic Vulnerability

For major businesses, cloud downtime translates directly into millions of dollars in lost productivity and revenue. One cyber insurance expert pointed out a critical gap: “Many policies don’t trigger unless an outage lasts eight hours or more,” yet even shorter disruptions can cause severe financial damage.

The AWS outage affected businesses across every sector:

  • E-commerce companies lost sales during peak hours
  • Gaming companies saw players unable to access paid content
  • Financial services couldn’t process transactions
  • Educational platforms saw learning schedules disrupted

Security Concerns

Beyond immediate financial losses, the outage raised serious security questions. The executive director of the Future of Technology Institute warned that “Europe’s dependency on monopoly cloud companies like Amazon is a security vulnerability and an economic threat we can’t ignore.”

This concern isn’t limited to Europe. Any concentration of critical infrastructure in the hands of a few companies creates potential vulnerabilities that could be triggered accidentally (as in this case) or exploited maliciously by bad actors.

The Single Point of Failure Dilemma

One technology expert expressed ongoing confusion about the situation: “I continue to be confused about why there is not instant redundancy when you can have something that is seemingly so small, or so localized, have such a cascading and global impact.”

This question gets to the heart of the problem. While AWS and other cloud providers do build redundancy into their systems, the complexity of modern internet infrastructure means that even with backup systems in place, a single software error in a critical component can still cause widespread failures.

The Complexity Challenge

The internet functions as a complex web of overlapping services, and it’s only as reliable as its weakest code. When systems become overloaded or when a key network component fails, the impact spreads quickly—especially when so many services rely on the same underlying infrastructure.

Professor Mike Chapple from the University of Notre Dame’s Mendoza College of Business explained: “If a single company experiences an issue in their data center, it causes issues for that company’s products and services.” But when AWS experiences issues, thousands of companies—and millions of their customers—feel the impact simultaneously.

Lessons Learned and Future Considerations

For Businesses

The October 2025 AWS outage offers several critical lessons for companies relying on cloud infrastructure:

  1. Diversification is essential: Depending entirely on a single cloud provider creates unacceptable risk
  2. Multi-region deployment: Hosting services across multiple geographic regions can provide redundancy
  3. Backup and recovery plans: Companies need robust contingency plans for cloud service disruptions
  4. Insurance coverage: Review cyber insurance policies to ensure adequate coverage for shorter outages

For Consumers

Individual users also learned valuable lessons:

  1. Save important information offline: Cloud services aren’t infallible
  2. Know alternative ways to connect: Have backup communication methods
  3. Keep local copies: Don’t rely exclusively on cloud storage for critical data
  4. Have offline payment options: Digital payment systems can fail

For the Industry

The broader technology industry faces important questions:

  1. Is the current level of market concentration sustainable?
  2. Do we need more stringent reliability requirements for critical infrastructure providers?
  3. Should governments encourage or require companies to diversify their cloud service providers?
  4. How can the industry build more robust failover systems?

The Role of Regional Infrastructure

The concentration of AWS infrastructure in the US-EAST-1 region in Northern Virginia played a significant role in this outage’s impact. This region serves as a critical hub for global internet traffic, making it simultaneously invaluable and vulnerable.

Why Virginia?

Northern Virginia became the internet’s backbone for several reasons:

  • Proximity to government and military installations (driving early internet infrastructure development)
  • Abundant power supply
  • Tax incentives for data center construction
  • Existing telecommunications infrastructure
  • Strategic location on the East Coast

Amazon’s $50 billion investment in Virginia data centers reflects the region’s importance. However, this geographic concentration means that regional problems—whether technical issues, natural disasters, or other disruptions—can have outsized global effects.

Technical Deep Dive: Understanding DNS and DynamoDB

For those interested in the technical details, understanding how DNS and DynamoDB work helps explain why their disruption was so impactful.

Domain Name System (DNS)

DNS converts human-readable domain names into numerical IP addresses. When you type “amazon.com” into your browser:

  1. Your computer asks a DNS server for Amazon’s IP address
  2. The DNS server responds with the numeric address (e.g., 192.0.2.1)
  3. Your computer then connects to that IP address
  4. The website loads

When DNS fails, computers can’t find the servers they need to connect to, even if those servers are functioning perfectly. This is exactly what happened during the AWS outage—applications couldn’t locate the DynamoDB databases they needed because the DNS “address book” was malfunctioning.
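
The lookup steps above map onto a few lines of Python. This is a minimal sketch using the standard library's socket.getaddrinfo. The resolve helper and its injectable resolver parameter are assumptions added here for illustration, and the code says nothing about how AWS's internal DNS is implemented.

```python
import socket


def resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for `hostname`, or [] if the lookup fails.

    `resolver` is a hypothetical injection point so the failure path can be
    exercised without touching the network.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # The server may be perfectly healthy; we just cannot find its address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({entry[4][0] for entry in results})


def broken_resolver(hostname, port):
    """Stand-in for a malfunctioning "address book"."""
    raise socket.gaierror("simulated DNS failure")


print(resolve("localhost"))                            # normal lookup
print(resolve("localhost", resolver=broken_resolver))  # simulated DNS failure
```

The second call returns an empty list even though nothing is wrong with the target machine, which mirrors the situation described above: the data was intact, but its address could not be found.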

DynamoDB’s Critical Role

DynamoDB serves as a database service hosting information for thousands of companies. Applications constantly query DynamoDB to:

  • Retrieve user account information
  • Process transactions
  • Access stored content
  • Verify permissions
  • Load application settings

When applications lost the ability to reach DynamoDB due to DNS problems, they couldn’t perform these essential functions, rendering them effectively useless to end-users.
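
One common mitigation for this kind of dependency, which this article does not claim any affected company used, is to fall back to recently cached data when the database cannot be reached. The sketch below is hypothetical throughout: FlakyStore stands in for a remote database client and is not the DynamoDB API.

```python
class StoreUnavailable(Exception):
    """Raised when the (hypothetical) remote store cannot be reached."""


class FlakyStore:
    """Stand-in for a remote database client; not the real DynamoDB API."""

    def __init__(self, data):
        self.data = dict(data)
        self.reachable = True

    def get(self, key):
        if not self.reachable:
            raise StoreUnavailable(f"cannot reach store for {key!r}")
        return self.data[key]


class CachedReader:
    """Serve fresh data when possible, stale cached data during an outage."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        try:
            value = self.store.get(key)
        except StoreUnavailable:
            if key in self.cache:
                return self.cache[key], "stale"
            raise  # nothing cached: the failure reaches the user
        self.cache[key] = value
        return value, "fresh"


store = FlakyStore({"user:1": {"theme": "dark"}})
reader = CachedReader(store)
print(reader.get("user:1"))   # fetched from the store and cached
store.reachable = False       # simulate the endpoint becoming unreachable
print(reader.get("user:1"))   # served from the cache instead
```

Stale reads only help for data that tolerates being slightly out of date, such as settings or preferences. Writes and transactions need a different strategy.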

What AWS Is Doing to Prevent Future Outages

While Amazon hasn’t released detailed plans specific to preventing similar DNS resolution issues, the company’s general approach to reliability includes:

Continuous Monitoring

AWS maintains sophisticated monitoring systems designed to detect problems quickly. In this case, they identified the DNS issue within approximately two hours of its occurrence.

Regional Redundancy

AWS operates data centers in multiple regions worldwide, allowing companies to distribute their services geographically. However, many companies still concentrate operations in single regions for cost and performance reasons.

Automated Failover Systems

AWS provides tools for automatic failover to backup systems when primary services fail. However, the speed and effectiveness of these systems depend on how individual companies configure them.
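
As a rough illustration of what such configuration amounts to, the sketch below tries a list of regions in priority order and returns the first successful response. The function names and the simulated outage are assumptions for illustration. Real deployments typically rely on health checks, DNS-level routing, and data replication rather than a loop like this.

```python
class RegionDown(Exception):
    """Raised by the hypothetical client when a region cannot serve requests."""


def call_with_failover(regions, operation):
    """Try `operation(region)` for each region in order; return the first success.

    Returns a (region, result) pair, or raises RegionDown if every region fails.
    """
    errors = {}
    for region in regions:
        try:
            return region, operation(region)
        except RegionDown as exc:
            errors[region] = str(exc)  # remember why this region was skipped
    raise RegionDown(f"all regions failed: {errors}")


def fetch_profile(region):
    """Hypothetical request that fails while us-east-1 is unavailable."""
    if region == "us-east-1":
        raise RegionDown("endpoint unreachable")
    return {"served_from": region}


print(call_with_failover(["us-east-1", "us-west-2", "eu-west-1"], fetch_profile))
```

A service configured with only one region in its list has nothing to fall back to, which is why single-region deployments were hit hardest.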

Regular Testing

AWS conducts chaos engineering exercises—deliberately causing failures in controlled environments to test system resilience and identify weaknesses before they cause real-world problems.
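
The basic idea of fault injection can be shown in miniature. This sketch is a toy built on stated assumptions, not AWS's tooling: it wraps a function so that a configurable fraction of calls fail, letting a test confirm that the calling code copes.

```python
import random


class InjectedFault(Exception):
    """Failure raised deliberately by the fault injector."""


def with_faults(func, failure_rate, rng=None):
    """Wrap `func` so that roughly `failure_rate` of calls raise InjectedFault."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def load_settings():
    """Hypothetical dependency call."""
    return {"language": "en"}


def load_settings_safely(loader):
    """Code under test: fall back to defaults if the dependency fails."""
    try:
        return loader()
    except InjectedFault:
        return {"language": "default"}


always_failing = with_faults(load_settings, failure_rate=1.0)
print(load_settings_safely(always_failing))
```

Setting failure_rate to 1.0 forces the fallback path every time, while intermediate values approximate a flaky dependency.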

The Competitive Landscape: AWS vs. Other Cloud Providers

Understanding AWS’s market position helps explain why its outages have such widespread impact:

Market Share

  • AWS: 30-37% of global cloud market (depending on the measurement)
  • Microsoft Azure: Second-largest provider
  • Google Cloud: Third-largest provider

The Consolidation Challenge

Some experts argue that this concentration of cloud services among three major providers creates systemic risk. If one of the “big three” experiences problems, a massive portion of internet services is affected.

Interestingly, Microsoft Azure experienced its own major outage earlier in October 2025, prompting Google to capitalize on the service lapse by pitching its own tools and business continuity plans to potential customers. This competitive dynamic may eventually drive improvements in reliability as providers work to differentiate themselves.

Practical Steps: What You Can Do

For Individuals

  1. Maintain offline copies of important documents and data
  2. Use multiple payment methods, including cash and cards from different providers
  3. Don’t rely solely on cloud-connected devices for critical functions (like home security)
  4. Have backup communication options (multiple messaging apps, phone numbers from different carriers)

For Small Businesses

  1. Implement multi-cloud strategies where feasible
  2. Deploy services across multiple AWS regions at minimum
  3. Maintain local backups of critical data
  4. Develop and test disaster recovery plans
  5. Consider hybrid cloud approaches combining cloud and local infrastructure

For Enterprise Organizations

  1. Conduct regular resilience testing including simulated cloud provider outages
  2. Implement sophisticated failover systems across multiple providers and regions
  3. Review and update insurance coverage for cloud service disruptions
  4. Develop comprehensive business continuity plans that account for infrastructure failures
  5. Consider sovereign cloud options for critical operations in regions with such requirements

The Future of Cloud Infrastructure

The October 2025 AWS outage will likely drive several industry trends:

Increased Diversification

More companies may adopt multi-cloud strategies, distributing their services across AWS, Azure, and Google Cloud to reduce dependence on any single provider.

Edge Computing Growth

Moving computing resources closer to end-users (edge computing) can reduce dependence on centralized data centers, though this approach brings its own complexity.

Regulatory Scrutiny

Governments may increase oversight of cloud infrastructure providers, potentially mandating redundancy requirements or reliability standards for critical services.

Insurance Innovation

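The cyber insurance industry is experimenting with parametric options that would trigger payouts automatically once predefined conditions are met (such as an outage exceeding a set duration), rather than relying on conventional policies that often pay nothing until an outage has lasted eight hours.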

European Digital Sovereignty

European nations are already moving to reduce dependence on American cloud providers, with initiatives to develop and support local alternatives gaining momentum.

Conclusion: Living with Digital Fragility

The October 2025 AWS outage served as a stark reminder that the internet’s apparent reliability masks significant underlying fragility. As our personal and professional lives become increasingly dependent on cloud services, single points of failure carry ever-greater consequences.

The incident affected millions of users across virtually every sector of the digital economy, from gaming and social media to banking and education. While AWS resolved the issues within approximately 16 hours, the disruption highlighted critical vulnerabilities in our digital infrastructure.

Several key takeaways emerge from this event:

  1. No system is infallible: Even the world’s largest and most sophisticated cloud provider can experience major disruptions
  2. Interconnection amplifies impact: The internet’s interconnected nature means single failures can cascade globally
  3. Diversification matters: Both companies and individuals need backup plans and redundant systems
  4. The status quo may not be sustainable: Current levels of market concentration create systemic risks that may require industry or regulatory responses

As we continue building an increasingly digital future, the challenge will be maintaining the cost-effectiveness and convenience of centralized cloud services while building in sufficient redundancy and resilience to prevent—or at least minimize—the impact of inevitable future failures.

For now, the AWS outage of October 2025 joins events like the CrowdStrike incident as a case study in digital infrastructure vulnerability—a wake-up call that reminds us to prepare for disruptions even as we depend ever more heavily on the cloud services that power our modern world.


This article was last updated on October 21, 2025, with information about the resolution of the AWS outage and its broader implications.