Phoenix Data Center Hardware Incident – What Happened and How We’re Improving

Sep 04, 2025

On August 30, 2025, Rocket.net experienced an incident in our Phoenix data center. Services were restored as quickly as possible, but we learned important lessons about improving our communication during outages.

What Happened

At 9:39 pm EDT on August 30, a hardware failure affected a single host node (phx24) in our Phoenix data center. This caused downtime of up to roughly four hours for approximately 405 production and staging sites on the Rocket.net platform. The issue was traced to a chassis-level power and control fault. Working with our partner Leaseweb, our team performed a full chassis replacement (disks preserved) and resolved a secondary iDRAC networking issue that surfaced after the swap.

The node was restored at 1:49 am EDT on August 31. All other nodes and customer sites across our network remained stable and performant.

Timeline of Events

Incident Starts: Hardware failure occurs at 9:39 pm EDT on August 30, 2025
Resolution: Chassis replaced, iDRAC issue resolved, full service restored at 1:49 am EDT on August 31, 2025
Status Page Update: Posted at 2:07 am EDT on August 31, 2025
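
For readers who want to check the durations implied by the timeline, here is a minimal sketch that derives them from the three timestamps above. The variable and function names are illustrative only, and EDT is modelled as a fixed UTC-4 offset.

```python
from datetime import datetime, timedelta, timezone

# EDT modelled as a fixed UTC-4 offset (an assumption made for this sketch).
EDT = timezone(timedelta(hours=-4), "EDT")

# Timestamps copied from the timeline above.
incident_start = datetime(2025, 8, 30, 21, 39, tzinfo=EDT)
service_restored = datetime(2025, 8, 31, 1, 49, tzinfo=EDT)
status_page_update = datetime(2025, 8, 31, 2, 7, tzinfo=EDT)


def minutes_between(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


outage_minutes = minutes_between(incident_start, service_restored)
update_lag_minutes = minutes_between(incident_start, status_page_update)

print(f"Outage: {outage_minutes // 60} h {outage_minutes % 60} min")
print(f"Status page update after start: {update_lag_minutes // 60} h {update_lag_minutes % 60} min")
```

Under these timestamps, the window from failure to full restoration is 4 hours 10 minutes, and the status page update came 4 hours 28 minutes after the incident began.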

Overview

The affected node was operated by Leaseweb, our infrastructure provider in Phoenix. We were alerted within 10 seconds of the failure, and our engineers (including our founder) worked through the night on a live bridge call with Leaseweb to restore service.

Unfortunately, our public status page updates fell behind while we focused on resolving the issue. The update was not posted until 2:07 am EDT, after service had already been restored, so customers relying on the status page did not see timely information.

We are always trying to improve, and this incident exposed two separate issues that we will address:

First, going forward, we will automate the status page to update in real time. We have recently joined forces with hosting.com and will soon be able to leverage the hosting.com monitoring teams to add extra cover as we move from startup to global scale.

Second, while server issues do happen, four hours of downtime does not meet what we internally refer to as the “Rocket Standard.” We have asked our partner Leaseweb for a one-hour SLA on hardware replacements — they currently provide us with a four-hour SLA on any hardware failure.

Furthermore, as we continue with our hosting.com partnership, we will gain more control over our servers by deploying our own hardware in our own data centers. We will back up our commitment to the Rocket Standard by providing an uptime SLA as standard with all of our plans.

Committed to Transparency

Incidents happen, but silence should not. We regret the delay in communication as much as the incident itself. With the new safeguards, you’ll see status page updates as soon as issues arise — even while we’re working on resolution.

Our commitment is simple: when you choose Rocket.net, you can count on both rapid recovery and clear, honest communication every time.
