Mews Down - Azure SQL Database Outage
Incident Report for Mews
Postmortem

Problem

On 2022-11-10 at 22:15 UTC, the Mews application failed to load and displayed the message "Oops, something went wrong." The system was unavailable for 20 minutes.

Action

After identifying the unavailability of our primary database as the root cause, we initiated a failover to the secondary infrastructure that is in place for such scenarios. The failover restored the system at 22:35 UTC, with slightly higher latency caused by extra network traffic to the secondary region.

The following morning, 2022-11-11 at 09:13 UTC, we failed back to the primary region. Shortly afterwards we noticed an elevated error rate: a stale database connection in part of the system was delaying some of the background processing. We restarted the affected application, and the delayed items were reprocessed.
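As an illustration only, one common way to guard against a stale connection after a failover is to probe a cached connection before reusing it and to reopen it when the probe fails. The sketch below is not taken from the Mews codebase; the class name and the `connect` and `is_alive` callables are hypothetical placeholders for whatever the real application uses.

```python
from typing import Callable, Generic, Optional, TypeVar

C = TypeVar("C")


class ValidatingConnectionHolder(Generic[C]):
    """Hands out a cached connection only after a cheap liveness probe passes."""

    def __init__(self, connect: Callable[[], C], is_alive: Callable[[C], bool]) -> None:
        self._connect = connect    # hypothetical factory that opens a new connection
        self._is_alive = is_alive  # hypothetical probe, e.g. one that runs "SELECT 1"
        self._conn: Optional[C] = None
        self.reconnects = 0        # number of times a stale connection was replaced

    def get(self) -> C:
        # Open a connection on first use, or replace one that fails the probe.
        if self._conn is None or not self._safe_probe(self._conn):
            if self._conn is not None:
                self.reconnects += 1
            self._conn = self._connect()
        return self._conn

    def _safe_probe(self, conn: C) -> bool:
        # A probe that raises is treated the same as a probe that returns False.
        try:
            return self._is_alive(conn)
        except Exception:
            return False
```

With a check of this kind, a connection that went stale during a region switch would be replaced on the next use instead of requiring an application restart, at the cost of one extra round trip per checkout.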

Causes

Azure SQL Database service outage.

Solutions

We will further improve the monitoring of our platform to detect such issues faster. We will also introduce further automation to speed up failover to the secondary region.
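A minimal sketch of what automated detection could look like, assuming a periodic health probe against the primary database: failover is requested only after several consecutive failed probes, so that a single transient error does not trigger a region switch. The class name and the threshold of 3 are assumptions for illustration and do not describe the actual Mews tooling.

```python
from dataclasses import dataclass


@dataclass
class FailoverTrigger:
    """Counts consecutive failed probes and signals when failover should start."""

    threshold: int = 3             # assumed number of consecutive failures required
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when failover should be initiated."""
        if probe_ok:
            # Any successful probe resets the count.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold
```

The threshold trades detection speed against the risk of failing over on a brief blip; the right value depends on the probe interval and on how costly an unnecessary failover is.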

Posted Nov 16, 2022 - 15:09 CET

Resolved
The issue is resolved. We apologize for any inconvenience it may have caused.
Posted Nov 11, 2022 - 00:16 CET
Monitoring
We experienced a database outage on the side of our cloud provider. We performed a failover to a secondary region and are monitoring the results.
Posted Nov 10, 2022 - 23:38 CET
Investigating
We are currently experiencing a higher than normal load on the database, which may be causing pages in the Mews application to be slow or unresponsive.

We are investigating the cause and will provide an update as soon as possible.
Posted Nov 10, 2022 - 23:24 CET
This incident affected: Operations, Guest Journey, Business Intelligence, Payments, Open API, and Marketplace.