Inaccessible pages
Incident Report for Mews
Postmortem

Problem

On Wednesday, September 16 at 13:57 CEST, we deployed a configuration change that altered how request volume limits are evaluated. As a result, a small number of users were throttled while working in Commander (3% of all requests were throttled).

Action

On Thursday, September 17 at 04:55 CEST, we were notified about the issue by the Customer Care team. We identified the configuration change as the cause and deployed a fix at 07:03 CEST that reverted it.

Causes

1. We only estimated the impact of the configuration change, without looking more closely at the actual request volume patterns.
2. We did not have alerts in place to notify us about excessive throttling of web requests.
3. The application did not clearly tell users that they were being throttled. Instead, a generic error message was served.

Solutions

1. Follow a more rigorous, data-driven procedure when deploying configuration changes with a potentially large impact on our customers.
2. Set up alerts on excessive throttling.
3. Let the user clearly know the request has been throttled.
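
As an illustration only, the three solutions above could be sketched as follows. This is not our production code: the sliding-window limiter, the function names, the 1% alert threshold, and the sample events are all assumptions made for the example. The sketch replays recorded requests against a candidate limit to measure how many would be throttled, checks that share against an alert threshold, and builds an explicit HTTP 429 response with a Retry-After header instead of a generic error.

```python
from collections import defaultdict, deque


def throttled_fraction(events, limit, window_seconds):
    """Replay (timestamp, user_id) events against a per-user sliding-window limit.

    Returns the fraction of requests that would have been throttled.
    The sliding-window model is an assumption, not the actual limiter.
    """
    recent = defaultdict(deque)  # user_id -> timestamps of accepted requests
    throttled = 0
    total = 0
    for ts, user in sorted(events):
        total += 1
        window = recent[user]
        # Drop accepted requests that have fallen out of the window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        if len(window) >= limit:
            throttled += 1
        else:
            window.append(ts)
    return throttled / total if total else 0.0


def should_alert(fraction, threshold=0.01):
    """Hypothetical alert rule: fire when the throttled share exceeds the threshold."""
    return fraction > threshold


def throttle_response(retry_after_seconds):
    """Build an explicit throttling response instead of a generic error."""
    return {
        "status": 429,  # HTTP 429 Too Many Requests
        "headers": {"Retry-After": str(retry_after_seconds)},
        "body": "Too many requests. Please retry in %d seconds." % retry_after_seconds,
    }


# Hypothetical sample: user "a" sends 5 requests within one second, user "b" sends 1.
events = [(0.0, "a"), (0.1, "a"), (0.2, "a"), (0.3, "a"), (0.4, "a"), (0.5, "b")]
fraction = throttled_fraction(events, limit=3, window_seconds=1.0)
print(fraction, should_alert(fraction))
```

Running the replay on real request logs before a deployment would show the expected throttled share for a candidate limit, which is the measurement that was missing in this incident.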

Posted Sep 29, 2020 - 12:03 CEST

Resolved
The fix was successfully deployed, and our monitoring shows that the incident has been resolved for all impacted users.
Posted Sep 17, 2020 - 07:34 CEST
Identified
The root cause was identified as an issue with the maximum allowed rate of requests for each user. We are working on resolving the problem.
Posted Sep 17, 2020 - 06:35 CEST
Investigating
Some pages in Commander may be intermittently inaccessible to some users. We are currently investigating the issue.
Posted Sep 17, 2020 - 06:11 CEST
This incident affected: Operations.