Degraded performance
Incident Report for Mews


Between 09:30 and 09:41 UTC on September 29th, application backend performance was degraded and some requests timed out.


We immediately scaled up the backend to compensate for two misbehaving instances. After a couple of minutes, the problem went away on its own.


Two of the backend instances crashed. After they rebooted, requests were routed to them even though they were not yet fully initialized. As a result, they responded to these requests more slowly, and subsequent requests were queued rather than processed immediately. Much of the pressure on these instances also came from many websocket clients reconnecting at once.


Apart from better crash monitoring, there are several solutions we will be implementing:

  • Have the load balancer forward requests only to fully initialized and healthy instances.
  • Optimize the websocket client authentication flow.
  • Route application, API, and websocket requests to different backends, increasing resiliency so that an incident in one area does not impact the entire system.
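
As an illustration of the first item, the following is a minimal sketch of readiness-aware routing, not the actual Mews load balancer configuration. The names Instance, ReadinessAwareBalancer, initialized, and healthy are hypothetical and exist only for this example. It assumes each instance exposes two signals (warm-up finished, last health probe passed) and that the balancer distributes requests round-robin over the instances where both are true.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Instance:
    """Hypothetical backend instance with two readiness signals."""
    name: str
    initialized: bool = False  # assumption: set once warm-up has finished
    healthy: bool = True       # assumption: result of the last health probe

    def ready(self) -> bool:
        # Traffic is only acceptable when both signals are positive.
        return self.initialized and self.healthy


class ReadinessAwareBalancer:
    """Round-robin over ready instances only (illustrative sketch)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._turn = count()

    def pick(self) -> Instance:
        # Re-evaluate readiness on every request so that a rebooting
        # instance is skipped until it reports itself initialized.
        ready = [i for i in self.instances if i.ready()]
        if not ready:
            raise RuntimeError("no ready instances available")
        return ready[next(self._turn) % len(ready)]


# Usage: "b" has just rebooted and is still warming up.
a = Instance("a", initialized=True)
b = Instance("b", initialized=False)
c = Instance("c", initialized=True)
balancer = ReadinessAwareBalancer([a, b, c])

before = [balancer.pick().name for _ in range(4)]

b.initialized = True  # warm-up finished, "b" may now receive traffic
after = [balancer.pick().name for _ in range(6)]
```

In this sketch the rebooted instance receives no requests until it reports itself initialized, which is the behavior the first item describes. A real load balancer would typically obtain these signals from a health check endpoint rather than from in-process flags.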
Posted Oct 29, 2021 - 14:19 CEST

This incident has been resolved.
Posted Sep 29, 2021 - 12:59 CEST
The system is healthy again and the performance is back to normal. We have identified the root cause and we are monitoring the situation.
Posted Sep 29, 2021 - 12:21 CEST
We are currently experiencing degraded performance of the system.
Posted Sep 29, 2021 - 11:46 CEST
This incident affected: Operations, Guest Journey, Business Intelligence, Payments, Open API, and Marketplace.