For 11 minutes, between 1:43 PM and 1:54 PM UTC on November 3, PMS users and Open API clients experienced general slowness and an elevated rate of timeouts.
An alert from our automated monitoring system prompted an investigation, and the performance degradation resolved on its own as that investigation began.
One of the database replicas became overloaded compiling execution plans for incoming queries. The degradation was triggered by a diagnostic query that inspected the replica’s performance data.
As an immediate preventive measure, we made the database performance data catalog read-only.
We are designing a safer way to inspect database performance data, one that has no impact on the live workload.