On March 26, 2025, our Point of Sale (POS) system suffered a major incident under heavy load that made the service unavailable to customers. The root cause was identified as insufficient Input/Output Operations Per Second (IOPS) allocated to the storage backing the POS database.
Upon detecting the incident, the team promptly began recovery, performing specific operations in AWS and on the database. The service was restored, and monitoring was put in place to confirm that it remained stable.
The incident was caused by a combination of factors:
To prevent similar incidents in the future, the following actions will be taken: