Post Mortem
Production issue 05/12/2022 - 12/12/2022
During the week of 5 December 2022, we encountered a slowness issue on the platform.
05/12 11:24 am to 9:24 pm => 12/12 3:20 pm to 5:57 pm
First alert or first ticket
First Announcement
Status page update time
Index optimisations
Maintenance operation on our database (increasing memory and creating indexes)
Vacuum the asset table to improve performance
Release a hotfix to fix the TCP memory leak
Global slowness on the platform caused delays on the replica, which led to update issues and made access to the platform very slow
Users could not access the platform due to an authentication issue
A script created several million assets at the same time.
The load was multiplied by 10.
The platform handled the creation, but slowness was felt during the time it took to scale.
Heavy SQL queries triggered zombie connections, which caused a memory leak in the kernel TCP stack and impacted all the k8s nodes.
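
One way to limit the impact of a bulk creation like the one described above is to split the work into batches and pause between them, so the platform has time to scale. The following is only a minimal sketch of that idea, not the script that was actually run: the names `batched`, `create_assets_throttled`, `create_batch`, `batch_size` and `pause_s` are hypothetical, and the default values are placeholders that would need tuning.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def create_assets_throttled(
    assets: Iterable[T],
    create_batch: Callable[[List[T]], None],  # hypothetical: sends one batch to the platform
    batch_size: int = 1000,                   # placeholder value
    pause_s: float = 0.5,                     # placeholder value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Create assets batch by batch, pausing between batches.

    Returns the number of assets submitted.
    """
    created = 0
    for index, batch in enumerate(batched(assets, batch_size)):
        if index > 0:
            # Pause before every batch except the first to spread the load.
            sleep(pause_s)
        create_batch(batch)
        created += len(batch)
    return created
```

With `batch_size=10` and 25 assets, for example, this submits batches of 10, 10 and 5 items with two pauses in between. The `sleep` parameter is injectable only so the pacing can be checked without waiting.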