Post Mortem
Production issue 05/12/2022 - 12/12/2022
During the week of 5 December 2022, we encountered a slowness issue on the platform.
05/12 11:24 am to 9:24 pm => 12/12 3:20 pm to 5:57 pm
05/12/2022
First alert or first ticket
First Announcement
Status page update time
06/12/2022
First alert or first ticket
First Announcement
Status page update time
08/12/2022
First alert or first ticket
First Announcement
Status page update time
Actions
05/12/2022
Index optimisations
06/12/2022
Maintenance operation on our database (memory increase and index creation)
11/12/2022
Vacuum the asset table to improve performance
12/12/2022
Release a hotfix to fix the TCP memory leak.
Global slowness on the platform could cause delays on the replica, which led to update issues and made access to the platform very slow.
Users could not access the platform because of an authentication issue.
Several million assets were created by script at the same time.
The load was multiplied by 10.
The platform handled the creation, but slowness was felt while it was scaling up.
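As an illustration only, a bulk creation script could spread its load by sending assets in bounded batches with a pause between them, giving the platform time to scale. This is a minimal sketch and not the script involved in the incident: the names create_in_batches and create_batch, the batch size and the pause are all hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def create_in_batches(
    items: Iterable[T],
    create_batch: Callable[[List[T]], None],  # hypothetical: sends one batch to the platform
    batch_size: int = 1000,                   # assumed value, tune to the platform's capacity
    pause_seconds: float = 1.0,               # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Create items batch by batch, pausing between batches; return the number created."""
    created = 0
    for index, batch in enumerate(chunked(items, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # let the platform absorb the previous batch
        create_batch(batch)
        created += len(batch)
    return created
```

With 2,500 items and a batch size of 1,000, this makes three calls to create_batch (1,000, 1,000 and 500 items) with two pauses in between.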
Heavy SQL queries left zombie connections behind, which triggered a memory leak in the kernel's TCP stack and impacted all the k8s nodes.
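This report does not say how the TCP memory growth was observed. As a hedged sketch, one way to watch for it on a Linux node is to compare the TCP "mem" page count in /proc/net/sockstat with the pressure threshold in /proc/sys/net/ipv4/tcp_mem. The function names below are hypothetical, and the sample text is illustrative, not data from the incident.

```python
from typing import Dict


def parse_sockstat(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/sockstat into {protocol: {field: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # Fields alternate between a name and an integer value.
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


def tcp_memory_ratio(sockstat_text: str, tcp_mem_text: str) -> float:
    """Return TCP pages in use divided by the 'pressure' threshold of tcp_mem."""
    tcp_pages = parse_sockstat(sockstat_text)["TCP"]["mem"]
    _low, pressure, _high = (int(value) for value in tcp_mem_text.split())
    return tcp_pages / pressure


# Illustrative sample in the sockstat format (not taken from the incident).
SAMPLE_SOCKSTAT = (
    "sockets: used 285\n"
    "TCP: inuse 8 orphan 0 tw 0 alloc 13 mem 2\n"
    "UDP: inuse 4 mem 2\n"
)
SAMPLE_TCP_MEM = "100 200 300"
```

On a real node, the two inputs would be read from /proc/net/sockstat and /proc/sys/net/ipv4/tcp_mem. A ratio that keeps climbing towards 1 while the number of active connections stays flat would be consistent with the kind of leak described above.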