Issue on Production
Incident Report for Kili

Post Mortem

Production issue 3 February 2023


On 3rd February 2023, we encountered a slowness issue on the platform.

The incident led to delays in the annotation and review process.

Incident Timing (UTC+1)

03/02/23 10:39 am => 06/02/23 11:21 am

Incident Timeline (UTC+1)


  • First alert or first ticket

    • Internally at 10:39 am
  • First investigations into long-running transaction queries

  • First Announcement

    • Status page incident creation at 9:36 pm
  • Status page update time

    • 9:36 pm  Investigation in progress:  We are currently investigating this issue.


  • Status page update time

    • 04/02 03:10 pm  Situation under control: The issue has been identified and a fix is being implemented


  • Status page update time

    • 06/02 11:21 am  Incident resolved: This incident has been resolved.



Container crash identified. Finding the root cause.

End-User Impact

Global slowness on the platform could lead to delays during the annotation and review process.

What caused the incident?

A memory leak slowed down the container, which caused the orchestrator to kill it. Those kills left SQL transactions unclosed.
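As an illustration of how a transaction can be scoped so that it does not stay open when a worker is stopped, here is a minimal sketch. It is not Kili's actual code: the report does not name the database or driver, so Python's built-in sqlite3 module is used only to keep the example self-contained, and the names transaction, install_graceful_shutdown, and the labels table are hypothetical. Note that this only helps with a graceful stop (SIGTERM). A hard kill (SIGKILL) cannot be intercepted by the process, so a database-side timeout on idle transactions would still be needed in that case.

```python
import signal
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hypothetical helper: commit on success, roll back on any exception."""
    conn.execute("BEGIN")
    try:
        yield conn
        conn.execute("COMMIT")
    except BaseException:
        # BaseException also covers SystemExit and KeyboardInterrupt,
        # so a graceful shutdown still rolls the transaction back.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def install_graceful_shutdown():
    """Hypothetical helper: turn SIGTERM into SystemExit so cleanup code runs.

    Must be called from the main thread of the process.
    """
    def _on_sigterm(signum, frame):
        raise SystemExit(0)

    signal.signal(signal.SIGTERM, _on_sigterm)


def demo():
    # isolation_level=None disables the driver's implicit transactions,
    # so BEGIN / COMMIT / ROLLBACK are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT)")
    with transaction(conn):
        conn.execute("INSERT INTO labels (name) VALUES ('cat')")
    return conn
```

A worker would call install_graceful_shutdown() once at startup and wrap each unit of database work in "with transaction(conn):".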

Corrective elements put in place to ensure that this does not happen again

We improved the monitoring and alerting

We improved the feature rollout

We decreased the memory consumption of large SQL queries

We now run more containers
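One common way to reduce the memory used by a large SQL query is to read the result in fixed-size batches instead of loading every row at once. The sketch below shows that general technique. It is an assumption about the kind of change involved, not a description of the fix actually deployed. The function iter_batches, the assets table, and the batch size are hypothetical, and sqlite3 is used only so the example runs on its own.

```python
import sqlite3


def iter_batches(conn, query, params=(), batch_size=1000):
    """Yield lists of at most batch_size rows from the query result."""
    cursor = conn.execute(query, params)
    try:
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            yield rows
    finally:
        # Release the cursor even if the consumer stops iterating early.
        cursor.close()


def count_rows_in_batches(conn, batch_size=1000):
    """Example consumer: only one batch is held in memory at a time."""
    total = 0
    for batch in iter_batches(conn, "SELECT id FROM assets", batch_size=batch_size):
        total += len(batch)
    return total
```

With some database drivers the full result is still buffered on the client unless a server-side cursor is requested, so the driver's documentation should be checked before relying on fetchmany alone.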

Posted Feb 08, 2023 - 10:26 UTC

This incident has been resolved.
Posted Feb 06, 2023 - 10:21 UTC
A fix has been implemented and we are monitoring the results.
Posted Feb 04, 2023 - 14:10 UTC
Dear all,

We are experiencing a performance issue on Kili.
You may encounter some slowness during your work.
We are working to resolve this issue as quickly as possible.

We are deeply sorry for the inconvenience caused.

Kili Support team
Posted Feb 03, 2023 - 20:36 UTC
This incident affected: Europe (Kili API).