Issue on Production instance - slowness issue
Incident Report for Kili
Postmortem

Production issue 24th July 2023

Summary

On 24th July 2023, we encountered a slowness issue on the platform.

The incident led to delays during the annotation and review process.

Incident Timing (UTC)

24/07/23 1:10 pm to 25/07/23 6:35 pm

Incident Timeline (UTC)

24/07/2023

  • First alert or first ticket

    • Reported by customers at 12:45 am
  • First investigations on database metrics and graphs

  • First Announcement

    • Status page incident creation at 1:17 pm
  • Status page update time

    • 01:17 pm Investigation in progress: We are currently investigating this issue.
    • 01:17 pm Situation under control: A fix has been implemented and we are monitoring the results.
    • 01:30 pm Resolved: This incident has been resolved.

Actions

24/07/2023

Restarted some app containers to force a reset of the open database transactions

25/07/2023 6:35 pm

Hotfix deployment

End-User Impact

Global slowness on the platform could lead to delays during the annotation and review process.

What caused the incident?

Some database transactions were not closed because of uncaught exceptions.

As a result, these open transactions held a large number of database locks, which slowed down overall system performance.
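
To illustrate the failure mode described above, here is a minimal sketch of one common way to guarantee that a transaction is closed even when an exception is raised. It is not the actual hotfix: it uses Python's built-in sqlite3 module as a stand-in database, and the `transaction` helper and `labels` table are hypothetical names chosen for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Hypothetical helper: commit on success, roll back on any exception."""
    conn.execute("BEGIN")
    try:
        yield conn
        conn.execute("COMMIT")
    except Exception:
        # Without this rollback the transaction (and its locks) would stay open.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables implicit transactions, so we control them explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, value TEXT)")

try:
    with transaction(conn):
        conn.execute("INSERT INTO labels (value) VALUES ('cat')")
        raise ValueError("simulated failure inside the transaction")
except ValueError:
    pass  # The error still reaches the caller, but the transaction is closed.

row_count = conn.execute("SELECT COUNT(*) FROM labels").fetchone()[0]
still_open = conn.in_transaction
```

After the simulated failure, the insert has been rolled back and the connection reports no transaction in progress, so no locks are left behind.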

Corrective elements put in place to ensure that this does not happen again

We added monitoring alerts on long-running database transactions to detect this issue sooner.
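
As a rough sketch of what such an alert checks, the example below flags transactions that have been open longer than a threshold. It does not reflect the actual monitoring setup: the `OpenTransaction` record, the `find_long_transactions` function, and the 5-minute threshold are all assumptions made for illustration, and in practice the list of open transactions would come from the database's own activity statistics.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class OpenTransaction:
    """Hypothetical record describing one transaction that is still open."""
    transaction_id: str
    started_at: datetime


def find_long_transactions(open_transactions, now, max_age):
    """Return the transactions that have been open for longer than max_age."""
    return [t for t in open_transactions if now - t.started_at > max_age]


# Illustrative data only: one short transaction and one that has been open 20 minutes.
now = datetime(2023, 7, 24, 13, 10, tzinfo=timezone.utc)
open_transactions = [
    OpenTransaction("tx-1", now - timedelta(seconds=5)),
    OpenTransaction("tx-2", now - timedelta(minutes=20)),
]

# Assumed threshold of 5 minutes; the real alert threshold is not stated here.
alerts = find_long_transactions(open_transactions, now, max_age=timedelta(minutes=5))
```

Only the transaction that exceeds the threshold ends up in `alerts`; an alerting system would then notify the on-call team for each entry.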

Posted Jul 27, 2023 - 15:29 UTC

Resolved
This incident has been resolved.
Posted Jul 24, 2023 - 13:30 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 24, 2023 - 13:17 UTC
This incident affected: Europe (Kili API).