[Urgent] Production Issue
Incident Report for Kili
Postmortem

Production issue December 13th, 2024 

Summary

On December 13, at approximately 4:30 PM UTC+1, the platform experienced a disruption caused by a DNS issue with our domain name provider. The issue made the platform unavailable for a majority of users. The incident was declared on our status page at 5:10 PM and mitigation efforts began immediately, restoring stability by 5:52 PM. The full resolution, including safeguards to prevent recurrence, was completed on December 16 at 9:35 AM.

Incident Timeline (UTC+1)

  • December 13th, 4:30 PM

    • First internal alert
  • December 13th, 5:10 PM

    • Incident created
    • First investigations
  • December 13th, 5:52 PM

    • Fix implemented
    • Customer impact ends
  • December 13th, 5:53 PM

    • Incident marked as resolved

End-User Impact

The application was unavailable to a majority of users.

What caused the incident?

The incident was caused by a domain registration issue with our DNS provider, leading to the temporary unavailability of platform services.

Corrective measures put in place to ensure that this does not happen again

We have implemented an advanced alerting system that monitors DNS health and immediately notifies our team of any anomalies or disruptions.
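As an illustration only, a DNS health check of this kind could be sketched as follows. This is not Kili's actual alerting system, whose details are not described here. The function names `resolves` and `failing_hosts` are hypothetical, and the check assumes that "DNS health" means "the hostname still resolves to at least one address" when queried through Python's standard `socket.getaddrinfo`.

```python
import socket
from typing import Callable, Iterable, List


def resolves(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if `hostname` resolves to at least one address.

    `resolver` defaults to the system resolver; it is injectable so the
    check can be exercised without network access.
    """
    try:
        # getaddrinfo returns a list of address tuples, or raises
        # socket.gaierror (a subclass of OSError) when resolution fails.
        return len(resolver(hostname, 443)) > 0
    except OSError:
        return False


def failing_hosts(
    hostnames: Iterable[str], resolver: Callable = socket.getaddrinfo
) -> List[str]:
    """Return the hostnames that currently fail to resolve, in input order."""
    return [host for host in hostnames if not resolves(host, resolver)]
```

In such a setup, a scheduler would call `failing_hosts` with the list of monitored hostnames at a fixed interval and notify the on-call team whenever the returned list is non-empty. The notification channel and the interval are left out because they depend on the tooling in use.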

Posted Dec 18, 2024 - 17:55 UTC

Resolved
This incident has been resolved.
Posted Dec 13, 2024 - 16:53 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Dec 13, 2024 - 16:35 UTC
Investigating
Dear all,

We would like to inform you that we are currently experiencing a production incident that is impacting our services.
We apologize for any inconvenience this may cause.

We are working diligently to resolve this issue and restore our services as soon as possible. We will continue to provide updates on our status page https://status.kili-technology.com/

Thank you for your understanding and patience during this time.

Sincerely,

Kili Support Team
Posted Dec 13, 2024 - 16:15 UTC
This incident affected: US (Kili Frontend) and Europe (Kili Frontend).