US - Slowness issue

Incident Report for Kili

Postmortem

US Cloud Production Issue – October 14, 2025

Summary:

On October 14, 2025, at approximately 3:45 PM UTC, the US cloud platform began experiencing high latency on backend services due to excessive load caused by an exceptionally large volume of output data. The incident was declared at 3:57 PM UTC and published on the status page at 4:15 PM UTC. Stability was restored by 5:05 PM UTC, and the incident was fully resolved and closed on October 15 at 7:40 AM UTC.

Incident Timeline:

Incident Detection & Customer Impact Start

  • Time: October 14, 3:45 PM UTC
  • Description: High latency was detected and reported by customers. 

Incident Declaration

  • Time: October 14, 4:15 PM UTC
  • Description: The incident was published on the status page.

Stable State Achieved

  • Time: October 14, 5:05 PM UTC
  • Description: The backend was restarted as a short-term mitigation, restoring stability. Time to mitigation: 1 hour and 20 minutes.

Incident Resolution

  • Time: October 15, 7:40 AM UTC
  • Description: The incident was officially marked as resolved. Time to resolution: 15 hours and 55 minutes.
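The two durations above follow directly from the timeline timestamps. As a quick arithmetic check, here is a minimal sketch using Python's standard `datetime` module. It is only a verification aid, and the variable and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Timestamps taken from the incident timeline (all UTC).
IMPACT_START = datetime(2025, 10, 14, 15, 45, tzinfo=timezone.utc)
STABLE_STATE = datetime(2025, 10, 14, 17, 5, tzinfo=timezone.utc)
RESOLVED = datetime(2025, 10, 15, 7, 40, tzinfo=timezone.utc)


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'Xh YYm' (hours and zero-padded minutes)."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes:02d}m"


# Both durations are measured from the start of customer impact.
time_to_mitigate = STABLE_STATE - IMPACT_START
time_to_resolve = RESOLVED - IMPACT_START

print("Time to mitigation:", format_duration(time_to_mitigate))  # 1h 20m
print("Time to resolution:", format_duration(time_to_resolve))   # 15h 55m
```

Because the resolution timestamp falls on the next day, subtracting full timezone-aware datetimes (rather than clock times) avoids any wrap-around error.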

End-User Impact:

Users experienced high latency on our services for a total of 1 hour and 20 minutes, on October 14 from 3:45 PM UTC to 5:05 PM UTC.

What caused the incident?

An exceptionally large volume of output data overwhelmed our GraphQL server. The backend could not handle the excessive load, which resulted in high latency across our services.

Corrective actions put in place to ensure this does not happen again:

  • Immediate mitigation: Backend services were restarted to restore normal operations.
  • Long-term mitigations are being developed and implemented to prevent similar load-related issues in the future:

    • Pre-computing the jsonResponse for exports, so that exports run significantly faster and have minimal impact on the backend while they are processed.
    • Updating the way certain objects are sent through the GraphQL API to reduce backend load.
  • Enhanced monitoring and load management capabilities are being put in place to better handle high-volume data processing scenarios.
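The idea behind pre-computing the jsonResponse for exports can be illustrated with a small sketch. It assumes, hypothetically, that each asset's export payload is serialized once when its label is saved, so that an export only joins stored strings instead of rebuilding every object at request time. `Label`, `ExportCache`, `save_label` and `export_project` are invented names for illustration, not Kili's actual implementation or API.

```python
import json
from dataclasses import dataclass, field


# Hypothetical data model; names are illustrative only.
@dataclass
class Label:
    asset_id: str
    annotations: list


@dataclass
class ExportCache:
    """Holds one pre-serialized JSON payload per asset (hypothetical)."""
    payloads: dict = field(default_factory=dict)
    build_count: int = 0  # number of times a payload was serialized

    def save_label(self, label: Label) -> None:
        # Serialize once at write time; a later save overwrites the stale payload.
        self.payloads[label.asset_id] = json.dumps(
            {"assetId": label.asset_id, "annotations": label.annotations},
            separators=(",", ":"),
        )
        self.build_count += 1

    def export_project(self) -> str:
        # Export only concatenates stored strings into a JSON array;
        # no per-request re-serialization of the underlying objects.
        return "[" + ",".join(self.payloads[k] for k in sorted(self.payloads)) + "]"


cache = ExportCache()
cache.save_label(Label("a1", [{"category": "cat"}]))
cache.save_label(Label("a2", []))

first_export = cache.export_project()
second_export = cache.export_project()  # repeated exports reuse the same payloads
print(first_export)
```

The trade-off in this sketch is extra storage and the need to refresh the stored payload whenever a label changes, in exchange for exports whose cost no longer grows with repeated serialization on the request path.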

We sincerely apologize for the inconvenience and impact caused by this incident.

Thank you for your patience and continued trust.

The Kili Team

Posted Oct 20, 2025 - 15:06 UTC

Resolved

Dear users,

We implemented a solution yesterday evening at 5 PM UTC to resolve the issue. All systems are now back to normal operation.

We apologize for any inconvenience caused and appreciate your patience during this time.

Our team will share a comprehensive postmortem report in the coming days with more details on the incident and our preventive actions.

If you experience any further issues, please contact our [support team](mailto:support@kili-technology.com).

Sincerely,
Kili Team
Posted Oct 15, 2025 - 08:17 UTC

Investigating

Dear users,

We are currently experiencing an issue that is impacting our services. Our team is actively investigating the root cause and working to resolve it as quickly as possible.

We will provide further updates on our status page https://status.kili-technology.com/.

Thank you for your patience and understanding.

Sincerely,
Kili Team
Posted Oct 14, 2025 - 16:15 UTC
This incident affected: US (US - Kili Frontend).