Dec 3, 2025 incident

Incident Report for Thought Industries

Postmortem

Between 9:45 AM and 10:05 AM PST on December 3, 2025, users of the US TI platform experienced elevated 503 error rates and increased load times. The cause was congestion in the Rustici postback process, which degraded performance and returned 503 errors for a subset of users.

The congestion resulted from a combination of factors: (1) instability within internal AWS infrastructure and (2) improperly tuned timeout settings for state management during Rustici postback processing. When the platform failed to persist Rustici progress to an internal database—due to network instability or rate limiting—the original request connection remained open longer than intended.
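
As an illustration of the timeout behavior described above, the following is a minimal sketch, not the platform's actual code. It assumes an asyncio-based handler; `persist_progress`, `handle_postback`, and the `PERSIST_TIMEOUT_SECONDS` value are hypothetical names and numbers chosen for the example. It shows how bounding the persistence call lets a request fail fast instead of holding its connection open.

```python
import asyncio

# Hypothetical names throughout: persist_progress and handle_postback are
# illustrative only and are not part of the TI platform or any Rustici API.
PERSIST_TIMEOUT_SECONDS = 0.05  # assumed budget; a real value needs tuning


async def persist_progress(record: dict, delay: float) -> str:
    """Stand-in for a database write; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return "saved"


async def handle_postback(record: dict, delay: float) -> int:
    """Return an HTTP-style status code without leaving the request hanging."""
    try:
        # Cancel the write if it exceeds the budget.
        await asyncio.wait_for(
            persist_progress(record, delay), PERSIST_TIMEOUT_SECONDS
        )
        return 200
    except asyncio.TimeoutError:
        # Fail fast so the caller can retry and the connection is released.
        return 503


def run_demo() -> tuple:
    """Run one fast write and one slow write and return both status codes."""
    fast = asyncio.run(handle_postback({"learner": "a"}, delay=0.0))
    slow = asyncio.run(handle_postback({"learner": "b"}, delay=1.0))
    return fast, slow
```

In this sketch the slow write is cancelled after the budget elapses, so the handler returns promptly rather than waiting on the unstable dependency.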

When AWS infrastructure instability spiked, a critical accumulation of these hanging requests began to interfere with normal request processing, ultimately impacting non-Rustici platform functionality as well.
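
One common way to keep a backlog in one request type from affecting others is a concurrency cap, sometimes called a bulkhead. The sketch below is an assumption about how such a cap could look, not a description of the released fix. The `Bulkhead` class, `slow_postback`, and the `MAX_IN_FLIGHT_POSTBACKS` limit are hypothetical.

```python
import asyncio

MAX_IN_FLIGHT_POSTBACKS = 2  # assumed cap, chosen only for the demo


class Bulkhead:
    """Reject new work once `limit` calls are already in flight (hypothetical)."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._in_flight = 0

    async def run(self, coro_factory):
        # Shed load immediately instead of queueing behind slow requests.
        if self._in_flight >= self._limit:
            return "rejected"
        self._in_flight += 1
        try:
            return await coro_factory()
        finally:
            self._in_flight -= 1


async def slow_postback() -> str:
    """Stand-in for a postback that is slow because a dependency is unstable."""
    await asyncio.sleep(0.05)
    return "processed"


async def demo() -> list:
    bulkhead = Bulkhead(MAX_IN_FLIGHT_POSTBACKS)
    # Four postbacks arrive at once; only two are allowed to run concurrently.
    return await asyncio.gather(
        *(bulkhead.run(slow_postback) for _ in range(4))
    )


def run_demo() -> list:
    return asyncio.run(demo())
```

With the cap in place, excess postbacks are turned away quickly, which leaves capacity for unrelated requests instead of letting hanging requests pile up.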

The infrastructure team has released fixes that limit the impact of requests that fail due to instability, and has further tuned scaling so that throttling does not cause similar issues.

We apologize for the inconvenience and will continue to monitor the platform to ensure a stable user experience.

Posted Dec 11, 2025 - 20:56 EST

Resolved

This incident has been resolved.
Posted Dec 03, 2025 - 13:57 EST

Monitoring

Between 9:45 AM and 10:05 AM PST, platform monitoring detected two periods of elevated 503 errors that have since self-resolved. We're monitoring the situation while we diagnose the cause.
Posted Dec 03, 2025 - 13:24 EST

Investigating

We’re aware of an issue that’s currently affecting parts of the platform. Our Engineering team is reviewing the situation and working diligently to resolve it. Updates will be posted here as they become available.
Posted Dec 03, 2025 - 13:15 EST
This incident affected: US - Platform.