Details of the Pulumi Outage on October 6, 2023

Posted on Wednesday, Oct 18, 2023

At Pulumi, we understand that Pulumi Cloud plays an important role in how our customers address their infrastructure management challenges. As a result, we strive for the highest levels of availability and performance in Pulumi Cloud. Unfortunately, on Friday, October 6, 2023, Pulumi Cloud suffered a 24-minute outage during which we failed to process 74.7% of received requests. In this post, we’d like to share our findings on the root cause of this outage, and the steps we are taking to ensure this sort of outage doesn’t happen again.

On October 6th at approximately 17:15 UTC, we shipped a database migration modifying foreign keys on a table to our production environment, clearing it for release after testing in several non-production environments and a few rounds of peer review. However, the pre-production testing was not an adequate substitute for testing the behavior of the migration when running on our production dataset under full traffic load.

Adding foreign key constraints to a table can be done “in place” with an asterisk. Testing and review missed that we weren’t abiding by that asterisk. The table copy operation caused by the bad migration held a lock for a significant amount of time, and the resulting query pileup starved our database of all available connections.

This is the first time in six years that Pulumi has seen an outage of this scale. We are careful not to make changes to high-traffic tables on the core API path responsible for handling updates and storing state. When we’ve needed to update these tables in the past, we took care to stand up new tables, duplicate writes, and cut over to the new tables without downtime.

During the review process, we had categorized the migration as low risk, as the affected table is low traffic relative to our other workloads (
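To make the failure mode concrete, here is a minimal, deterministic sketch of how a long-held lock on a low-traffic table can exhaust a shared connection pool and cause unrelated requests to fail. It is an illustration only: the function name `simulate_pileup`, the pool size, the request rate, and the lock duration are all hypothetical values chosen for this example, not Pulumi's actual configuration, and the model is not meant to reproduce the 74.7% figure.

```python
from dataclasses import dataclass


@dataclass
class PileupResult:
    served: int    # requests that completed
    rejected: int  # requests that found no free connection
    pending: int   # queries still waiting on the lock when the run ended


def simulate_pileup(pool_size: int, requests_per_tick: int, lock_ticks: int,
                    total_ticks: int, locked_every: int) -> PileupResult:
    """Toy model: every `locked_every`-th request touches the locked table.

    While the lock is held, such a request takes a connection and keeps it
    until the lock is released. All other requests use a connection only
    briefly, so they succeed whenever at least one connection is free.
    """
    blocked = 0  # connections held by queries waiting on the table lock
    served = 0
    rejected = 0
    seq = 0
    for tick in range(total_ticks):
        if tick == lock_ticks:
            # Lock released: waiting queries finish and free their connections.
            served += blocked
            blocked = 0
        lock_held = tick < lock_ticks
        for _ in range(requests_per_tick):
            seq += 1
            if blocked >= pool_size:
                rejected += 1  # pool exhausted, even for unrelated queries
            elif lock_held and seq % locked_every == 0:
                blocked += 1   # grabs a connection and waits on the lock
            else:
                served += 1    # fast query, connection returned immediately
    return PileupResult(served=served, rejected=rejected, pending=blocked)


if __name__ == "__main__":
    # Hypothetical numbers: 10 connections, 100 requests per tick,
    # 5% of traffic touching the locked table, lock held for 5 of 10 ticks.
    print(simulate_pileup(pool_size=10, requests_per_tick=100,
                          lock_ticks=5, total_ticks=10, locked_every=20))
```

With these made-up numbers, only 5% of requests touch the locked table, yet the pool fills within two ticks and 300 of the 500 requests that arrive while the lock is held are rejected. The point of the sketch is that the share of traffic on a table says little about the blast radius when the pool is shared by every request.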



This post first appeared on VedVyas Articles; please read the original post there.
