
Postmortem: Hosted Build delays and Cloud Load Test errors on Visual Studio Team Services on 3/15

Customer Impact:

For 132 minutes on 15 March 2017, users of Visual Studio Team Services experienced delays and timeouts in their builds and load tests. During the incident, builds for approximately 1,500 Hosted Build accounts did not start and eventually failed with timeout errors. Tests for 40 Cloud Load Test accounts also failed. The incident started at 21:48 UTC on 15 March 2017 and remained active until 00:00 UTC the following day.

What went wrong:

The workflows for both Hosted Build and Cloud Load Test rely on provisioning a new VM for each build and test run. On 15 March 2017 at 21:42 UTC, the Azure Storage Resource Provider started failing for all service management operations globally. Because neither Hosted Build nor Cloud Load Test could refresh its VM capacity, builds queued up and test runs failed. The VSTS SRE team was alerted 36 minutes into the incident, when the Build VM pool was exhausted. While trying to engage our partners in Azure, we discovered that Azure was already aware of the issue and working on a fix. After Azure Storage engineers applied a hotfix, both the Hosted Build and Cloud Load Test workflows recovered without manual intervention.

Next Steps:

Within the VSTS service, we identified opportunities to improve our monitoring. Specifically, our existing telemetry already logs exceptions for failed Azure management calls, which will enable us to alert the team earlier in future incidents. Additionally, our partners in Azure Storage have identified several resiliency improvements, which are outlined in the RCA noted below.
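
As an illustration only, alerting on logged management-call failures could look something like the sketch below. It is not the VSTS implementation: the class name, the sliding-window approach, and the window and threshold values are all hypothetical, chosen to show how an alert could fire on the failures themselves instead of waiting for the VM pool to run dry.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class FailureRateAlert:
    """Hypothetical sliding-window alert over failed management calls."""

    def __init__(self, window: timedelta, threshold: int):
        self.window = window        # how far back failures are counted
        self.threshold = threshold  # failures within the window that trigger an alert
        self._failures = deque()    # timestamps of recent failures, oldest first

    def record_failure(self, when: datetime) -> bool:
        """Record one failed call; return True if the alert should fire."""
        self._failures.append(when)
        cutoff = when - self.window
        # Drop failures that have aged out of the window.
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


# Example: assumed values of 5 failures within 10 minutes.
alert = FailureRateAlert(window=timedelta(minutes=10), threshold=5)
start = datetime(2017, 3, 15, 21, 42, tzinfo=timezone.utc)
for minute in range(6):
    if alert.record_failure(start + timedelta(minutes=minute)):
        print("alert: management calls are failing")
```

With thresholds like these, a sustained outage of management calls would page the team within minutes of the first failure. The right window and threshold depend on the normal background failure rate, which this sketch does not model.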

Azure Storage Service RCA: https://azure.microsoft.com/en-us/status/history (view the entry titled "RCA – Storage provisioning impacting multiple services" on 3/16/2017).

Sincerely,
Sri Harsha
