February 8th 2017

Last week, a backup incident at GitLab resulted in the deletion of data from the production database, causing 6 hours of data loss and some server downtime.

The official blog post about the incident starts like this:

Yesterday we had a serious incident with one of our databases. We lost six hours of database data (issues, merge requests, users, comments, snippets, etc.) for GitLab.com. Git/wiki repositories and self-hosted installations were not affected. Losing production data is unacceptable and in a few days we’ll publish a post on why this happened and a list of measures we will implement to prevent it happening again.

As we all know, data loss is a nightmare for any product, and it is even worse for a cloud code repository with a massive number of daily users, like GitLab. We might expect incidents like this to be a thing of the past in 2017, but organizations are made of humans (at least for now…), and making mistakes is part of being human. What makes the difference is the way an organization deals with the problem.

GitLab’s Approach After Data Loss Incident

GitLab’s approach was based on transparency, and it has received positive feedback from the community.

Instead of trying to hide the problem, they set up a live stream of the team resolving it (8 hours) and published a Google Doc explaining, step by step, how the mistake happened and how it was solved.

Link to the Google Doc.

All of this was accompanied by live updates on Twitter throughout the whole process:

We are performing emergency database maintenance, https://t.co/r11UmmDLDE will be taken offline

— GitLab.com Status (@gitlabstatus) January 31, 2017

We accidentally deleted production data and might have to restore from backup. Google Doc with live notes https://t.co/EVRbHzYlk8

— GitLab.com Status (@gitlabstatus) February 1, 2017

This kind of openness from a company that, according to TechCrunch, has more than 100 employees and has raised $25.6M in investment is definitely not something we see every day.

Luckily, thanks to the decentralized nature of Git, any loss of code repository data should be minimal, because a copy of the code stays on the local machine of the person who pushed it and of everyone else who pulled or cloned it.

Still, hats off to GitLab.

The post GitLab.com Backup Failure and Data Loss Incident appeared first on João Ribeiro.

This post first appeared on JoÃ£o Ribeiro | Tech, please read the originial post: here