Turbocharging CI: Halving GoLang Application Build Times

Tarasov Aleksandr · ITNEXT

In Continuous Integration (CI), speed and efficiency are of the essence. The adage "time is money" rings particularly true for developers, where every second saved in build times can translate into enhanced productivity and faster deployment cycles. This becomes even more crucial when discussing GoLang applications, renowned for their performance. However, like every technology, there are moments when even GoLang builds can benefit from some fine-tuning.

This article highlights our journey to optimize GoLang application builds in our CI pipeline. We'll walk you through the strategies, challenges, and results, from the first attempts that laid the groundwork to the more refined ones that halved our build times. So, if you want to boost your CI process, buckle up and let's dive in!

In our initial approach, every pipeline run started from scratch, and a full lint, test, and build cycle took minutes rather than seconds. These durations might seem quick to many, but we knew there was room for improvement in a CI/CD world where every second can translate into more efficient pipelines and faster feedback loops.

The immediate thoughts revolved around low-hanging fruit. We implemented distributed caching via GCS (the officially recommended way) to reduce redundancy and ensure quicker access to dependencies. Following this, our GitLab runners got a significant boost with more CPU power and memory. But the outcome? Only marginal improvements.

Leveraging Google Kubernetes Engine (GKE), our setup consisted of a Kubernetes cluster running on three nodes. Whenever a build was requested, our GitLab runner, hosted on GKE, would dutifully initiate a builder POD on one of these nodes based on resource availability. But there was a caveat that soon caught our attention: these PODs, although resourceful, come without any pre-configured caching mechanism.

In simpler terms, they start from scratch every time they're tasked with a build. They must fetch all the dependencies anew, making them heavily reliant on external sources such as the internet or specific repositories. In the fast-paced world of CI/CD, such a setup isn't just inefficient; it's also fraught with potential issues, making builds susceptible to network hiccups or repository downtimes.

To mitigate this, we decided to tap into the benefits of distributed caching. Given our infrastructure on Google Cloud, integrating with Google Cloud Storage (GCS) seemed a logical choice. This would, in theory, store our build dependencies, ensuring that our PODs didn't have to start from zero every time.

Post-implementation, we noticed our builds became much more consistent, eliminating the variability of fetching dependencies. But there was a puzzling disconnect: despite these changes, our build speed remained stubbornly static. The question then was, why?

We paused and took a moment to assess our CI process. A glaring question arose: why download the cache every single time? Traditional VM-based runners often keep the cache locally, and their builds benefit from that speed boost. But could the same luxury be afforded in a Kubernetes environment?

One potential solution that crossed our minds was network filesystems, specifically Google Filestore, but practical constraints prevented us from going down that route. The costs associated with Filestore seemed prohibitive, and while there were other self-hosted alternatives, our knowledge of them was limited. Sure, they could turn out to be viable solutions, but they would require rigorous testing. Instead, our attention shifted to an age-old computing principle: locality.

Enter node-local disks. The logic is straightforward: what's faster than accessing data from the machine it resides on? As a flexible system, Kubernetes can mount a single node folder across multiple PODs. It seemed like a win-win. But, as with all things tech, there was a wrinkle: our build runners operated on an elastic node pool, contracting at night and expanding in the morning. This meant that a builder POD could start its day with a practically empty cache, which is not the best start to its workday.

That led to an innovative blend of both worlds: marrying the distributed cache with node-local storage. We devised a StatefulSet aptly named 'cache syncer'. This component's primary job was to synchronize the node-local cache with the distributed one, and each instance operated exclusively on one node. When our cluster expanded with new nodes, we'd deploy the 'cache syncer' on them as well. This ensured the new nodes had the cache ready and waiting, avoiding redundant downloads. And by the time builder PODs were spun up on these nodes, a warm cache greeted them, leading to more efficient build processes.
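To make the idea more concrete, here is a minimal sketch of what such a syncer could look like. It is not the actual manifest from our setup: the image, bucket name, host path, replica count, anti-affinity rule, and sync interval are all illustrative placeholders.

```yaml
# cache-syncer.yaml (illustrative sketch, not the production manifest)
# Mirrors the distributed GCS cache onto a node-local directory so that
# freshly scaled-up nodes start with a warm cache.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cache-syncer
spec:
  serviceName: cache-syncer
  replicas: 3                          # roughly one per build node; adjust with the node pool
  selector:
    matchLabels:
      app: cache-syncer
  template:
    metadata:
      labels:
        app: cache-syncer
    spec:
      affinity:
        podAntiAffinity:               # spread the syncers so each node gets its own copy
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: cache-syncer
              topologyKey: kubernetes.io/hostname
      containers:
        - name: syncer
          image: google/cloud-sdk:slim          # anything with gsutil works
          command: ["/bin/sh", "-c"]
          args:
            - |
              while true; do
                gsutil -m rsync -r gs://example-ci-go-cache /cache   # placeholder bucket
                sleep 300
              done
          volumeMounts:
            - name: node-cache
              mountPath: /cache
      volumes:
        - name: node-cache
          hostPath:
            path: /var/cache/ci-go              # node-local folder shared with builder PODs
            type: DirectoryOrCreate
```

Builder PODs would then mount the same host path (here, /var/cache/ci-go), for example through the runner's Kubernetes executor volume settings, so the cache is served from local disk instead of being downloaded from GCS on every job; the exact wiring will vary between setups, but the principle of locality is the same.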
Having achieved a remarkable 16-second reduction, a puzzling enigma remained: the build phase itself stayed stubbornly consistent in its timing. This gnawed at me, especially considering that, locally, the same app could be built almost instantaneously. Allocating more CPUs seemed the intuitive fix, but it did not dent the build time.

To navigate this problem, we dived deeper into the different caches available to us in the Go environment. The first cache, GOMODCACHE, was already on our radar. It takes care of dependencies, ensuring they're readily available and don't have to be fetched repeatedly.

But as we delved deeper, we stumbled upon another treasure: the build cache. This cache, lesser known but equally vital, handles intermediate build artifacts. Realizing its potential, we defined it and incorporated it into our dual cache system, merging local and distributed storage.
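To illustrate the shape of that dual setup, the sketch below shows how a GitLab CI job might point both Go caches at directories that are persisted between runs. The paths and values are illustrative, not our exact configuration; with the node-local approach, they could just as well point at the mounted host directory.

```yaml
# .gitlab-ci.yml (illustrative sketch)
build:
  stage: build
  image: golang:1.21                                 # placeholder Go version
  variables:
    GOMODCACHE: "$CI_PROJECT_DIR/.cache/go-mod"      # downloaded module dependencies
    GOCACHE: "$CI_PROJECT_DIR/.cache/go-build"       # intermediate build artifacts
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - .cache/go-mod/
      - .cache/go-build/
  script:
    - go build ./...
```

Because the build cache now survives between jobs, go build only recompiles packages whose inputs have actually changed.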
The result? A jaw-dropping improvement. Our build phase, which previously took 30 seconds, was now completed in a swift 2 seconds.

In our journey to optimize our build pipeline, we managed to halve the time of our build jobs. What once took minutes now takes mere seconds: our linting, testing, and build processes have been supercharged, wrapping up in less than 30 seconds.

While our experiment revolved around Go, the principles can extend to other programming languages. However, factoring in each language's nuances and unique aspects is essential.

It sounds like a win across the board, right? Well, mostly. But it's essential to shed light on the drawbacks.

Firstly, the cache size. As the cache grows, initializing a node takes longer. Moreover, a growing cache demands more local disk space, translating into higher costs. It's a delicate balance between speed and storage expenses.

Then there's the probabilistic nature of our solution. Using a single node for cache replication introduces a level of uncertainty. Can we guarantee that one replication round is enough to fill the cache thoroughly? No. Real-world scenarios have shown that our cache fills up rapidly and doesn't majorly impact average build times, but the element of unpredictability remains.

In conclusion, we have made significant progress in accelerating our build processes. However, there may come a time when we need to reimagine our approach again if we reach the limits of the current one. For now, let us celebrate this win.

Principal Release Engineer @ Cenomi. All thoughts are my own.


