
EKS Cluster Upgrades: A Step-by-Step Guide to a Smooth and Secure Process

Seifeddine Rajhi, ITNEXT

A Step-by-Step Upgrade Handbook

There’s no magic, just mad science 🚀

The cloud computing landscape is constantly evolving, and it can be difficult to keep up with the latest changes. However, staying up to date is essential for ensuring the security and reliability of your infrastructure.

Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service that makes it easy to deploy, manage, and scale containerized applications. One of the most important tasks for EKS cluster administrators is to perform regular upgrades. This ensures that your cluster is running on a current version of Kubernetes, which includes the latest security patches and features.

This guide provides a step-by-step walkthrough of the EKS cluster upgrade process. We cover all the essential steps, from planning the upgrade to testing and deploying the new version. Whether you are working with self-managed nodes, managed node groups, Karpenter nodes, or Fargate nodes, we have you covered.

By following this guide, you can confidently upgrade your EKS cluster without disrupting your applications.

So let’s get started!

How frequently should you perform Kubernetes upgrades? This is a crucial aspect often overlooked by newcomers to Kubernetes. Unlike many traditional infrastructure projects, Kubernetes evolves rapidly through its versions. Upgrading Kubernetes cannot be likened to switching to a new long-term support (LTS) release of a Linux distribution; it’s a continuous process that demands regular attention.

To be fair, the Kubernetes team has taken significant steps to make this process more manageable. They follow an N-2 support policy, ensuring that the three most recent minor versions receive security fixes and bug patches. This approach gives you ample time to establish a cluster and incorporate upgrade planning into your initial cluster design.
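
As a rough illustration of the N-2 policy described above, the following Python sketch lists which minor versions fall inside the support window for a given newest release. The function names and the example version numbers are illustrative assumptions, not part of any Kubernetes or EKS tooling.

```python
def _major_minor(version: str) -> tuple[int, int]:
    """Parse "1.28" or "1.28.3" into (1, 28)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supported_minors(latest: str, window: int = 3) -> list[str]:
    """Return the minor versions still patched under an N-2 style policy.

    `latest` is the newest minor release; `window` is how many minors
    receive fixes (3 for N-2). Both values here are example inputs.
    """
    major, minor = _major_minor(latest)
    oldest = max(minor - window + 1, 0)
    return [f"{major}.{m}" for m in range(minor, oldest - 1, -1)]


def is_supported(cluster: str, latest: str) -> bool:
    """True if the cluster's minor version is inside the support window."""
    major, minor = _major_minor(cluster)
    return f"{major}.{minor}" in supported_minors(latest)


if __name__ == "__main__":
    print(supported_minors("1.28"))      # ['1.28', '1.27', '1.26']
    print(is_supported("1.25", "1.28"))  # False
```

A check like this could run on a schedule to flag clusters that are about to drop out of the window, which is the point at which upgrade planning becomes urgent.
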
Waiting until your cluster is nearly at its end-of-life (EOL) to contemplate upgrades is not a viable strategy. Each release remains eligible for patches for 14 months, which may seem like an extended period, but you’re unlikely to install the very latest release.

So, how frequently should you upgrade Kubernetes? The answer is quite often. Kubernetes aims for three releases per year, down from the previous rate of four releases annually. To assess Kubernetes releases for your organization, you’ll likely need to manage multiple versions simultaneously in various environments.

As a rule of thumb, I recommend letting a minor version bake in a development environment for at least two weeks, and the same applies to the subsequent stages, such as staging or sandbox environments. For production upgrades, ideally, you should have at least a month of solid data indicating that the organization won’t encounter issues.

🤖 Strategic Kubernetes Upgrade Scheduling:

A Kubernetes version encompasses both the control plane and the data plane. To ensure smooth operation, both the control plane and the data plane should run the same Kubernetes minor version, such as 1.24.

In my carefully planned layout:

☑️ Development Cluster: Embrace the Bleeding Edge
☑️ Staging Cluster: A Step Behind Dev
☑️ Production Cluster: Align with Staging
☑️ Crucial Note: Exercise Caution with Minor Version Upgrades
☑️ Balancing Speed and Safety: The Delicate Art of Version Management

Staying current with Kubernetes releases is vital within the shared responsibility model for EKS and Kubernetes adoption.

Frequent updates are the norm, with EKS typically releasing three minor Kubernetes versions annually, each supported for around 14 months. Always check the EKS Kubernetes release calendar for the latest information.

You’re responsible for initiating upgrades for both the cluster control plane and data plane.
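
To make the "same minor version" guidance above concrete, here is a minimal Python sketch that compares a control plane version with the kubelet versions reported by nodes and returns the nodes that lag behind. The node names and version strings are made-up sample data; in practice they might come from `kubectl get nodes` and the cluster description, but that wiring is left out and is an assumption.

```python
import re


def minor_of(version: str) -> tuple[int, int]:
    """Parse "v1.24.7-eks-fb459a0" or "1.24" into (1, 24)."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def lagging_nodes(control_plane: str, kubelets: dict[str, str]) -> list[str]:
    """Names of nodes whose kubelet minor differs from the control plane's."""
    target = minor_of(control_plane)
    return sorted(
        name for name, version in kubelets.items() if minor_of(version) != target
    )


if __name__ == "__main__":
    # Hypothetical inventory: one node already upgraded, one still behind.
    nodes = {
        "node-a": "v1.24.7-eks-fb459a0",
        "node-b": "v1.23.13-eks-fb459a0",
    }
    print(lagging_nodes("1.24", nodes))  # ['node-b']
```

An empty result after a data plane upgrade is a quick signal that both planes are on the same minor version.
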
While AWS handles the control plane during upgrades, the data plane, including Fargate pods and add-ons, falls under your purview. Planning is crucial to ensure workload availability.

EKS supports in-place cluster upgrades, preserving resources and configuration consistency. This minimizes disruption for users and retains existing workloads and resources. Note that only one minor version upgrade can occur at a time.

For multiple version updates, sequential upgrades are necessary. However, they pose a higher risk of downtime. Consider evaluating a blue/green cluster upgrade strategy in such cases.

The EKS upgrade process is managed by AWS to ensure a seamless and safe transition between Kubernetes versions. Here is a detailed breakdown of the steps AWS takes to upgrade the EKS control plane:

For additional data protection, consider using Velero, an open-source tool that simplifies the backup and recovery process for Kubernetes cluster resources and persistent volumes. Velero allows you to schedule and manage backups, as well as restore processes, providing an extra layer of safety for your data.

The goal is to minimize potential disruptions during the upgrade process and maintain the stability of your services. It’s important to mention that this only looks at your application health and not at APIs that may be removed or deprecated.

In case an EKS upgrade fails, AWS has measures in place to minimize disruption and revert the control plane to its previous version:

To upgrade a cluster you will need to take the following actions:

The EKS Kubernetes version documentation includes a detailed list of changes for each version. Build a checklist for each upgrade.

For specific EKS version upgrade guidance, review the documentation for notable changes and considerations for each version.

Upgrading Add-ons and Components via the Kubernetes API

Before initiating a cluster upgrade, it’s essential to have a comprehensive understanding of the versions of Kubernetes components in use.
Conduct an inventory of cluster components, specifically focusing on those that directly interact with the Kubernetes API. These critical cluster components encompass monitoring and logging agents, cluster autoscalers, container storage drivers (e.g., EBS CSI, EFS CSI), ingress controllers, as well as any other workloads or add-ons reliant on direct Kubernetes API interactions.

Pro Tip

Critical cluster components are frequently found within namespaces ending in *-system:

Once you’ve identified components that depend on the Kubernetes API, refer to their documentation to ascertain version compatibility and any prerequisites for upgrading. For instance, consult the AWS Load Balancer Controller documentation for insights into version compatibility. Certain components might need updates or configuration adjustments before you proceed with a cluster upgrade. Pay special attention to critical components like CoreDNS, kube-proxy, VPC CNI, and storage drivers.

Clusters typically run many workloads that rely on the Kubernetes API for functionality such as ingress control, continuous delivery, and monitoring. When embarking on an EKS cluster upgrade, it’s equally crucial to upgrade your add-ons and third-party tools so that they remain compatible with the upgraded environment.

See the following examples of common add-ons and their relevant upgrade documentation:

To execute a successful control plane upgrade, AWS requires specific resources within your account. Without these resources in place, the upgrade process cannot proceed.
The prerequisites for a control plane upgrade include:

To update the cluster, Amazon EKS requires up to five available IP addresses from the subnets that you specified when you created your cluster.

To verify that your subnets have enough IP addresses to upgrade the cluster you can run the following command:

The VPC CNI Metrics Helper may be used to create a CloudWatch dashboard for VPC metrics.

Transition to EKS Add-ons:

Amazon EKS automatically deploys essential add-ons like the Amazon VPC CNI plugin for Kubernetes, kube-proxy, and CoreDNS for each cluster. These add-ons can either be self-managed or installed via Amazon EKS Add-ons, offering an alternative approach to add-on management through the EKS API.

With Amazon EKS Add-ons, you gain the convenience of updating versions with a single command. For instance:

Check if you have any EKS Add-ons with:

EKS Add-ons are not automatically upgraded during a control plane upgrade. You must initiate EKS add-on updates and select the desired version.

You are responsible for selecting a compatible version from all available versions. Review the guidance on add-on version compatibility.

Amazon EKS Add-ons may only be upgraded one minor version at a time.

Kube-no-trouble is an open source command line utility with the command kubent. When you run kubent without any arguments, it uses your current KubeConfig context, scans the cluster, and prints a report of which APIs will be deprecated and removed.

It can also be used to scan static manifest files and helm packages. It is recommended to run kubent as part of a continuous integration (CI) process to identify issues before manifests are deployed.
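
To show the idea behind this kind of manifest scan, here is a toy Python sketch. It is not how kubent works internally; it only looks at the top-level `apiVersion` and `kind` of each YAML document and checks them against a small, deliberately incomplete table of removed APIs. The function name, the table, and the sample manifest are assumptions made for illustration.

```python
import re

# (apiVersion, kind) -> Kubernetes minor version in which the API was removed.
# Deliberately incomplete; a real scanner ships a maintained list.
REMOVED_IN = {
    ("extensions/v1beta1", "Ingress"): (1, 22),
    ("batch/v1beta1", "CronJob"): (1, 25),
    ("policy/v1beta1", "PodDisruptionBudget"): (1, 25),
    ("policy/v1beta1", "PodSecurityPolicy"): (1, 25),
    ("autoscaling/v2beta1", "HorizontalPodAutoscaler"): (1, 25),
}

# Only unindented keys are matched, so nested "kind:" fields are ignored.
FIELD = re.compile(r"^(apiVersion|kind):[ \t]*['\"]?([^'\"\s#]+)", re.MULTILINE)


def removed_apis(manifest: str, target: tuple[int, int]) -> list[tuple[str, str]]:
    """Return (apiVersion, kind) pairs that no longer exist in `target`."""
    findings = []
    # Multi-document YAML separates resources with a line containing '---'.
    for doc in re.split(r"^---\s*$", manifest, flags=re.MULTILINE):
        fields = dict(FIELD.findall(doc))
        key = (fields.get("apiVersion", ""), fields.get("kind", ""))
        removed = REMOVED_IN.get(key)
        if removed is not None and removed <= target:
            findings.append(key)
    return findings


if __name__ == "__main__":
    sample = (
        "apiVersion: policy/v1beta1\n"
        "kind: PodDisruptionBudget\n"
        "metadata:\n"
        "  name: web\n"
        "---\n"
        "apiVersion: apps/v1\n"
        "kind: Deployment\n"
        "metadata:\n"
        "  name: web\n"
    )
    print(removed_apis(sample, (1, 25)))
```

In a CI job, a non-empty result for the target version would fail the build. For real clusters, a maintained tool such as kubent is the better choice because its API list is kept current.
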
Scanning manifests is also more accurate than scanning live clusters.

Kube-no-trouble provides a sample Service Account and Role with the appropriate permissions for scanning the cluster.

Another option is pluto, which is similar to kubent: it supports scanning a live cluster, manifest files, and helm charts, and has a GitHub Action you can include in your CI process.

See if you can upgrade safely against API paths. I use Pluto. This will check to see if you are calling deprecated or removed API paths in your configuration or helm charts. Run Pluto against local files with: pluto detect-files -d. You can also check Helm with: pluto detect-helm -owide. Adding all of this to CI is also pretty trivial and something I recommend for people managing many clusters.

After you have identified what workloads and manifests need to be updated, you may need to change the resource type in your manifest files (e.g. PodSecurityPolicies to PodSecurityStandards). This will require updating the resource specification and additional research depending on what resource is being replaced.

If the resource type is staying the same but the API version needs to be updated, you can use the kubectl-convert command to automatically convert your manifest files, for example to convert an older Deployment to apps/v1. For more information, see Install kubectl convert plugin on the Kubernetes website.

kubectl-convert -f <file> --output-version <group>/<version>

Check your Helm releases for upgrades. Since typically things like the CNI and other dependencies like CoreDNS are installed with Helm, this is often the fastest way to make sure you are running the latest version (check patch notes to ensure they support the version you are targeting).
I use Nova for this.

KubePug/Deprecations is designed to function as a kubectl plugin with the following capabilities:

Key Features:

How to Install as a Krew Plugin:

Simply run the following command to install it as a Krew plugin:

eksup is a command-line interface (CLI) designed to empower users with comprehensive information and tools for preparing their clusters for an upgrade. It streamlines the upgrade process by delivering critical insights and actions for a seamless transition.

Key Functions:

GoNoGo is an alpha-stage tool to determine the upgrade confidence of your cluster add-ons.

To safeguard the availability of your workloads during a data plane upgrade, it’s crucial to configure PodDisruptionBudgets and topologySpreadConstraints appropriately. Remember that not all workloads demand the same level of availability. Thus, it’s imperative to assess your workload’s scale and requirements.

Ensuring that workloads are distributed across multiple Availability Zones and hosts with topology spreads enhances the confidence that migrations to the new data plane will occur seamlessly and without disruptions.

Here’s an illustrative example of a workload configuration that guarantees 80% of replicas are consistently available and efficiently spreads replicas across zones and hosts:

AWS Resilience Hub has added Amazon Elastic Kubernetes Service (Amazon EKS) as a supported resource. Resilience Hub provides a single place to define, validate, and track the resilience of your applications so that you can avoid unnecessary downtime caused by software, infrastructure, or operational disruptions.

Managed Node Groups and Karpenter both simplify node upgrades, but they take different approaches.

Managed node groups automate the provisioning and lifecycle management of nodes. This means that you can create, automatically update, or terminate nodes with a single operation.

In the default configuration, Karpenter automatically creates new nodes using the latest compatible EKS Optimized AMI.
As EKS releases updated EKS Optimized AMIs or the cluster is upgraded, Karpenter will automatically start using these images. Karpenter also implements Node Expiry to update nodes.

Karpenter can be configured to use custom AMIs. If you use custom AMIs with Karpenter, you are responsible for the version of kubelet.

Self-managed node groups are EC2 instances that were deployed in your account and attached to the cluster outside of the EKS service. These are usually deployed and managed by some form of automation tooling. To upgrade self-managed node groups, you should refer to your tool’s documentation.

For example, eksctl supports deleting and draining self-managed nodes.

Some common tools include:

New versions of Kubernetes introduce significant changes to your Amazon EKS cluster. After you upgrade a cluster, you can’t downgrade it.

Velero is a community-supported open-source tool that can be used to take backups of existing clusters and apply the backups to a new cluster.

Note that you can only create new clusters for Kubernetes versions currently supported by EKS. If the version your cluster is currently running is still supported and an upgrade fails, you can create a new cluster with the original version and restore the data plane. Note that AWS resources, including IAM, are not included in the backup by Velero. These resources would need to be recreated.

Rather than solely focusing on the immediate next version of Kubernetes, adopt a forward-thinking approach. Continuously monitor new Kubernetes releases and be vigilant in identifying significant alterations. For instance, certain applications directly interfaced with the Docker API, and Kubernetes 1.24 made a pivotal change by removing the Container Runtime Interface (CRI) shim for Docker, commonly known as Dockershim.
🐳 Preparing for such substantial changes demands additional time and planning.

Examine all documented modifications for the version to which you plan to upgrade, meticulously noting any mandatory upgrade procedures. Additionally, pay attention to specific requirements or processes tailored to Amazon EKS managed clusters. This proactive stance ensures a smoother upgrade process while mitigating potential disruptions caused by unforeseen changes.

▶️ Removal of Dockershim in 1.25: Use Detector for Docker Socket (DDS)

Upstream Kubernetes removed Dockershim in 1.24, and the EKS Optimized AMI for 1.25 no longer includes support for it. If your applications rely on Dockershim, for instance by mounting the Docker socket, you must eliminate these dependencies before upgrading your worker nodes to version 1.25. This ensures a seamless transition without any disruptions caused by the removal of Dockershim.

Find instances where you have a dependency on the Docker socket before upgrading to 1.25. We recommend using Detector for Docker Socket (DDS), a kubectl plugin.

▶️ Removal of PodSecurityPolicy in 1.25: Migration to Pod Security Standards or Policy-as-Code Solution

PodSecurityPolicy, deprecated in Kubernetes 1.21, has been entirely removed in Kubernetes 1.25. If your cluster currently utilizes PodSecurityPolicy, it’s paramount to initiate a migration process before upgrading your cluster to version 1.25. This migration should involve transitioning to the native Kubernetes Pod Security Standards (PSS) or implementing a policy-as-code solution.
This proactive step is crucial for maintaining the uninterrupted functionality of your workloads during the upgrade.

AWS published a detailed FAQ in the EKS documentation.
Review the Pod Security Standards (PSS) and Pod Security Admission (PSA) best practices.
Review the PodSecurityPolicy Deprecation blog post on the Kubernetes website.

▶️ Deprecation of In-Tree Storage Driver in 1.23: Migrate to Container Storage Interface (CSI) Drivers

The Container Storage Interface (CSI) was designed to help Kubernetes replace its existing, in-tree storage driver mechanisms. The Amazon EBS container storage interface (CSI) migration feature is enabled by default in Amazon EKS 1.23 and later clusters. If you have pods running on a version 1.22 or earlier cluster, then you must install the Amazon EBS CSI driver before updating your cluster to version 1.23 to avoid service interruption.

Review the Amazon EBS CSI migration frequently asked questions.

I trust that the information provided has been valuable to you. The journey of keeping Kubernetes consistently upgraded becomes less daunting with regular practice. The key takeaway is to allocate ample time to acclimate your environment with each minor release. By adhering to a predictable schedule, the process of upgrading clusters becomes remarkably painless and straightforward, even for less experienced users, as long as you perform the necessary checks.

Remember, the key to success lies in regularity and thorough preparation when it comes to Kubernetes upgrades.

We hope that you have found this blog post helpful. If you have any other tips or tricks that you would like to share, please leave a comment below.

Thank you for reading!
🙌🏻😁📑 See you in the next blog.

🚀 Feel free to connect with me:
🔗 LinkedIn: https://www.linkedin.com/in/rajhi-saif/
🐦 Twitter: https://twitter.com/rajhisaifeddine

The end ✌🏻



This post first appeared on VedVyas Articles; please read the original post: here
