Tags:

October 11th 2023

Posted on Oct 11 • Originally published at ably.com When it comes to scaling WebSockets, there’s no such thing as a one-size-fits-all solution. Supporting a high volume of connections while maintaining the lowest possible latency is the name of the game, but your specific scaling strategy likely depends on your project’s requirements.On this page, we’ll outline the considerations you need to make and present a summary of the options for scaling WebSockets. Based on our experience scaling Ably to support an almost-infinite number of connections, we’re presenting not only the theory behind scaling WebSockets, but drawing your attention to some hidden scaling issues that creep up in the real world as well. Read on below or, if you’re more visually inclined, we’ve summarized the key tips in this video.In a nutshell, WebSocket is a realtime web technology that enables bidirectional, full-duplex communication between client and server over a persistent connection. The WebSocket connection is kept alive for as long as needed (in theory, it can last forever), allowing the server and the client to send data at will, with minimal overhead.Learn more:Yes, WebSockets are scalable. Companies like Slack, Netflix, and Uber use WebSockets to power realtime features in their apps for millions of end-users. For example, Slack uses WebSockets for instant messaging between chat users. However, scaling WebSockets is non-trivial, and involves numerous engineering decisions and technical trade-offs. This blog post describes the challenges of scaling WebSockets dependably, so you can be better prepared to tackle them. We’ll go through the following topics:There are two main paths you can take to scale your server layer: vertical scaling or horizontal scaling.Horizontal scaling can be challenging but is an effective way to scale your WebSocket application.Vertical scaling, or scale-up, adds more power (e.g., CPU cores and memory) to an existing server.At first glance, vertical scaling seems attractive, as it’s pretty straightforward to implement. However, imagine you’ve developed an increasingly popular live user experience such as a live sports data app. It’s a success story, but you’re dealing with an ever-growing number of WebSocket connections as more people sign up and use the app. If you have just one WebSocket server, there’s a finite amount of resources you can add to it, which limits the number of concurrent connections your system can handle. Some server technologies, such as NodeJS, cannot take advantage of extra CPU cores. Running multiple server processes can help, but you’ll need an additional mechanism, such as another intervening reverse proxy, to balance traffic between the server processes.With vertical scaling, you have a single point of failure. What happens if your WebSocket server fails or you need to do some maintenance and take it offline? It’s not a suitable approach for a product system, which is why the alternative, horizontal scaling, is recommended instead.Horizontal scaling, or scale-out, involves adding more servers to share the processing workload. If you compare horizontal and vertical scaling, horizontal scaling is a more available model in the long run. Even if a server crashes or needs to be upgraded, you are in a much better position to protect your overall availability because the workload is distributed to the other nodes in the network. To ensure horizontal scalability, you’ll need to tackle some engineering complexity.A major element is that you need to ensure the servers share the compute burden evenly: you need a load balancer layer that can handle traffic at scale. Load balancers detect the health of backend resources. If a server goes down, the load balancer redirects its traffic to the remaining operational servers. The load balancer automatically starts distributing traffic whenever you add a new WebSocket server. Load balancing distributes incoming network traffic across a group of backend servers. Load balancing is an essential part of horizontal scalability, since an effective load balancing strategy endows your architecture with the following characteristics:Load balancing also enables you to add or remove servers as demand dictates.A load balancer will follow an algorithm to determine how to distribute requests across your server farm. There are many different ways to load balance traffic, and the algorithm you select will depend on your needs. Round-robin: Each server gets an equal share of the traffic. For a simplified example, let’s assume we have two servers, A and B. The first connection goes to server A; the second goes to server B; the third goes to server A; the fourth goes to B, and so on.Weighted round-robin: Each server gets a different share of the traffic based on capacity.Least-connected: Each server gets a share of the traffic based on how many connections it currently has.Least-loaded: Each server gets a share of the traffic based on how much load it currently has.Least response time: Traffic is routed to the server that takes the least time to respond to a health monitoring request (the response speed indicates how loaded a server is). Some load balancers might also factor in each server’s number of active connections.Hashing methods: Routing is decided by a hash of various data from the incoming connection, such as port number, domain name, and IP address.Random two choices: The load balancer randomly picks two servers and routes a new connection to the machine with the fewest active connections.Custom load: The load balancer queries the load on individual servers using the Simple Network Management Protocol (SNMP) and assigns a new connection to the server with the best load metrics. You can define various metrics to look at, such as CPU usage, memory, and response time. Your choice of algorithm will depend on your most common usage scenario. Consider, for example, a situation where you have the same number of messages sent to all connected clients, such as live score updates. You can use the round-robin approach if your servers have roughly identical computing capabilities and storage capacity. By contrast, if your typical use case involves some connections being more resource-intensive than others, a round-robin strategy might not distribute the load evenly, and you would find it better to use the least bandwidth algorithm. A sticky session is a load-balancing strategy where each user is “sticky” to a specific server. For example, if a user connects to server A, they will always connect to server A, even if another server has less load.Sticky sessions can be helpful in some situations but can also be fragile and hinder your approach to scale dynamically. For example, if your WebSocket server becomes overwhelmed and needs to shed connections to balance traffic, or if it fails, a sticky client will keep trying to reconnect to it. It’s hard to rebalance a load when sessions are sticky, and it’s more optimal to use non-sticky sessions accompanied by a mechanism that allows your WebSocket servers to share connection state to ensure stream continuity without needing a connection to the same server. While this article covers WebSockets in particular, there is rarely a one-size-fits-all protocol in large-scale systems. Different protocols serve different purposes better than others. Under some circumstances, you won’t be able to use WebSockets. For example:Your system needs a fallback strategy, and many WebSocket solutions offer such support. For example, Socket.IO will opaquely try to establish a WebSocket connection if possible and will otherwise fall back to HTTP long polling. In the context of scale, it’s essential to consider the impact that fallbacks may have on the availability of your system. Suppose you have many simultaneous users connected to your system, and an incident causes a significant proportion of the WebSocket connections to fall back to long polling. Your server will experience greater demand as that protocol is more resource-intensive (increased RAM usage). To ensure your system’s availability and uptime, your server layer needs to be elastic and have enough capacity to deal with the increased load. In addition to horizontal scaling, you should also consider the elasticity of your WebSocket server layer so that it can cope with unpredictable numbers of end-user connections. Design your system in such a way that it’s able to handle an unknown and volatile number of simultaneous users. There’s a moderate overhead in establishing a new WebSocket connection — the process involves a non-trivial request/response pair between the client and the server, known as the opening handshake. Imagine tens of thousands or millions of client devices trying to open WebSocket connections simultaneously. Such a scenario leads to a massive burst in traffic, and your system needs to be able to cope. You should architect your system based on a pattern designed to scale sufficiently to handle unpredictability. One of the most popular and dependable choices is the publish and subscribe (Pub/Sub) pattern.Pub/Sub provides a framework for message exchange between publishers (typically your server) and subscribers (often, end-user devices). Publishers and subscribers are unaware of each other, as they are decoupled by a message broker, which usually groups messages into channels (or topics). Publishers send messages to channels, while subscribers receive messages by subscribing to relevant channels.The Pub/Sub patternDecoupling can reduce engineering complexity. There can be limitless subscribers as only the message broker needs to handle scaling connection numbers. As long as the message broker can scale predictably and reliably, your system can deal with the unpredictable number of concurrent users connecting over WebSockets.Numerous projects are built with WebSockets and Pub/Sub; many open-source libraries and commercial solutions combine these elements. Examples include Socket.IO with the Redis Pub/Sub adapter, SocketCluster.io, or Django Channels. We’ll now go through a few considerations related to WebSocket connection management: restoring connections, managing heartbeats, and handling backpressure.At a particular scale, you may have to deal with traffic congestion since if the situation is left unchecked, it can lead to cascading failures and even a total collapse of your system. You need a load shedding strategy to detect congestion and fail gracefully when a server approaches overload by rejecting some or all of the incoming traffic. Here are a few things to have in mind when shedding connections:You might also consider dropping existing connections to reduce the load on your system; for example, the idle ones (which, even though idle, are still consuming resources due to heartbeats). TIP: You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system. Connections inevitably drop, for example, as users lose connectivity or if one of your servers crashes or sheds connections. When scenarios like these occur, WebSocket connections need to be restored.You could implement a script to reconnect clients automatically. However, suppose reconnection attempts occur immediately after the connection closes. If the server does not have enough capacity, it can put your system under more pressure and lead to cascading failures. An improvement would be to exponentially increase the delay after each reconnection attempt, increasing the waiting time between retries to a maximum backoff time. Compared to a simple reconnection script, this is better because it gives you some time to add more capacity to your system so that it can deal with all the WebSocket reconnections. You can improve exponential backoff by making it random, so not all clients reconnect simultaneously.Data integrity (guaranteed ordering and exactly-once delivery) is crucial for some use cases. Once a WebSocket connection is re-established, the data stream must resume precisely where it left off. Think, for example, of features like live chat, where missing messages due to a disconnection or receiving them out of order leads to a poor user experience and causes confusion and frustration. If resuming a stream exactly where it left off after brief disconnections is essential to your use case, you’ll need to consider how to cache messages and whether to transfer data to persistent storage. You’ll also need to manage stream resumes when a WebSocket client reconnects and think about how to synchronize the connection state across your servers. The WebSocket protocol natively supports control frames known as Ping and Pong. These control frames are an application-level heartbeat mechanism to detect whether a WebSocket connection is alive. At scale, you should closely monitor heartbeats’ effect on your system. Thousands or millions of concurrent connections with a high heartbeat rate will add a significant load to your WebSocket servers. If you examine the ratio of Ping/Pong frames to actual messages sent over WebSockets, you might send more heartbeats than messages. If your use case allows, reduce the frequency of heartbeats to make it easier to scale. Backpressure is one of the critical issues you will have to deal with when streaming data to client devices at scale over the internet. For example, let’s assume you are streaming 20 messages per second, but a client can only handle 15 messages per second. What do you do with the remaining five messages per second that the client cannot consume? You need a way to monitor the buffers building up on the sockets used to stream data to clients and ensure a buffer never grows beyond what the downstream connection can sustain. Beyond client-side issues, if you don’t actively manage buffers, you’re risking exhausting the resources of your server layer — this can happen very fast when you have thousands of concurrent WebSocket connections. A typical backpressure corrective action is to drop packets indiscriminately. To reduce bandwidth and latency, in addition to dropping packets, you should consider something like message delta compression, which generally uses a diff algorithm to send only the changes from the previous message to the consumer rather than the entire message. Here are best practices for scaling your WebSocket infrastructure:✅ Use horizontal scaling rather than vertical scaling. It’s more reliable, especially for use cases where you can’t afford your system to be unavailable under any circumstances.✅ If possible, use smaller machines (servers) rather than large ones. They are easier and faster to spin up, and costs are more granular.✅ Aim to have a homogeneous server farm. It’s much more complicated to balance load efficiently and evenly across machines with different configurations.✅ Have a good understanding of your use case and relevant parameters (such as usage patterns and bandwidth) before choosing a load balancing algorithm. ✅ Ensure your server layer is dynamically elastic, so you can quickly scale out when you have traffic spikes. You should also operate with some capacity margin, and have backups for various system components, to ensure redundancy and remove single points of failure.✅ There is rarely a one-size-fits-all protocol in large-scale systems; different protocols serve different purposes better than others. You need to think about what other options your system needs to support in addition to WebSockets, and consider ways to ensure protocol interoperability. ✅ You most likely need to support fallback transports, such as Comet long polling, because WebSockets, although widely supported, are blocked by certain enterprise firewalls and networks. Note that falling back to another protocol changes your scaling parameters; after all, stateful WebSockets are fundamentally different from stateless HTTP, so you need a strategy to scale both.✅ Run load and stress testing to understand how your system behaves under peak load, and enforce hard limits (for example, maximum number of concurrent WebSocket connections) to have some predictability.✅ WebSocket connections and traffic over the public internet are unpredictable and rapidly shifting. You need a robust realtime monitoring and alerting stack, to enable you to detect and quickly implement remedial measures when issues occur. ✅ You need to have a load shedding strategy; failing gracefully is always better than a total collapse of your system. ✅ Use a tiered infrastructure to enable you to recover from faults and coordinate between servers.✅ Some WebSocket connections will inevitably break at some point. You need a strategy for ensuring that after the WebSocket connections are restored, you can resume the stream with ordering and delivery (preferably exactly-once) guaranteed.✅ Use a random exponential backoff mechanism when handling reconnections. This allows you to protect your server layer from being overwhelmed, prevents cascading failures, and gives you time to add more capacity to your system.✅ Keep track of idle connections and close them. Even if no messages (text or binary frames) are being sent, you are still sending ping/pong frames periodically, so even idle connections consume resources.We have discussed the challenge of scaling WebSockets for production-quality apps and online services, which can involve significant engineering complexity and draw heavily upon resources and time. You could spend several months and a heap of money. So how do you avoid this complexity?Before anything else, you should consider if WebSocket is the best choice for your use case, or if a WebSocket alternative is better suited. For example, if you only need to push text (string) data to browser clients, and you never expect to require bidirectional communication, then you could use something like Server-Sent Events (SSE). Compared to WebSockets, SSE is less complex and demanding, and easier to scale. Read about the differences between SSE and WebSocketsOn the other hand, if you’re building a web app where bidirectional communication is needed (e.g, a chat app), then WebSockets is indeed the ideal choice. But this doesn’t mean you have to deal with the challenges and costs of scaling and maintaining WebSocket infrastructure in-house; you can offload this complexity to a managed third-party PaaS such as Ably and reduce your cost. Explore our documentation to find out more and get started with a free Ably account.Templates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well Confirm For further actions, you may consider blocking this person and/or reporting abuse Yeom suyun - Oct 6 Daniel Glejzner - Oct 7 Guilherme Rizzo - Sep 18 Jakub Andrzejewski - Oct 9 Ably offers WebSockets, stream resume, history, presence, and managed third-party integrations to make it simple to build, extend, and deliver digital realtime experiences at scale. Once suspended, ably will not be able to comment or publish posts until their suspension is removed. Once unsuspended, ably will be able to comment and publish posts again. Once unpublished, all posts by ably will become hidden and only accessible to themselves. If ably is not suspended, they can still re-publish their posts from their dashboard. Note: Once unpublished, this post will become invisible to the public and only accessible to Ably Blog. They can still re-publish the post if they are not suspended. Thanks for keeping DEV Community safe. Here is what you can do to flag ably: ably consistently posts content that violates DEV Community's code of conduct because it is harassing, offensive or spammy. Unflagging ably will restore default visibility to their posts. DEV Community — A constructive and inclusive social network for software developers. With you every step of your journey. Built on Forem — the open source software that powers DEV and other inclusive communities.Made with love and Ruby on Rails. DEV Community © 2016 - 2023. We're a place where coders share, stay up-to-date and grow their careers.

This post first appeared on VedVyas Articles, please read the originial post: here

People also like

Challenges of scaling WebSockets

Related Articles

Challenges of scaling WebSockets

Related Articles

Share the post