Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Exploring concurrent rate limiters, mutexes, semaphores

Posted on Sep 11 • Originally published at blog.shalvah.me This is a dump of my learnings and experiments while going down a little rabbit hole.I was studying Sidekiq's page on rate limiters. The first type of rate limiting mentioned is the concurrent limiter: only n tasks are allowed to run at any point in time. Note that this is independent of time units (e.g. per second), or how long they take to run. The only limitation is the number of concurrent tasks/requests.So I asked myself, how would I implement a concurrent rate limiter? I'm fairly familiar with locking (via Redis and the database, for instance), so that was what came to mind, but in its usual form, that only works as a mutex (number of allowed tasks, n = 1). I wasn't sure about how to implement that when n > 1. Decided to dig into it from first principles.In this case, that meant stepping back to think about concurrency control in general, and the scenarios I know of.The first scenario is process-local: you have multiple threads within a process, and you want to ensure only n threads can access a resource at once. I already knew how to do this:Semaphores are similar to mutexes, but they are less about guaranteeing exclusive access and more about keeping track of who has access. A semaphore starts out with a fixed number of "permits". A Thread can request a Permit (similar to acquiring a lock), and that reduces the number of available permits. When all permits are in use, any requesting threads will have to wait until one is released. In this sense, a semaphore is kinda like a bouncer at a club—it regulates the number of people who can get in.There are many semaphore implementations available for Ruby. I decided to implement one myself. The key thing is that the semaphore governs access to a resource (the number of permits), so we need a way to ensure the semaphore can do this job safely. I used Ruby's native Mutex to achieve this:Usage:Sample output:This approach checks whether there are any available permits and returns one if so. Otherwise, it will sleep for 0.05 seconds and check again. The mutex guarantees that we can safely increment or decrement the number of permits without race conditions. This is a basic implementation, btw; one important thing missing is wait timeouts—we shouldn't have to wait forever.Also note that there is no expiry on a permit—a client could get a permit and refuse to release it! Apparently, that's by design; semaphores don't control what you do with the permit. The onus is on you to be responsible with it.This approach suffers from unfairness. Suppose thread A has been waiting for a permit to become available, and finally, another thread releases one. If at that moment, thread A is still sleeping, and a new thread (thread B) is launched, B might acquire the permit instead of A. In essence, an unlucky thread could wait for a very long time (or forever!) while newer threads get a permit. This is like if the bouncer selected people at random, instead of who's been waiting the longest.Also, the constant sleeping and waking up (polling) is suboptimal. We're giving the Ruby interpreter and OS more work to do (constantly scheduling and waking up the thread) when there might not be actually any permits available, and we're potentially taking time away from threads which have actual work to do.So I decided to try a different approach, using Ruby's Thread::Queue. Thread::Queue is a queue data structure designed for concurrent use; requesting an item (via #pop) will either return an item if one is available, or block until another thread adds one to the queue (via #push). So we model the list of permits as a queue (you can put anything you want in the queue, as long as it's the same number as the permits). Acquiring = popping, releasing = pushing.This works too (and the code is also much shorter). I'm not a 100% certain the waiting threads are served in order of who asked first (the docs don't say exactly that), but I think that's the case.After this, I took a look at the semaphore class in the popular library, concurrent-ruby to see how they implement it, and I learnt about something new: condition variables. And Ruby comes with this included!The name sounds super technical, but it's quite approachable in reality: a condition variable lets you tell other threads waiting on a resource that it's now available. It's meant to be used with a mutex. Instead of having the thread constantly poll, as in my initial implementation, the thread sleeps forever (or with a timeout), and gets woken up by the condition variable when a new permit is available. Here's the new implementation:Rather than polling (sleep-wake-check-sleep), we just sleep (@condition_variable.wait), and then when another thread is done, they call @condition_variable.signal, which will wake up the first waiting thread (so it's fair, yay). It reminds me a bit of events in JavaScript.Okay, good diversion; now, back to concurrent scenarios. We've looked at process-local scenarios. The second scenario is system-local, but separate processes. This is, for example, when you have multiple web server processes running on the same machine, and you want to control access to a shared resource.Processes don't share memory, so our mutex/semaphore can't live inside any single process. We have to use an external datastore, such as a cache (eg Redis), or a database (PostgreSQL). You could even use a file or what-have-you.A third scenario is in a distributed system: the processes are on separate machines. This is an extension of the above case, so the same approaches as above apply, but obviously in a distributed way (the datastore has to live on an external location all the processes can access).Okay, so how could we implement mutexes (n = 1) here?When dealing with locks, you want to avoid starvation (ie, an unruly/crashed client holding on to the lock and blocking everyone else). A common (but imperfect) way to avoid this is to set a lock expiry (how long a given client can hold a lock). This way, if a client is unable to release the lock (for instance, if they crashed), it will automatically be released after a while.You also want to limit contention, typically by setting a wait timeout (how long a client should wait for when another client is using the lock). If you don't set a wait timeout, you could end up with other processes hanging forever because a lock is in use. Sometimes that might be desired, but more likely you probably want to quit and try again later. Either way, you should decide on your wait timeout policy for your clients.A common pattern is to set a key in Redis, something like SET some-lock-key some-value NX EX expiry-in-seconds.This works, but has a few flaws:For SQL, I like Postgres' advisory locks. Running SELECT pg_advisory_lock(123) will give this client a lock called 123, and all other clients who run that same statement will have to wait until the first client releases the lock.With Postgres' advisory locks, you don't need a lock expiry, since the lock is bound to the session or transaction. If the client crashes, PG will release the lock. Wait timeouts aren't directly supported, but you can achieve this by using the lock_timeout setting (combined with transaction-level locks if you don't want global wait timeouts):MySQL also has advisory locks (although they are not transaction-bound) and wait timeouts:I love Redis, but I think I prefer SQL databases for mutexes. Since the Redis API does not expose any concept of a lock, we try to emulate it with the SET NX pattern or more complicated algorithms, which is why we have to do the "lock expiry" dance. PostgreSQL, on the other hand, has locks as a first-class function, which means it can provide better guarantees, such as the fact that locks will always be released when the session ends.How about semaphores (n > 1)? At first I was thinking of something using Redis transactions (MULTI) and WATCH, something like this (not valid code):WATCH will fail the transaction if the semaphore-permits-used is modified by another client, so this serves as a mutex for the permits. But this implementation seems pretty complex to me; it involves us switching between our app code and Redis multiple times (or making this a Redis script). i haven't had the chance to try it yet, though.Turns out, you can implement a semaphore in Redis quite simply with blocking list operations (akin to what we did with Thread::Queue in Ruby):It's still a bit ugly, because that first step is crucial (otherwise clients would wait forever). The best approach there is to either have each client check if the semaphore has been initialized (e.g. by checking if a certain key exists), and initialize it themselves if not; alternatively, you could have an explicit initialization step that creates the permits when your app starts up.Unfortunately, advisory locks don't help here. We need a good ol' database table to keep track of our semaphores.To kick things off, we create the semaphore with capacity n = 4:Checking out and releasing a permit is straightforward: update the used_permits count. But, to ensure exclusive access, we must use a transaction:The UPDATE statement will automatically lock the matching row (our semaphore) until it finishes executing, so no transaction needed (unless we want to specify a local timeout). All we need to do is check if the row was updated; if so, we have our permit.To release the permit, it's the reverse:One downside here is that there's no easy way to block while waiting for a new permit to be available. We'll have to rely on polling.I find it quite interesting the differences in approach between Redis and PG here: an explicit list of permit items (Redis) vs a counter (Postgres). I think you could use either approach in both, but it would be more complicated. (I actually had an initial implementation in Postgres that used multiple rows and transactional isolation, but it was def more complex.) Postgres' transactional guarantees make it easier to work with a single counter (= a single row), while Redis' list data structure and blocking options make that approach straightforward.We're almost there! Actually, we're there. A concurrent rate limiter is essentially a semaphore. The max number of permits = the max number of concurrent tasks.However, the Sidekiq version also includes lock expiry, so I spent some time thinking about it. My conclusion: there's no "nice" way to do it. The permit expiry approach I could think of (or find in the wild) was:I think that's it for that exploration. It was quite interesting racking my brain about semaphores and guarantees. My current conclusions are that I'd prefer to use Postgres for mutexes and Redis for semaphores. It was also a revelation that semaphores don't provide any guarantee of expiry, so you must program carefully around that.Templates let you quickly answer FAQs or store snippets for re-use.I really enjoyed the level of depth put in this article, not only showing different locking mechanisms in code but also applying them to practical real world applications. Some of the points you mentioned does remind me of AWS SQS which has visibility time outs for messages. That way if a task fails to complete it becomes re-visible again in the queue for another worker to pick it up again. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well Confirm For further actions, you may consider blocking this person and/or reporting abuse Ben Halpern - Aug 22 Cherry Ramatis - Aug 13 Brandon Casci - Aug 23 zigzagoon1 - Aug 2 Once suspended, shalvah will not be able to comment or publish posts until their suspension is removed. Once unsuspended, shalvah will be able to comment and publish posts again. Once unpublished, all posts by shalvah will become hidden and only accessible to themselves. If shalvah is not suspended, they can still re-publish their posts from their dashboard. Note: Once unpublished, this post will become invisible to the public and only accessible to Shalvah. They can still re-publish the post if they are not suspended. Thanks for keeping DEV Community safe. Here is what you can do to flag shalvah: shalvah consistently posts content that violates DEV Community's code of conduct because it is harassing, offensive or spammy. Unflagging shalvah will restore default visibility to their posts. DEV Community — A constructive and inclusive social network for software developers. With you every step of your journey. Built on Forem — the open source software that powers DEV and other inclusive communities.Made with love and Ruby on Rails. DEV Community © 2016 - 2023. We're a place where coders share, stay up-to-date and grow their careers.



This post first appeared on VedVyas Articles, please read the originial post: here

Share the post

Exploring concurrent rate limiters, mutexes, semaphores

×

Subscribe to Vedvyas Articles

Get updates delivered right to your inbox!

Thank you for your subscription

×