
How To Use Python Cache To Speed Up Coding

No matter how long you’ve been using Python, you likely know how quickly projects can become unwieldy without proper data management. Throwing code together without considering its efficiency can lead to super slow load times and unresponsive programs that can ruin just about any project.

Whether you’re looking to optimize simple recursive functions or something more complicated, like a web scraping project, it pays to know how to incorporate Python caches into your code. This article does a deep dive into caching in Python, including what caching is, how it works, why it’s useful, different caching strategies, and how caching can improve all kinds of Python programs.

What Is a Python Cache?

Put simply, a Python cache is a temporary storage space in your computer’s working memory, or RAM, that allows you to quickly access certain data without pulling it from your disk storage or the server it was originally stored on. Many operating systems use dedicated CPU memory, usually an L1 or L2 cache, to store data temporarily. By contrast, Python caches exist at the software level rather than the hardware level and rely on your computer’s RAM.

After you set up a cache in your code, your program will search for the data it needs in the cache before checking your hard drive or an external server. If the cache contains the relevant data, the program retrieves it from the Python cache. This process is known as a Python cache hit.

Let’s consider two examples to illustrate cache positioning with respect to data:

First, imagine you have a program you frequently use to search a locally stored database for items matching certain criteria, which are then subject to some analysis. The database is rather large, and some items match the criteria more often than others. In this case, a cache would temporarily store the previous matches so that your program only needs to search the database for new matches.

Next, say you’re building a program that pulls some static data from a website and performs several analyses on it. Perhaps it’s a large amount of data that takes a long time to gather, or maybe you pay for an API and each request costs you money.

Either way, a cache allows you to store the data temporarily and locally, so you don’t need to pull it from the host server for each step in your analysis.
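For example, here’s a minimal sketch of that second scenario using a plain dictionary as an in-memory cache (the URL and function name here are only illustrative):

import requests

# In-memory cache mapping each URL to the page body it returned
page_cache = {}

def fetch_page(url):
    # Serve the cached copy if we've already downloaded this page
    if url in page_cache:
        return page_cache[url]
    response = requests.get(url)
    page_cache[url] = response.text
    return response.text

html = fetch_page("https://example.com/data")  # hits the network
html = fetch_page("https://example.com/data")  # served from the cache

Every call after the first is answered from local memory, so the host server sees only one request no matter how many analysis steps reuse the data.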

What Are the Benefits of Caching?

By caching data that you need more frequently or data that’s time- or resource-intensive to gather, you can minimize how often your program needs to either pull data from an external server or recalculate the same data over and over again. So the two main benefits of caching are improved run times and reduced load on external servers.

Increased speed

Caching can improve the performance of most applications and programs. Though it’s especially handy for highly interactive web applications where users expect to receive responses in real time, it can improve the speed of a wide range of coding applications.

Here are some of the reasons that developers use caching:

  • It cuts down on access time. Instead of waiting for your computer to access a remote server, caching puts the data at your fingertips.
  • It protects your system from overload. Caching can dramatically reduce the number of requests your program sends to external data sources, thereby reducing the load on your system. This means that your network will run more smoothly, and you won’t need to call in your IT team as often.
  • It improves user experience. If you use Python to support your front-end development, a Python cache will allow users to open web pages, fill out forms, and interact with virtual assistants much more rapidly. Short load times lead to happier, more loyal users. It also leads to more natural interactions and engagement, which is great if you’re trying to learn more about your clients and target audience.

Reduced load on external servers

Big web scraping projects require you to find a balance in how often you scrape data from each website. You need to pull data often enough to ensure that it’s up to date for your intended use but not so often that you overload the external server.

Every time you send a request, it uses up the external server’s memory and CPU. Large scraping projects that need to access multiple pages on a website usually have to send repeated requests, which can prevent other users from accessing it. Doing repeated scrapes of a single site can lead to:

  • The site admins becoming aware that a bot (your crawler) is abusing its access
  • Your computer being banned from accessing the site
  • The external server crashing (although this is rare due to safety parameters in most commonly used web scrapers)
  • Legal action (also very rare)

There are several ways to minimize the number of requests your program sends, such as adding a built-in delay between requests. Unless you need to access frequently updated data, a Python cache lets you store data locally, eliminating the need for many intensive scrapes of the same information.

Why Is Caching Important in Python?

Regardless of the goal of your Python project, your computer will need to access some kind of data to complete it. Even in the first task most people complete when learning a new language — some variation of print('Hello, world!') — you provide the machine with the data ('Hello, world!') that it needs to complete the function (print()). As projects get more complicated, the amount of data they require grows, as does the amount of work your computer needs to do to gather that data.

Recursive functions, or functions that reference themselves, are a good example of how these facets of a project can get out of hand. A recursive Fibonacci function illustrates this concept well.

The Fibonacci sequence is a series of numbers in which each number is the sum of the two before it, usually starting with 0, 1, 1, 2, 3, 5, 8, 13, etc. For a program to calculate, for example, the fifth number in the sequence, it needs to know the starting point (0 and 1) and perform all the additions leading up to the third and fourth numbers in the sequence. The tree below visualizes this process.

[Figure: Fibonacci calculation of the fifth number in the sequence. Source: Real Python]

A Python function that calculates the nth number in the sequence could look like this:

def fibonacci(x):
    if x == 1 or x == 2:
        return 1
    return fibonacci(x - 1) + fibonacci(x - 2)

Because the function calls itself twice, the time needed to run it grows exponentially as you go further up in the sequence. As an example, running the function with x=35 and x=38 produces the following results and run times in seconds:

[Screenshots: results and run times for print(fibonacci(35)) and print(fibonacci(38))]

As highlighted in the diagram above, in order to do a single one of these calculations, the program has to perform the same sub-calculations several times, which results in this lag. These redundancies can be avoided by storing some of the more frequent or recent results of these calculations in a cache, ultimately improving your program’s performance.

Example Applications of Python Caches

Python caches are an excellent way to improve performance and speed up normal processing when used correctly. Whether you’re facing a CPU-bound or an I/O-bound bottleneck, caching can help you correct it and achieve faster response times.

Its uses are in no way limited to what we discuss in this article. However, the breadth of its possible applications can make it difficult to know where to start. Here are some common uses of Python caches to use as a jumping-off point.

Store completed computations

Python caches can store previously computed data so that your computer doesn’t have to repeat the same computations, as in the Fibonacci example above. Instead of recomputing each node in the call tree, your program can look its result up in the cache.
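As a hedged sketch of this idea, here’s the Fibonacci function from earlier memoized by hand with a plain dictionary, so each value in the call tree is computed only once:

# Seed the cache with the two base cases
memo = {1: 1, 2: 1}

def fibonacci(x):
    # Compute and store a value only if it isn't cached yet
    if x not in memo:
        memo[x] = fibonacci(x - 1) + fibonacci(x - 2)
    return memo[x]

print(fibonacci(38))  # each sub-result is computed only once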

Store frequently accessed data

Databases and file systems are the workhorses of most operations. They can also be the weakest link when it comes to speed and responsiveness. Caching is a great way to improve access to frequently used data. A cache builds an extra storage layer on top of your database or file system to store the most frequently used data. Retrieving data from the new storage layer is much faster than accessing the original database or file system.

For example, reading data through your operating system can be time-consuming, especially when the data lives in the lower levels of the storage stack. Building an OS cache ensures that your most frequently needed data is kept at the highest level, where it is easiest to retrieve.

API caching

Most popular web applications are built on an application programming interface, or API, which defines how clients and the application interact. Applications that use APIs are usually highly interactive to keep pace with the performance today’s users have come to expect.

There are two main ways to implement caching with an API: as an API manager or as a user. If you manage an API, you can use server- and client-side caching to improve the API’s performance. Server-side caching is useful when you have data or queries relevant to all your users, as the server can compute a response once and offer it up to anyone who needs it. Client-side caching, on the other hand, allows your API to store web pages or data on the client’s browser or device. This limits the pressure on the server, and client-side caches aren’t lost in the event of a server failure.

Note that it’s important to use the appropriate cache location for each piece of data — you’ll run into performance issues if you start caching user-specific queries on the server side, for example.

Python caching has several benefits for API users as well. Some APIs limit the number of data requests you can make from their servers and require you to pay more for additional requests or a higher limit. This highlights a clear financial upside of caching on top of the speed and performance benefits previously discussed. And if you’re pulling data from an API to manipulate and present to your own end users, the improved performance from using a cache will boost user experience.
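On the user side, one common option is the third-party requests-cache library, which transparently caches HTTP responses. A minimal sketch, assuming the package is installed (pip install requests-cache) and using a hypothetical endpoint:

import requests
import requests_cache

# Cache responses on disk and expire them after 10 minutes
requests_cache.install_cache("api_cache", expire_after=600)

# Only the first call in each 10-minute window reaches the API and
# counts against your request quota; repeats are answered locally
data = requests.get("https://api.example.com/v1/items").json()  # hypothetical URL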

Database caching

Users expect increasingly faster and more complex responses from applications, and this isn’t likely to change anytime soon. Similar to API caching, database caching can help you keep up with these expectations. A database cache is a data access layer positioned next to the database. Applications can access that layer directly to get the data they need quickly, resulting in lower data retrieval latency and increased throughput.

Many developers use database caching to speed up the process of storing and retrieving user profiles from a database. Normally, user profiles need to be moved from one server to another when they’re retrieved. When you use database caching, you’ll only need to move the profile once: the first time you retrieve the data from the server. After that, the profiles are readily available on a server close to the user.
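A minimal sketch of that read-through pattern, where fetch_profile_from_db is a stand-in for a real database query:

def fetch_profile_from_db(user_id):
    # Stand-in for a query against the remote database
    return {"id": user_id, "name": f"user-{user_id}"}

profile_cache = {}

def get_user_profile(user_id):
    # Read through to the database only on the first request;
    # afterward the profile is served from the cache
    if user_id not in profile_cache:
        profile_cache[user_id] = fetch_profile_from_db(user_id)
    return profile_cache[user_id]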

Today’s consumers also increasingly expect high levels of personalization when they’re looking to purchase products and services. This means that it’s more important than ever for businesses to be able to access user profiles and data about individual customer preferences.

Using database caches is a great way to ensure this happens quickly and without errors.

Integrated caching

Integrated caching is important for any website that receives a large number of SQL and HTTP requests. An integrated cache uses an additional data layer within the existing computer memory that automatically caches data from the relevant databases. This way, there’s no need to decide what to cache because the cache will store whatever data is most frequently accessed. Integrated caches can also ensure the cached data is consistent with the data in the original database.

Integrated caching yields great database performance results by reducing latency, CPU expenditures, and memory usage. Of course, the website will still need to make some requests to the origin server — storing the entire database in the cache would defeat the purpose.

However, integrated caching produces a serious reduction in the number of requests. To do this, the integrated cache compares each new request to its set of stored policies. Where possible, it handles all the requests that match its stored policies and forwards requests to the back end when the response does not exist in the cache. Integrated caches can typically handle the majority of web content requests.

Tools for Caching Data in Python

There are a number of different ways to add Python caches to your Python programs. For example, you can:

  • Manually create a cache dictionary that your program populates and then searches before the original data source
  • Use a memoization function to cache parameter-specific function outputs
  • Use caching decorators, such as @cache or @cached_property, to apply memoization with less code

Here, we discuss some of the tools that can help you implement a Python cache and improve its efficiency, specifically decorators, GitHub Actions, and the Python DiskCache library.

Python cache decorators

Python’s functools module includes cache decorators that make caching/memoization far simpler: @lru_cache arrived in Python 3.2, @cached_property in 3.8, and @cache in 3.9. The @cache decorator is simple to use. Here’s an example of how to implement it using the previously mentioned Fibonacci function:

from functools import cache

@cache
def fibonacci(x):
    if x == 1 or x == 2:
        return 1
    return fibonacci(x - 1) + fibonacci(x - 2)

The cache created here is unlimited in size and therefore doesn’t evict old values, so it’s best for data with minimal diversity, as in this kind of recursive function. If you need to limit the cache size, use the @lru_cache decorator instead and define a maxsize argument. This method uses the LRU, or least recently used, eviction policy for clearing old data to make room for the new. These policies are discussed in further detail below.
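For instance, a size-limited version of the same function might look like this sketch, where the maxsize of 128 is an arbitrary choice:

from functools import lru_cache

@lru_cache(maxsize=128)  # keep at most 128 results, evicting the least recently used
def fibonacci(x):
    if x == 1 or x == 2:
        return 1
    return fibonacci(x - 1) + fibonacci(x - 2)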

@cached_property, on the other hand, converts a method of a class into a property, similar to @property. However, @cached_property caches the property’s value as an attribute after computing it once, avoiding unwanted repeated calculations.

Properties in Python can be costly to access. They read like attributes but run a computation on every access, and depending on the property, that computation can be time-consuming and expensive.

The cached_property decorator stores the result of the calculation, so your system won’t need to perform the same calculation every time you retrieve the data. (The third-party cached-property package additionally offers a TTL variant that expires the cached value after a set interval.)

Let’s look at an example of how a Python cache can improve functions in Python. Imagine you built a simple class around a list whose elements you need to sum. If you expose the sum as an ordinary method or property, it will iterate over the list and recompute the total on every access. This is time-consuming and tedious and will inevitably get in the way of other processes, slowing down the whole system.

Instead of getting stuck in this repetitive cycle of calculations, you can use @cached_property to calculate the summation only once and then store the result so you can access it whenever you need to.
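Here’s a minimal sketch of that idea; the class and attribute names are illustrative:

from functools import cached_property

class Numbers:
    def __init__(self, values):
        self.values = values

    @cached_property
    def total(self):
        # Runs once; the result is then stored as an instance attribute
        return sum(self.values)

nums = Numbers(list(range(1_000_000)))
print(nums.total)  # computed and cached on first access
print(nums.total)  # returned instantly from the cache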

Note that functools only gained cached_property in Python 3.8. On earlier versions, install the backport with pip install cached-property and import it from the cached_property package rather than functools.

GitHub Actions/setup-python

GitHub Actions is a platform for automating workflows, and it comes in handy for implementing Python caches. Specifically, the setup-python action enables dependency caching for your Python projects, using the actions toolkit’s cache package in the background but requiring less overall configuration. To use setup-python to cache package dependencies, add the following to your workflow YAML file:

steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
  with:
    python-version: '3.9'
    cache: 'pip' # caching pip dependencies
- run: pip install -r requirements.txt

The setup-python action supports the package managers pip, pipenv, and Poetry, and GitHub provides additional instructions for the latter two and some other functions in its advanced usage guide. Some developers recommend also using the separate cache action, which caches pip’s own cache directory and reduces installation time.

Python DiskCache library

The DiskCache library opens up gigabytes of disk space for caching, making use of the empty space already on the disk. It also helps you manage your Python cache by indexing every entry under a key, which keeps lookups fast even as the cache grows.

The DiskCache library acts like a dictionary for any data that you choose to store in a Python cache. DiskCache has a variety of customization options for clearing the Python cache, specifying the returned object types, and limiting how much memory is used in caching. It’s also useful for indexing data structures in web scraping applications.

You can install DiskCache with pip install diskcache in the command line, then import it with import diskcache in your program.
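Here’s a minimal sketch of DiskCache’s dictionary-style interface; the directory name, key, and expiration time are illustrative:

from diskcache import Cache

# Open (or create) a cache directory on disk
cache = Cache("my_cache_dir")

# Store a value with an optional expiration time in seconds
cache.set("scraped_page", "<html>...</html>", expire=3600)

# Retrieve it later, even across program runs
html = cache.get("scraped_page")

cache.close()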

How To Clear a Cache in Python

So far, we’ve talked about how to implement a Python cache. But it’s just as important to manage your Python cache correctly. That means knowing how and when to clear the Python cache. Regularly clearing your Python cache ensures that users won’t receive stale data. This is especially important when working with time-sensitive or dynamic data that updates frequently.

Also, remember that your Python caches usually rely on your RAM. If you allow your cache to hold too much data, you run the risk of using up most of your computer’s memory, which will drastically reduce its performance. Clearing your Python cache also ensures that there will be enough space to store new information as necessary.

Keep the following strategic approaches to clearing and managing a Python cache in mind. The better you understand these strategies, the more successful your Python caching will be.

Clearing a Python cache based on TTL

The best way to ensure your Python cache doesn’t grow too large is to build in an automatic cleanup mechanism. Typically, this involves assigning a time-to-live, or TTL, value to each new entry so it’s automatically deleted once the predetermined number of seconds has passed. This is a great way to keep your Python cache free of unnecessary or outdated data.

Different data sets require different TTLs depending on their level of time sensitivity. Data sets that routinely change and are subject to frequent updates should have a short TTL. Time-sensitive information, like stock prices or poll results, falls into this category. Such information should be regularly updated because it doesn’t stay the same for long. Data sets that remain constant over time should have a much higher TTL — or none at all — to minimize the cache’s update frequency. Information about directions or a store’s hours of operation falls into this category.

The most effective way to create a Python cache with a TTL is to install the cachetools package, which lets you pass a TTL-aware cache to its @cached decorator. Here’s an example from the cachetools documentation on PyPI that caches weather data for a maximum of 10 minutes:

from cachetools import cached, TTLCache

@cached(cache=TTLCache(maxsize=1024, ttl=600))
def get_weather(place):
    # owm is a weather API client object assumed by the docs' example
    return owm.weather_at_place(place).get_weather()

Ultimately, the TTL you choose should strike a balance between fresh content and reduced load times. For instance, if you’re building an application that displays the earthquakes that have occurred over the past week, pulling the minute-by-minute data from the U.S. Geological Survey website would be overkill, putting undue strain on both your system and the server.

Clearing content from your Python cache based on access pattern

Many developers organize their cached data using cache-clearing strategies based on the access pattern. These strategies define how items in the cache are replaced and allow you to specify how many items the cache holds at a time. You usually don’t need a massive cache, and capping its size keeps it from eating up your RAM and tanking your computer’s performance.

The cache-clearing strategies are categorized based on the order of the data that enters the cache, how recently each piece of data was accessed, or how frequently each piece has been accessed.

Queue-based methods:

  • First in, first out (FIFO). With FIFO cache clearing, the first item in the cache is the first to be cleared to make way for new data. This method is useful for dealing with data in the order it’s received but is not the most efficient for most use cases.
  • Last in, first out (LIFO). By contrast, LIFO data eviction policies clear the last items to enter the cache first. This strategy is best used when older entries are more likely to be relevant.

Recency-based methods:

  • Least recently used (LRU). LRU policies organize your data according to when it was last used and clear the least recently used data from the cache first. If your more recent cache entries are more likely to be reused in your program, this is the strategy for you. You can use the @lru_cache decorator to implement it and define the size of the cache in an argument based on what will best suit your project.
  • Most recently used (MRU). The opposite of LRU caching, MRU policies evict the data you most recently called on to make room for new data. Similar to LIFO caching, MRU caches are good to apply in situations where older items are more likely to be accessed.

Frequency-based method:

  • Least frequently used (LFU). This approach organizes data according to how often each item is used instead of how recently it was accessed. Sometimes, this means that recently used data is discarded from the cache because it’s not accessed very often.

Although some approaches are more common (e.g., LRU), one approach isn’t necessarily better than the others. Choosing the most efficient clearing method comes down to how you’re utilizing the cache. For example, FIFO eviction may be best if you’re caching user requests that must be dealt with in the order in which they’re received.
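If you’d rather not implement these policies by hand, the cachetools package ships cache classes for several of them (including LRUCache, LFUCache, FIFOCache, and the TTLCache shown earlier) that plug into its @cached decorator. A minimal sketch with the recency- and frequency-based variants, where expensive_computation stands in for whatever work you’re caching:

from cachetools import cached, LRUCache, LFUCache

# Evict the least recently used entry once 100 results are stored
@cached(cache=LRUCache(maxsize=100))
def lookup_recent(key):
    return expensive_computation(key)  # hypothetical workload

# Evict the least frequently used entry instead
@cached(cache=LFUCache(maxsize=100))
def lookup_frequent(key):
    return expensive_computation(key)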

Tips To Improve Your Python Caching

Caching has a number of advantages, but like anything else, it has some potential downsides. It’s important to know how to avoid some of the pitfalls that can arise if you don’t implement your Python cache correctly.

Unfortunately, many companies today operate with a lean IT crew, which means they don’t always have enough expertise to navigate the possible downsides of setting up a Python cache. That’s why many businesses choose to partner with experienced technicians who can guide them through the stages of implementing, managing, and clearing their Python cache.

Whether you’re an experienced developer or are fairly new to caching in Python, it’s a good idea to keep the following tips in mind.

1. Caches are temporary

Always remember that the cache is intended to be temporary rather than permanent storage. In most cases, your cache will include a built-in mechanism to empty itself, either when the data reaches a certain age or when it’s not requested often enough.

Why does this matter? You don’t want to treat your Python cache as an extension of the memory on your device. Make sure you don’t use it to store important information that isn’t backed up elsewhere. Whatever you store in your Python cache is only there temporarily.

2. Caching should save time

Make sure that caching will actually save you time in the long run. Caching often speeds up web scraping and other calculations, but in shorter operations, a cached program may take just as long to run as one without a cache. Check whether looking up a cached result will actually be faster than recomputing or refetching it. If not, it isn’t worthwhile to cache at all.

There are several ways you can test the time savings. Let’s pull up the Fibonacci function one more time to illustrate one example. If you add timestamps before and after the function with and without the cache, you can see the difference:

import time

def fibonacci(x):
    if x == 1 or x == 2:
        return 1
    return fibonacci(x - 1) + fibonacci(x - 2)

start = time.time()
print(fibonacci(38))
end = time.time()
print(end - start)

from functools import cache
import time

@cache
def fibonacci(x):
    if x == 1 or x == 2:
        return 1
    return fibonacci(x - 1) + fibonacci(x - 2)

start = time.time()
print(fibonacci(38))
end = time.time()
print(end - start)

Sometimes, developers use caching automatically because they’re accustomed to it. Using a Python cache will save time and energy most of the time. However, there are occasions when it’s more time-consuming to implement caching than simply running your program without it.

So before you end up neck-deep in a complicated cache setup that requires hours of troubleshooting, make sure it’ll be worth it in the end.

3. Avoid caching content that has to be updated frequently

Caching isn’t always practical. Make sure you’re not caching an API or web page that updates frequently with different results. If you do, remember to invalidate the cache when you need to update it.
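If you do need to cache such data, note that functions decorated with functools’ @cache or @lru_cache expose a cache_clear() method for exactly this purpose. A minimal sketch, with fetch_latest_scores standing in for a call to a frequently updated source:

from functools import cache

def fetch_latest_scores():
    # Stand-in for a request to a live, frequently updated data source
    return {"home": 3, "away": 1}

@cache
def get_scores():
    return fetch_latest_scores()

print(get_scores())       # fetched and cached
get_scores.cache_clear()  # invalidate once the data is known to have changed
print(get_scores())       # fetched fresh on the next call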

If you’re in a customer-facing business, your clients will want to access the most recent information. They’ll be frustrated if they receive stale, outdated information. Overreliance on caching without keeping pull frequency in mind can create this problem.

4. When in doubt, consult the experts

Caching can speed up most web pages, applications, and browsers and make long, complicated calculations more effective. It’s a natural complement to most uses of Python code. However, caching can also be challenging to implement and use correctly. You need to make sure that the cache refreshes itself when necessary and doesn’t overload your system or an external server.

Not all businesses have a well-rounded IT crew — if they have to spend their time developing and managing a Python cache, they won’t have time to solve urgent problems elsewhere or build new solutions. Many small and midsized businesses are also having difficulty hiring new staff, making it harder to build a good IT team. Many are making the wise choice to let teams of experts handle projects like caching instead of trying to do them in-house without the necessary expertise.

If you have doubts about caching and want to make sure it’s implemented in the most efficient way, work with partners who have caching experience and can help users and developers avoid potential problems.

Working With the Experts at Rayobyte

When set up correctly, Python caching improves run times, delivering data faster. Using a Python cache can lead to happier users and more effective applications and vastly improve your or your company’s productivity.

Many web scraping projects can benefit from a Python cache, but there are other ways to improve the process, such as employing proxies. Rayobyte exists because we believe that proxy users deserve a company that will serve as their partner, not just their provider. Visit us today to learn how Rayobyte can take your business to the next level.

The information contained within this article, including information posted by official staff, guest-submitted material, message board postings, or other third-party material is presented solely for the purposes of education and furtherance of the knowledge of the reader. All trademarks used in this publication are hereby acknowledged as the property of their respective owners.


