April 19th 2023

Machine learning on Time Series is an extremely difficult task: accuracy of these models is not even comparable to the successful language and vision models everyone is hyped about. Although there is no magical solution, there is a battle-tested way to enhance the real-world effectiveness of your models: evaluating them with Time Series Cross-Validation!

Only if you can simulate how your models would have performed in the past, under similar conditions, can you make sure that your expectations and models are realistic and valid. Otherwise, the models might underperform in production, both overall and at critical points, and you will almost certainly be disappointed.

We’ve been on a long, unfruitful journey searching for a tool that could effectively simulate the deployment of time series models on past data. Current implementations either don't do what you expect (see “That's nice in theory, what is it like in practice?”) or are unbearably slow (see “Why create a new library, can I not use an existing one?”).

So, we built a library (@dream-faster/fold) to make it accessible: it comes with an order of magnitude speedup, while being more ergonomic. Fold allows us to move from the limited usefulness (and speed) of Time Series Cross-Validation to Continuous Validation.

Why are Time Series ML models disappointing today?

Concept drift and regime changes are inherent features of most Complex Dynamical Systems influenced by society (all the time series worth modelling). They pose a serious challenge when training models:

“How do we deal with having multiple, different regimes in the time series data?”

We don't want to accidentally check our model’s performance on only a single regime, or just a handful of them!

“How do we simulate how the model could have performed in production, when re-trained at regular intervals?”

Our targets’ behaviour is never stationary, and we need our models to adapt to the new regimes!

“How do we deal with the cold start problem: having too little historical data?”

Could we turn the vast majority of the data (80-90%) into out-of-sample test set(s)?

Fig 1: Regime changes are inherent to any Complex Dynamical System.

When doing Time Series Cross-Validation, you leverage the temporal structure of your data: split it into many parts (folds), and train multiple models as you “walk forward” in time, on historical data.

Fig 2: “Walking forward” in time

You can guarantee that these models have not seen the future, and evaluate how they would have performed in the past.
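The walk-forward idea above can be sketched in a few lines. This is a minimal illustration with an expanding training window, not the splitter of any particular library; the function name `expanding_window_splits` and its parameters are made up for this example.

```python
def expanding_window_splits(n_samples, initial_train_size, test_size):
    """Yield (train_indices, test_indices) pairs that walk forward in time."""
    start = initial_train_size
    while start < n_samples:
        end = min(start + test_size, n_samples)
        # The training window is everything strictly before the test window,
        # so a model trained on it can never have seen the future.
        yield range(0, start), range(start, end)
        start = end


splits = list(
    expanding_window_splits(n_samples=100, initial_train_size=20, test_size=20)
)
for train_idx, test_idx in splits:
    print(f"train: 0..{train_idx[-1]}   test: {test_idx[0]}..{test_idx[-1]}")

# Share of the data that ends up in some out-of-sample test window.
out_of_sample_share = sum(len(test_idx) for _, test_idx in splits) / 100
print(f"out-of-sample share: {out_of_sample_share:.0%}")
```

With these (arbitrary) settings, only the first 20 samples are never used for testing, which is how most of the data can end up in out-of-sample test sets.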

That's nice in theory, what is it like in practice?

If you have tried Time Series Cross-Validation with today’s popular Python libraries, you were probably surprised: they don’t do what you expect.

These are the predictions of a simple ARIMA(1,1,0) model - returned by SKTime’s cross validation function - plotted against the target:

Fig 3: Models in current libraries can’t be updated within a single fold.

Something looks wrong: the model is not updated within a window, so it essentially repeats the last value for the next 100 timestamps. It is static: it doesn’t have access to the latest reading. Let’s explore what we can do about it. You will see that overcoming this problem was not feasible at a reasonable computational cost before.

Introducing Continuous Validation

You could update the models in an “online” manner, within a test window, on each timestamp:

Fig 4: Continuous Validation: Time-Series Cross-Validation with online models.

We call this “Continuous Validation”.

Practically, it’s the way to simulate live deployment of your models (periodically re-trained), on past data.
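To make the difference concrete, here is a toy sketch in plain Python. It is not fold's or sktime's implementation: `fit_phi`, `static_forecast` and `continuous_forecast` are hypothetical helpers, and the model is a hand-rolled ARIMA(1,1,0)-style predictor (an AR(1) on first differences, without intercept) applied to a synthetic sine wave. In both cases the coefficient is estimated once on the training window; the only difference is whether the model gets to see the latest observations inside the test window.

```python
import math


def fit_phi(series):
    """Least-squares AR(1) coefficient on first differences (no intercept)."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    return num / den if den else 0.0


def static_forecast(train, horizon, phi):
    """Multi-step forecast: the model only ever sees its own predictions."""
    last, diff = train[-1], train[-1] - train[-2]
    preds = []
    for _ in range(horizon):
        diff = phi * diff
        last = last + diff
        preds.append(last)
    return preds


def continuous_forecast(train, test, phi):
    """One-step-ahead forecasts: phi stays fixed, inputs are refreshed."""
    history = list(train)
    preds = []
    for actual in test:
        diff = history[-1] - history[-2]
        preds.append(history[-1] + phi * diff)
        history.append(actual)  # reveal the true value only after predicting it
    return preds


def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic, deterministic data: a sine wave (an assumption for illustration).
series = [10 * math.sin(t / 10) for t in range(300)]
train, test = series[:200], series[200:]

phi = fit_phi(train)  # "training" happens exactly once
static_preds = static_forecast(train, len(test), phi)
continuous_preds = continuous_forecast(train, test, phi)

static_mae = mean_absolute_error(test, static_preds)
continuous_mae = mean_absolute_error(test, continuous_preds)
print(f"static MAE: {static_mae:.3f}   continuous MAE: {continuous_mae:.3f}")
```

No re-training happens inside the test window in either variant, so the continuous version costs about the same as the static one while being much closer to how a deployed model would actually be used.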

With our library fold (github), a simple ARIMA(1,1,0) will predict:

Fig 5: Red line is our target: y, blue line is the prediction: preds of the ARIMA(1,1,0) model

instead of the previously shown:

Fig 6: Predictions of sktime (long-fh)

Why create a new library, can I not use an existing one?

You can also try setting the gap between the windows (step size and forecasting horizon) to 1, with SKTime (or other time series libraries), but then you are essentially training a new model for every timestamp.

Why? The vast majority of models in existing time series libraries cannot be updated, which makes this task endlessly slow.
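A quick back-of-the-envelope count shows why. The sketch below only counts training runs; the helper `count_model_fits` and the initial training size of 100 are assumptions for illustration, while the 1000 samples match the sample size quoted for Table 1.

```python
import math


def count_model_fits(n_samples, initial_train_size, step):
    """Number of training runs when a fresh model is fit every `step` timestamps."""
    n_test = n_samples - initial_train_size
    return math.ceil(n_test / step)


n_samples = 1000          # sample size quoted for Table 1
initial_train_size = 100  # hypothetical choice for this example

fits_with_step_1 = count_model_fits(n_samples, initial_train_size, step=1)
fits_with_10_folds = count_model_fits(n_samples, initial_train_size, step=90)

print(f"step size 1: {fits_with_step_1} fits")
print(f"10 folds:    {fits_with_10_folds} fits")
```

Each of those fits also runs on a growing training window, so the total cost grows faster than the number of fits alone suggests.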

We can quantify “endlessly slow” for you:

Table 1: Test results are in seconds, sample_size=1000. Time differences increase with sample size. Note that sktime(long-fh) gives incomparable results, as shown in Figure 6 (imprecise predictions).

Check out this notebook for an in-depth comparison:

Notebook

Evaluating Time Series model performance should be fast, easy and accessible.

Speed is an enabler in science. Discovering the additive property of logarithms created the opportunity for sailors to discover unexplored lands and for astronomers to calculate the empirical precision of untested theories.

Continuous Validation is essential for our work, but available tools were so slow, they were essentially unusable.

This is where fold comes into the picture.

Fig 7: Fold allows you to run Continuous Validation with speed on your models. In addition, its interface allows you to create Composite Models with ease.

It builds on top of other great libraries: for example you can throw in a StatsForecast model with an XGBoost and ensemble them, leveraging the advantages of both. Check out the expanding list of models we support here.

How are we achieving the speed-up?

We parallelize as much as possible. Install ray (optionally), and use all cores on your laptop or deploy to a cluster.
We have a way of updating our own, online models in a mini-batch fashion during backtesting. The model parameters stay the same, but we’ll give them access to the last timestamp(s). This means you can - on 20,000 samples, using 10 folds - evaluate a baseline model in under 3 seconds, and a tabular model in less than a minute.
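Both ideas can be sketched together in plain Python. This is not fold's code: `evaluate_fold` and `run_backtest` are hypothetical names, the "model" is a simple drift predictor, and a thread pool from the standard library stands in for ray purely to keep the example self-contained (for CPU-bound pure-Python work, a process pool or ray is what would give a real speedup). The point is that folds are independent of each other, so they can be mapped over workers, and that within a fold the fitted parameter stays frozen while the inputs are refreshed batch by batch.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_fold(series, train_end, test_end, batch_size):
    """Fit once on the training window, then predict the test window in batches."""
    train = series[:train_end]
    # "Training": estimate the average step (drift). Frozen from here on.
    drift = (train[-1] - train[0]) / (len(train) - 1)
    abs_errors = []
    for batch_start in range(train_end, test_end, batch_size):
        batch_end = min(batch_start + batch_size, test_end)
        last_seen = series[batch_start - 1]  # latest value known before this batch
        for step, t in enumerate(range(batch_start, batch_end), start=1):
            prediction = last_seen + drift * step
            abs_errors.append(abs(series[t] - prediction))
    return sum(abs_errors) / len(abs_errors)


# Toy data: an upward trend followed by a regime change to a downward trend.
series = [0.5 * t for t in range(100)] + [50.0 - 0.2 * t for t in range(100)]
# (train_end, test_end) pairs for four folds with an expanding training window.
folds = [(start, start + 40) for start in range(40, 200, 40)]


def run_backtest(batch_size):
    # Folds do not depend on each other, so they can be evaluated concurrently.
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda f: evaluate_fold(series, f[0], f[1], batch_size), folds)
        )


static_errors = run_backtest(batch_size=40)    # no updates within a fold
minibatch_errors = run_backtest(batch_size=1)  # inputs refreshed every timestamp

for (train_end, test_end), s, m in zip(folds, static_errors, minibatch_errors):
    print(f"test {train_end}..{test_end - 1}: static={s:.3f}  mini-batch={m:.3f}")
```

The `batch_size` parameter is the knob between the two extremes: the larger the batch, the less often the model sees fresh data within a fold.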

We also understand that usability comes first, so with fold:

There’s no need to specify a pipeline object, just throw in your models as a list
You can use the familiar scikit-learn-esque API
You can easily extend it with arbitrary transformations (just pass in a lambda function) and models (it supports a wide range of time series and tabular ML models)
You can build arbitrarily complex pipelines

Check out Fold on GitHub. You can find a Technical Walkthrough in the docs or on colab.

If you are enjoying this post on Continuous Validation, share it with someone who might be interested in it or benefit from it.

We are on a mission to improve everyone’s forecasting ability

We believe there is a need for an open-core, transparent software suite for the time series domain, that allows you to make informed decisions about which time series models are appropriate for your use case.

At Dream Faster, we’re building:

A lightweight, model-independent Evaluation library of Time Series predictions that tracks metrics over time (@dream-faster/krisi).
Time Series AutoML without lookahead bias, to get accurate simulated performance on past data that you can rely on.
Re-implementation of new and classic time series models, providing an additional order of magnitude speedup during backtesting (while supporting online updates when deployed).
Training models on multiple time series at the same time, with synchronized Continuous Validation!
Full support for Uncertainty Quantification, making it an integral part of Fold.
And many more…

In the meantime, we’re building a sustainable business to support our efforts. We will continue supporting research with our open core tools, but we’ll charge for production usage. We are structuring Dream Faster to forego the traditional VC blitzscaling phase and focus on providing value, while sustainably growing.

Please, have fun with it and give us feedback!

In addition, follow our newsletter for an upcoming post on why Ensemble models matter:

This post first appeared on Applied Exploration - Machine Learning.
