
Four Deep Learning Papers to Read in January 2022

From Bootstrapped Meta-Learning to Time Series Forecasting with Deep Learning, the Relationship between Extrapolation & Generalization and Exploring Diverse Optima with Ridge Rider

Welcome to the January edition of the ‘Machine-Learning-Collage’ series, in which I provide an overview of different Deep Learning research streams. So what is an ML collage? Simply put, I draft a one-slide visual summary of one of my favourite recent papers, every single week. At the end of the month, all of the resulting visual collages are collected in a summary blog post. In this way, I hope to give you a visual and intuitive deep dive into some of the coolest trends. So without further ado: here are my four favourite papers that I recently read and why I believe them to be important for the future of Deep Learning.

‘Bootstrapped Meta-Learning’

Authors: Flennerhag et al. (2021) |  Paper

One Paragraph Summary: 

ML-Collage [37]: Figures by the author. |  Paper

‘N-Beats: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting’

Authors: Oreshkin et al. (2020) |  Paper |  Code

One Paragraph Summary: Traditional time series forecasting models such as ARIMA come from the world of financial econometrics and rely on fitted moving averages for trend and seasonality components. They tend to have only a few parameters, while maintaining clear interpretability. Recently, hybrid models, which combine recurrent neural networks with differentiable forecasts, have become more and more popular. This allows for flexible function fitting, while maintaining the inductive biases of more classic approaches. But is it also possible to train competitive forecasters based on pure Deep Learning approaches? In N-BEATS the authors introduce a new network architecture for univariate time series forecasting, which establishes a new SOTA on the M3, M4 and TOURISM benchmarks. The architecture consists of multiple stacks of residual blocks, which simultaneously perform both forecasting and backcasting. The partial forecasts of the individual stacks are combined into the final prediction for the considered time horizon. Furthermore, the basis of the individual block predictions can either be learned or fixed to a suitable and interpretable functional form, for example low-dimensional polynomials to capture a trend or periodic functions for seasonal components. The authors combine their approach with ensembling techniques, merging models trained on different metrics, input windows and random initialisations. They additionally show that the performance gains saturate as more stacks are added and visually confirm that the fixed-basis stack predictions are indeed interpretable.

ML-Collage [38]: Figures by the author. |  Paper
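To make the block structure from the summary above a bit more concrete, here is a minimal, untrained NumPy sketch of an N-BEATS-style forward pass with a fixed polynomial (trend) basis. All names, layer sizes and the toy input are my own illustrative choices and do not come from the official implementation linked above.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, DEG, WIDTH = 20, 5, 3, 64   # backcast length, forecast horizon, polynomial degree, hidden width

def trend_basis(length, degree):
    """Fixed low-order polynomial basis: one row per power of normalised time."""
    t = np.arange(length) / length
    return np.stack([t ** p for p in range(degree + 1)])          # shape (degree+1, length)

def make_block():
    """One N-BEATS-style block: a small ReLU FC stack plus two linear heads for the basis coefficients."""
    return {
        "fc": [rng.normal(0.0, 0.1, (T if i == 0 else WIDTH, WIDTH)) for i in range(4)],
        "theta_b": rng.normal(0.0, 0.1, (WIDTH, DEG + 1)),        # backcast coefficients
        "theta_f": rng.normal(0.0, 0.1, (WIDTH, DEG + 1)),        # forecast coefficients
    }

def block_forward(block, x):
    h = x
    for W in block["fc"]:
        h = np.maximum(h @ W, 0.0)                                # ReLU fully connected stack
    backcast = (h @ block["theta_b"]) @ trend_basis(T, DEG)       # reconstruction of the input window
    forecast = (h @ block["theta_f"]) @ trend_basis(H, DEG)       # partial forecast for the horizon
    return backcast, forecast

def nbeats_forward(blocks, x):
    """Doubly residual stacking: each block removes its backcast, partial forecasts are summed."""
    residual, forecast = x, np.zeros(H)
    for block in blocks:
        b, f = block_forward(block, residual)
        residual, forecast = residual - b, forecast + f
    return forecast

window = np.sin(np.linspace(0.0, 4.0 * np.pi, T))                 # toy univariate input window
print(nbeats_forward([make_block() for _ in range(3)], window))   # 5-step forecast (random weights, untrained)
```

A seasonal stack would swap the polynomial basis for sines and cosines of different harmonics, while the generic variant lets a learned linear layer play the role of the basis.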

‘Learning in High Dimension Always Amounts to Extrapolation’

Authors: Balestriero et al. (2021) |  Paper |  Podcast

One Paragraph Summary: Can neural networks (NNs) only learn to interpolate? Balestriero et al. argue that NNs have to extrapolate in order to solve high-dimensional tasks. Their reasoning relies on a simple definition of interpolation: it occurs whenever a datapoint falls into the convex hull of the observed training data. As the dimensionality of the raw input space grows linearly, the volume of this space grows at an exponential rate. We humans struggle to visualise the geometric intuition beyond 3D spaces, but this phenomenon is commonly known as the curse of dimensionality. But what if the data lies on a lower-dimensional manifold? Is it then possible to circumvent the curse of dimensionality and to obtain interpolation with only a few samples? In a set of synthetic experiments the authors show that what actually matters is not the raw dimension of the manifold but the so-called intrinsic dimension, i.e. the dimension of the smallest affine subspace containing the data manifold. They show that for common computer vision datasets, the probability that a test sample is contained in the convex hull of the training set decreases rapidly as the number of considered input dimensions increases. The authors also highlight that this phenomenon persists for neural network embeddings and different dimensionality reduction techniques: in all cases the interpolation percentage decreases as more input dimensions are considered. So what can this tell us? In order for NNs to succeed at solving a task, they have to operate in the “extrapolation” regime! But not all of them generalise equally well, which opens up new questions about the relationship between this specific notion of extrapolation and generalisation more generally. What role do data augmentation and regularisation play, for example?

ML-Collage [39]: Figures by the author. |  Paper
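The paper’s notion of interpolation is easy to test directly: a point interpolates if it can be written as a convex combination of training points, which is a linear programming feasibility problem. Below is a small illustrative sketch (the helper name `in_convex_hull` and the Gaussian toy data are my own, not the authors’ experimental setup) that probes this on synthetic data of growing dimension.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, train):
    """Is x a convex combination of the rows of train?
    LP feasibility: find w >= 0 with sum(w) = 1 and train.T @ w = x."""
    n = train.shape[0]
    A_eq = np.vstack([train.T, np.ones((1, n))])     # stack the points and the simplex constraint
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
for d in (2, 8, 32, 128):                            # growing input dimensionality
    train = rng.normal(size=(500, d))                # 500 synthetic "training" points
    test = rng.normal(size=(200, d))
    inside = np.mean([in_convex_hull(t, train) for t in test])
    print(f"d = {d:4d} | fraction of test points interpolating: {inside:.2f}")
```

Even with a fixed budget of 500 training samples, the fraction of test points falling inside the hull should collapse towards zero as the dimension grows, which is the curse-of-dimensionality argument made above in miniature.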

‘Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian’

Authors: Parker-Holder et al. (2020) |  Paper |  Talk |  Code

One Paragraph Summary: Modern deep learning problems often have to deal with many local optima. Gradient descent has been shown to be biased towards simple, high-curvature solutions. Classic examples of this problem include shape versus texture optima in computer vision or self-play policies that do not generalise to new players. Which local optimum the optimisation procedure ends up in may depend on many arbitrary factors such as the initialisation, the data ordering or the regularisation. But what if, instead of trying to obtain a single optimum, we rather aim to simultaneously explore a diverse set of optima? The Ridge Rider algorithm aims to do so by iteratively following the eigenvectors of the Hessian with negative eigenvalues, the so-called ridges. The authors show that this procedure is locally loss-reducing as long as the eigenvectors vary smoothly along the trajectory. By following these different ridges, Ridge Rider is capable of covering many different local optima in the contexts of tabular RL and MNIST classification. The authors show that Ridge Rider can also help in discovering optimal zero-shot coordination policies without having access to the underlying problem symmetries. In summary, Ridge Rider turns a continuous optimisation problem into a discrete search over the different ridges. It opens up a promising future direction for robust optimisation. But there also remain many open questions with regard to the scalability of the method, including efficient eigendecomposition and the simultaneous exploration of multiple eigenvectors.

ML-Collage [40]: Figures by the author. |  Paper
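To illustrate the core mechanic, here is a toy NumPy sketch on a double-well loss with a dummy second coordinate: start at the saddle point, eigendecompose the Hessian, and follow each negative-curvature eigenvector (in both signs), re-aligning it along the way. The loss, step sizes and helper names are my own simplifications for illustration, not the authors’ reference code.

```python
import numpy as np

def loss(theta):
    x, y = theta
    return (x**2 - 1.0)**2 + 0.5 * y**2              # two minima at x = +1 and x = -1

def grad(theta):
    x, y = theta
    return np.array([4.0 * x * (x**2 - 1.0), y])

def hessian(theta):
    x, y = theta
    return np.array([[12.0 * x**2 - 4.0, 0.0],
                     [0.0,               1.0]])

def follow_ridge(theta0, ridge, lr=0.05, steps=300):
    """Follow one negative-curvature eigenvector ('ridge'); finish with gradient descent once it closes."""
    theta, e_prev, on_ridge = theta0.copy(), ridge, True
    for _ in range(steps):
        if on_ridge:
            evals, evecs = np.linalg.eigh(hessian(theta))
            # pick the eigenvector most aligned with the previous step (keeps the ridge smooth)
            i = int(np.argmax(np.abs(evecs.T @ e_prev)))
            e = evecs[:, i] if evecs[:, i] @ e_prev > 0 else -evecs[:, i]
            if evals[i] < 0:
                theta, e_prev = theta - lr * e, e     # step along the ridge
            else:
                on_ridge = False                      # curvature turned positive: ridge has ended
        if not on_ridge:
            theta = theta - lr * grad(theta)          # plain gradient descent to the nearby optimum
    return theta

saddle = np.array([0.0, 0.0])                         # maximally symmetric starting point
evals, evecs = np.linalg.eigh(hessian(saddle))
ridges = [s * evecs[:, i] for i in np.where(evals < 0)[0] for s in (+1.0, -1.0)]
print([follow_ridge(saddle, r) for r in ridges])      # two diverse optima near x = -1 and x = +1
```

Following the two signs of the single negative-curvature eigenvector lands in the two distinct minima, which is the diversity-of-optima effect described above in its simplest possible form.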

That is it for this month. Let me know what your favourite papers have been. If you want some weekly ML collage input, check out the Twitter hashtag #mlcollage, and you can also have a look at more collages in the last summary blog post.
