Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Understanding ARIMA time series analysis with R (part 1)

Part 1 : Discuss under the stationarity

The time-series processing is frequently needed in the practical analysis, and that skill could help you solve many real problems.

For example, almost all marketing or social analysis will have some kind of seasonality. Even if you analyze your application's access log, it might have some "peak time" (trends of access frequency).

Recognizing time-series analysis is based on the mathematical background (proof, theorem, etc). However, throughout in this post, I focus on explaining why things are needed on building your intuitions. (For a lot of descriptions, I will write without proofs.) I also won't use the difficult terminologies as possible, but I note that the primary mathematical background might be needed for this reading.

Before starting, we have some notice to say.
First, the time-series analysis is having several approaches like the spectrum analysis (Fourier Transform), exponential smoothing, etc... Here we discuss using ARIMA which is based on the mathematical model representing many aspects of real time-series.
The second, there're so many things (a lot of backgrounds) to say and I cannot write it once. Therefore, first, I focus on the fundamentals about ARIMA under the condition of stationary process in this post, and next we will discuss more advanced topics (including non-stationarity) one by one in this blog series.

Note : In this post, we don't care the index of time, like day of week, quarter, etc. (We use like 1, 2, 3, ... as time axis.)
If you want to handle time signature, the timekit package can handle this kind of data mining for time-series analysis.

ARMA Modeling

As I mentioned above, here we assume the stationarity of time-series data. The "stationarity" means : the mean (expected value) is not dependant for time and is constant, and the autocovariance only depends on the time differences. That is, the stationry process has the following restriction.
(One important notice is that the real time-series data might be measured once and the values are fixed. But here we should consider the mathematical model to fit our time-series data. Therefore is not some fixed values, but consists of the statistical probability space.)

Note : Strictly speaking, this definition is called "weak-sense stationary process". This definition is often used in statistical analysis rather than "strongly stationary process".  When we use the term "stationarity", it means "weak-sense stationarity" in this post.
See "Stationary process (Wikipedia)".

For example, if the trend of values increases as time passes, it is not the stational process, because the mean is not constant. (In this case, the differences might behave some steady model, and we discuss this case in ARIMA model.)

Now we consider the mathematical model to fit our time-series data under the condition of stationarity. Let's consider the stock price data.
As you know, the stock price is influenced by the yesterday's price. If the price (close price) of t - 1 is x dollars, the price of day t starts with x dollars.
That is, we can assume that the price will be determined by the following model, where c and φ is constant, and is randomly distributed noise data called "white noise". (For this , both the mean and the autocovariance equal to 0. But it's okay to assume as  to make things easy.)

Note : If φ is negative value, and is having the correlation of the reverse. (When is large, will be negatively large.) If then, the graph will be so jagged.

This model is called AR (Autoregressive), and AR(p) is given as the following definition in general. (Therefore the above stock model is AR(1).)

AR model can represent many aspects of cyclic time-series.
For example, the following R program is plotting AR model with . As you can see, it draws the beautiful line with cycles.


Note : You can simply use arima.sim() (in "forecast" package) for ARIMA (including ARMA) simulation. But here we don't use this function for your understanding.

You remember that not all AR expression is stationary process, and AR model has the stationarity, only if some condition is established.
In AR(1), it has stationarity if and only if . In AR(2), it has stationarity if and only if and .

Now let's consider some other factors influencing time series data.
For stock example, it might also be affected by the difference between open price and close price on the day before. If there was a rapid decline in stock price on the day before, the price might rebound today.
In such a case, we can use the following model called MA (moving average).

The part of (μ is the mean) is the same like AR, but MA depends on the t-1 noise , not .

Same as AR, the general MA(q) is given as follows.
It is known that MA(q) is always stationary process.

Let's see how to change with θ values. As you can see below, it will smoothly transit when θ is positively related, and it will be jagged when it is negatively related.

Normal Distribution (No MA)

MA(3) with

MA(2) with


Finally, the combined model is called ARMA model, and it's given as follows.
As you can see, the former part is AR(p) and the latter is MA(q).

Choose your model and parameters

Now we consider the prediction of time-series data. First, we select the appropriate model and parameters (p, q, θ , φ , etc).

When you select the model (AR, MA, ARMA) and dimensions (p, q), you must focus on the charactristics of each models. Each model is having the different characteristics for Autocorrelation (ACF) and partial autocorrelation (PACF).

Autocorrelation (Corr) is the autocovariance divided by variance as follows. (By dividing with variance, we can eliminate the influence of the size of unit.)
As you know, Corr only depends on the lag k (not depend on t), because Cov depends only on k. That is, the autocorrelation means the effect (correlation) by the lag k,

On the other hand, the partial autocorrelation is the value of the effect by the lag k but it eliminates the effect of the lag 1, 2, ..., k-1. It is given by calculating the linear equations for the covariance. (It needs the calculation of matrices, but we don't describe the definition of the partial autocorrelation in this post.)

Each AR(p), MA(q), and ARMA(p, q) model is having the following different characteristics for the autocorrelation and partial autocorrelation.
For example, if the partial autocorrelation of samples equals to 0 when the lag >= 3, there is high probability that the model is AR(2).

  • AR(p) : When the lag is getting large, the autocorrelation decreases exponentially (but, non-0 value). When the lag is larger than p, the partial autocorrelation equals to 0.
  • MA(q) : When the lag is larger than q, the autocorrelation equals to 0. When the lag is getting large, the partial autocorrelation decreases exponentially (but, non-0 value).
  • ARMA(p, q) : When the lag is getting large, both autocorrelation and partial autocorrelation decreases exponentially (but, non-0 value).
autocorrelation partial autocorrelation
AR(p) decrease exponentially 0 if >= p + 1
MA(q) 0 if >= q + 1 decrease exponentially
ARMA(p, q) decrease exponentially decrease exponentially

Note : You may wonder... Why the partial autocorrelation of MA is not 0, even though Corr is 0 when the lag >= q + 1 ?
The reason is because of the difference between definition of autocorrelation and partial autocorreletion, and it's not so important.

The following is the autocorrelation and partial autocorrelation plotting sample with R. (The x axis is lags, and the y axis is correlation values)



Autocorrelation sample of AR(2)

Partial Autocorrelation sample of AR(2)


Autocorrelation sample of MA(2)

Partial Autocorrelation sample of MA(2)

Note : As you can see, it's so hard to distinguish whether it equals to 0 or is decreasing exponentially, because the values are approximated.
In the real estimation, AIC or BIC (information criteria) is often used for validating the choice (the count of parameters) of models.

For the estimation of parameter values, you can use the estimation techniques like moment, least squares, or maximum likelihood. For example, using the given samples , you can determine the parameter values (, etc) which maximize the likelihood of . (We assume  represents the normal distribution.)

After you determine the model and parameters, you can survey the sample (the given real data) with the normal techniques of the statistical sample survey.

With R, you can analyze without any difficulties using auto.arima() in "forecast" package. The following is the brief example, in which we create the data with AR(2) () and analyze with auto.arima().
As you can see below, we can get the good result (). (Here ARIMA(p, 0, q) means ARMA(p, q).)


# create time-series data with ARMA(2, 0)

Let's consider another example with R.
The following data aligns with sin(), and it will be easy to analyze with spectrum technique, but here we get the model with auto.arima().


data_count = 200

The following is the ARMA simulation with these parameters. (As I described above, you can use arima.sim() for simulation, but here we don't use this function for your understanding.)
As you can see below, this model seems to be fitted closely by the cycle of given data (real data).



These (above) are the primitive examples, but now we change the original data as follows.
This data is much affected by the white noise compared with sin() cycle.

y 10 * sin(y$Indexes / 30)) + (5 * rnorm(y$Noise))

plot(x = y$Indexes, y = y$Values)

The following is the result which we analyze with auto.arima().

When you simulate the result, you can find the difference (errors) between the original data and the simulated one.
As you know, the effect of noise will affect the confidentiality of estimation. (You must care the probability of errors in the real practical analysis.)



Predict the future

When the model and parameters are determined, now you can predict the future values with the generated model.

The predicted values are called "conditional expected value" or "conditional mean", because we predict the future values  with  distribution under the condition of the given values (the values in the past)  . (We assume that represents the normal distribution.)

On the contrary, the prediction approach to show some interval of possibility is called "interval forecast". For example, showing the interval of 95th percentile.
This possibility zone (the variance called mean squared errors (MSE)) is getting larger, when the time goes to the future. With interval forecast, you can visually know these possibility trends.

In computing process, the probability distribution will not be calculated algebraically, but the sufficiently large number of values are generated with the normal distribution, and is approximated. (Because of the computing cost.)
This approach is often used in the real computer system. (For example, the deep neural network also uses this approach instead of the calculation of differentials.)

The following is the R example of this prediction (forecast) with primitive AR(2).
The line is the conditional expected value, inner range is the interval forecast with the possibility of 85th percentile, and outer range is the one with the possibility of 95th percentile.


# create data with AR(2)
data_count # plot with prediction
pred = forecast(

As you can see in the graph, the conditional expected value of AR converges to the expected value (the mean) itself of this model. (MSE converges to the variance.)
Therefore it's meaningless to forecast further future, because it will just be the same value as the expected value (the mean) of the model.

Next we will discuss more advanced topics about time series analysis including non-stationarity.

Share the post

Understanding ARIMA time series analysis with R (part 1)


Subscribe to Msdn Blogs | Get The Latest Information, Insights, Announcements, And News From Microsoft Experts And Developers In The Msdn Blogs.

Get updates delivered right to your inbox!

Thank you for your subscription