Time Series Analysis 10 | SARIMAX and VAR Models

Series: Time Series Analysis

  1. Multivariate Time Series

(1) The Definition of the Multivariate Data

Multivariate data consist of several variables that are recorded at the same time steps and that are related to, or correlated with, each other.

(2) Two Types of Multivariate Time Series Models

  • Exogenous Models: there is one response variable, and the other variables are exogenous features. For example, the number of people taking the bus each day can be the response variable, with the daily weather as an exogenous variable. In this case, the exogenous variables can affect the response variable, but not the other way around. An example of this is the SARIMAX model.
  • Endogenous Models: all the variables are endogenous and influence each other, so we have to model them simultaneously. An example of this is the VAR (i.e. Vector AR(p)) model.

(3) SARIMAX Model

Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors (aka. SARIMAX) is an extension of the ARIMA class of models. It has one response variable Y_t and a set of exogenous variables X_{1,t}, X_{2,t}, ... , X_{k,t}. So Y_t is related both to the exogenous variables (which can be modeled by a linear model) and to its own historical data (which can be modeled by a SARIMA model).

To set up the model, we combine an ARMA model for the history of Y_t with a linear model for the exogenous variables.
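As a sketch of what "ARMA plus linear model" can look like, the equation below assumes a non-seasonal ARMA(p, q) part and leaves out the differencing and seasonal terms. The symbols β, ϕ, θ and ε are illustrative choices and are not taken from the original article.

```latex
Y_t = \beta_0 + \sum_{i=1}^{k} \beta_i X_{i,t}
      + \sum_{j=1}^{p} \phi_j Y_{t-j}
      + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}
      + \varepsilon_t
```

Here the first sum is the linear model in the exogenous variables, and the remaining terms are the ARMA part built from the history of Y_t and of the errors ε_t.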

(4) Fitting SARIMAX Model

In practice, we use a training set with time t from 0 to n to fit the model. The training inputs, that is, the lagged response together with the exogenous variables, should be,

Y_{t-1} = y_0, y_1, ... , y_{n-1}
X_{1,t} = x_{1,1}, x_{1,2}, ... , x_{1,n}
... = ...
X_{k,t} = x_{k,1}, x_{k,2}, ... , x_{k,n}

And the training set of the response variable should be,

Y_{t} = y_1, y_2, ... , y_{n}

Then, we also have a testing set with time t from n+1 to n+h for forecasting. Its inputs are the observed history of the response and the future values of the exogenous variables,

Y_{t} = y_0, y_1, ... , y_{n}
X_{1,t+h} = x_{1,n+1}, x_{1,n+2}, ... , x_{1,n+h}
... = ...
X_{k,t+h} = x_{k,n+1}, x_{k,n+2}, ... , x_{k,n+h}

And the forecasted values of the response variable for the testing set should be,

Y_{t+h}-hat = y_{n+1}-hat, y_{n+2}-hat, ... , y_{n+h}-hat
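The training and testing layout above can be sketched in code. This is a minimal sketch and not a full SARIMAX fit: it assumes a single lag of the response, no moving-average or seasonal terms, ordinary least squares instead of maximum likelihood, and that each y_t is paired with the same-period exogenous value x_t. The function names `fit_arx1` and `forecast_arx1` and the synthetic data are made up for illustration.

```python
import numpy as np


def fit_arx1(y, x):
    """Fit y_t = c + phi * y_{t-1} + x_t @ beta by ordinary least squares.

    y has shape (n + 1,) and holds y_0 .. y_n.
    x has shape (n + 1, k) and holds the exogenous values for the same times.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Training inputs: a constant, the lagged response y_0 .. y_{n-1},
    # and the exogenous values x_1 .. x_n.
    design = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    target = y[1:]  # training responses y_1 .. y_n
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef  # [c, phi, beta_1, ..., beta_k]


def forecast_arx1(coef, last_y, x_future):
    """Forecast one step per row of x_future, feeding each forecast back in."""
    c, phi, beta = coef[0], coef[1], coef[2:]
    preds = []
    prev = last_y
    for x_t in np.asarray(x_future, dtype=float):
        prev = c + phi * prev + x_t @ beta
        preds.append(prev)
    return np.array(preds)


# Synthetic data with one exogenous variable (values chosen for illustration).
rng = np.random.default_rng(0)
n, h = 200, 5
x_all = rng.normal(size=(n + 1 + h, 1))
y_all = np.zeros(n + 1 + h)
for t in range(1, n + 1 + h):
    y_all[t] = 0.5 + 0.6 * y_all[t - 1] + 2.0 * x_all[t, 0] + 0.1 * rng.normal()

# Train on times 0 .. n, then forecast times n+1 .. n+h from the future x values.
coef = fit_arx1(y_all[: n + 1], x_all[: n + 1])
y_hat = forecast_arx1(coef, y_all[n], x_all[n + 1:])
print(coef)   # estimates of [c, phi, beta_1]
print(y_hat)  # forecasts for times n+1 .. n+h
```

Note that the forecast needs the exogenous values for times n+1 to n+h, which is why the testing set has to supply them.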

(5) VAR Model

The vector autoregression (aka. VAR or Vector AR(p)) model is a statistical model used to capture the relationship between multiple quantities as they change over time. Suppose we have multivariate time series data with time t from 0 to n-1 as follows,

Y_1 = y_{1,0}, y_{1,1}, ... , y_{1,n-1}
Y_2 = y_{2,0}, y_{2,1}, ... , y_{2,n-1}
...
Y_k = y_{k,0}, y_{k,1}, ... , y_{k,n-1}

Instead of analyzing them separately, we can put them together as a vector, so that the observation at each time t is a single vector Y_t whose k entries are y_{1,t}, y_{2,t}, ... , y_{k,t}.

We use an AR(p) model instead of an ARMA model here for simplicity. In general, Y_t is then written as a linear combination of its p previous values Y_{t-1}, ... , Y_{t-p} plus an error term.

The estimated Y_{t}-hat is the same linear combination without the error term, where A_1 to A_p are the k × k coefficient matrices of the model.
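For reference, the VAR(p) model and its estimate can be sketched in LaTeX as follows. Here Y_t is the stacked vector of the k series at time t. The sketch assumes there is no intercept term, since the text names only the matrices A_1 to A_p, and the error vector ε_t is an illustrative symbol.

```latex
\begin{aligned}
Y_t       &= A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p} + \varepsilon_t \\
\hat{Y}_t &= A_1 Y_{t-1} + A_2 Y_{t-2} + \dots + A_p Y_{t-p}
\end{aligned}
```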

(6) An Example of VAR Model

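Now let’s look at an example. Suppose we have two variables Y_1 and Y_2 in a VAR(p) model. Each variable then has its own equation, which is a linear combination of the p previous values of both Y_1 and Y_2. The coefficients are written ϕ, where the subscripts of ϕ are [row, column, lag].

Stacking the two equations gives the matrix form, with one 2 × 2 coefficient matrix per lag.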
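A sketch of the two-variable case in LaTeX, using the subscript convention for ϕ described above with the third subscript running over the lags 1 to p. As an assumption, there is no intercept, and the error terms ε_{1,t} and ε_{2,t} are illustrative symbols.

```latex
\begin{aligned}
y_{1,t} &= \phi_{11,1}\, y_{1,t-1} + \phi_{12,1}\, y_{2,t-1} + \dots
         + \phi_{11,p}\, y_{1,t-p} + \phi_{12,p}\, y_{2,t-p} + \varepsilon_{1,t} \\
y_{2,t} &= \phi_{21,1}\, y_{1,t-1} + \phi_{22,1}\, y_{2,t-1} + \dots
         + \phi_{21,p}\, y_{1,t-p} + \phi_{22,p}\, y_{2,t-p} + \varepsilon_{2,t}
\end{aligned}
```

In matrix form, the same two equations read:

```latex
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}
=
\begin{bmatrix} \phi_{11,1} & \phi_{12,1} \\ \phi_{21,1} & \phi_{22,1} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \dots +
\begin{bmatrix} \phi_{11,p} & \phi_{12,p} \\ \phi_{21,p} & \phi_{22,p} \end{bmatrix}
\begin{bmatrix} y_{1,t-p} \\ y_{2,t-p} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}
```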

(7) VAR Model Evaluation

Even though VAR models are good at capturing endogenous relationships and at forecasting, they have some drawbacks.

  • The VAR model assumes that all the variables are stationary. If some of the variables are not stationary, we need to difference them before fitting and then transform the forecasts back afterward.
  • We assign the same lag order p to all the variables, even though different variables may in fact depend on different numbers of lags.
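The difference-then-transform-back step from the first drawback can be sketched as follows. This is a minimal sketch under several assumptions: a VAR(1) model with an intercept fitted by ordinary least squares, first differencing applied to every variable, and synthetic data. The function names `fit_var1` and `forecast_var1` are made up for illustration.

```python
import numpy as np


def fit_var1(data):
    """Fit Y_t = c + A @ Y_{t-1} by least squares; data has shape (n, k)."""
    data = np.asarray(data, dtype=float)
    design = np.column_stack([np.ones(len(data) - 1), data[:-1]])
    coef, *_ = np.linalg.lstsq(design, data[1:], rcond=None)
    # coef has shape (1 + k, k): row 0 is the intercept, the rest is A transposed.
    return coef[0], coef[1:].T


def forecast_var1(c, A, last, steps):
    """Iterate the fitted VAR(1) forward, starting from the last observation."""
    out = []
    for _ in range(steps):
        last = c + A @ last
        out.append(last)
    return np.array(out)


# Synthetic example: the differences follow a stationary VAR(1),
# so the levels (their cumulative sums) are not stationary.
rng = np.random.default_rng(1)
A_true = np.array([[0.5, 0.1], [0.2, 0.3]])
n = 2000
d = np.zeros((n, 2))
for t in range(1, n):
    d[t] = A_true @ d[t - 1] + rng.normal(size=2)
levels = np.cumsum(d, axis=0)

# 1) Difference the levels, 2) fit and forecast on the differences,
# 3) undo the differencing by cumulatively adding forecasts to the last level.
diffs = np.diff(levels, axis=0)
c_hat, A_hat = fit_var1(diffs)
diff_forecast = forecast_var1(c_hat, A_hat, diffs[-1], steps=3)
level_forecast = levels[-1] + np.cumsum(diff_forecast, axis=0)
print(A_hat)           # should be close to A_true
print(level_forecast)  # forecasts on the original (level) scale
```

The cumulative sum in the last step is what "transforming back" means for first differences: each forecast level is the previous level plus the forecast change.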