In this seminar, we will use Y.RData
file that we created in the Seminar 2. We focus on coding GARCH model using rugarch
package. Please review the theory of GARCH that we discussed in the lecture.
The plan for this week:
1. Build univariate GARCH models
2. Plot the outputs
3. Work with various GARCH specifications
4. Relax the GARCH stationarity condition
5. Assess model quality using likelihood ratio tests and residual analysis
6. Learn about half-life and GARCH simulations
The following sections and subsections are materials for students to learn by themselves: Other GARCH specifications, GARCH Stationarity and Half-life
library(rugarch)
library(tseries)
In this class we will work with the GARCH family, which was originally proposed in 1982 by Robert Engle. Most volatility models used in industry nowadays derive from this formulation.
Returns have a conditional distribution, which we will initially assume to be normal:
$$Y_{t} \sim \mathcal{N}\left(0,\sigma_t^2\right)$$Which can be written as: $Y_t = \sigma_{t}Z_{t}$, where $Z_{t} \sim \mathcal{N}\left(0,1\right)$. $Z_t$ is called residual.
The $\textrm{GARCH}\left(L_1,L_2\right)$ model is:
$$\sigma_t^2 = \omega + \sum_{i=1}^{L_1}\alpha_{i}Y^2_{t-i} + \sum_{j=1}^{L_2}\beta_{j}\sigma^2_{t-j}$$Where:
$\alpha$ is news.
$\beta$ is memory.
The size of $\left(\alpha + \beta\right)$ determines how quickly the predictability of the process dies out.
We are going to use the J.P. Morgan returns from our Y.RData
file to run various GARCH type models discussed in Chapter 2.
# Load the data
load("Y.RData")
# Save the returns in a variable called y
y <- Y$JPM
# Simple plot
plot(y, type = "l", main = "Returns for JP Morgan")
# Autocorrelations
acf(y, main = "Autocorrelations of JPM returns")
acf(y^2, main = "Autocorrelations of JPM returns squared")
The autocorrelation of the returns squared are significant for every lag, meaning that we see a behavior that can be modeled with GARCH.
We will use the rugarch
library for building univariate GARCH models. The documentation for it can be found here: https://cran.r-project.org/web/packages/rugarch/rugarch.pdf
To build a univariate GARCH model, we first specify the type of GARCH model.
We can specify the type of GARCH model we want to fit using the ugarchspec()
function. For example, we can specify the number of lags, if the model includes leverage or power effects, whether the mean should be included or not, and the conditional distribution of the returns. The common syntax we will use inside ugarchspec()
is:
Type of model: variance.model = list(model = "type of model"),
Whether to include or not a mean: mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
Type of conditiona distribution: distribution.model = "std"
To get a detailed explanation on how to use all the options of ugarchspec()
, you can type ?ugarchspec
in the console.
First we will run the default version, which is a conditionally normal GARCH(1,1) model with non-zero mean which follows an ARMA(1,1) process:
# Create the specifications
default_spec <- ugarchspec()
# We can call the variable to see what's inside
default_spec
Once we have specified the model, we can fit our data into it using ugarchfit()
:
# Fit the model to the data using ugarchfit
default_garch <- ugarchfit(spec = default_spec, data = y)
# Call the variable
default_garch
The function provides comprehensive information on the model fit. It includes the optimal parameters estimates, standard errors, tests for significance, likelihood, information criteria, and several other tests.
Since this is too much information in the same place, we can tell R what particular information we are interested in. By using the @
symbol we can extract the contents from an object. For example, by running:
# Check the content of default_garch
names(default_garch@fit)
We see all the content provided by the ugarchfit()
function. We can access any element of this list by using the $
sign:
# Extract a particular element
default_garch@fit$coef
Another way of accessing this information is:
# Extracting coefficients with coef()
coef(default_garch)
Let's say we are particularly interesting in $\omega$. We can access it by:
# Accessing omega using its index
coef(default_garch)[4]
However, using the name instead of the index can prevent potential mistakes in our code (for example, if the package is updated and the list reordered):
# Accessing omega using its name
coef(default_garch)["omega"]
We can view the likelihood of the fitted model with:
likelihood(default_garch)
A very useful feature of rugarch
package is plotting. If we call the function plot()
on a fitted GARCH model like default_garch
, the following interactive menu will appear:
Make a plot selection (or 0 to exit):
1. Series with 2 Conditional SD Superimposed
2. Series with 1% VaR Limits
3. Conditional SD (vs |returns|)
4. ACF of Observations
5. ACF of Squared Observations
6. ACF of Absolute Observations
7. Cross Correlation
8. Empirical Density of Standardized Residuals
9. QQ-Plot of Standardized Residuals
10. ACF of Standardized Residuals
11. ACF of Squared Standardized Residuals
12. News-Impact Curve
We can visualize any of the plots by selecting the corresponding number. The plots for the default_garch
are (The plot is different for different time span):
These plots provide us with plenty of information to analyze the fit and results of our model. For example, from Plot 8 and Plot 9 we can see that the distribution of residuals in our model does not correctly fit under a normal distribution. As expected, there is an excess of observations around the mean in the histogram in Plot 8, and in Plot 9 we see the presence of fat tails in our observations.
A key output of the GARCH model is the estimated conditional volatities, the sequence $\sigma_t$. This sequence is used for generating the first three plots shown above. We can access the sequence through default_garch@fit$sigma
.
# Plotting the estimated conditional volatility
plot(default_garch@fit$sigma, type = "l", main = "Esimated conditional volatility for JPM")
So far we have used the default specifications of ugarchspec()
. Now we will build different models of the GARCH family using the same data.
The default method assumes that the mean of the GARCH process follows an ARMA(1,1) process. For the purpose of this class, we will use the assumption that the mean of the returns is equal to zero. To specify we don't want to include a mean in the model, we have to change the options in mean.model
:
# New specification
spec1 <- ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
# Fit the model
GARCH <- ugarchfit(spec = spec1, data = y)
We can call the object to see the output. We will see that the coefficients now only include $\omega, \alpha_1, \beta_1$
coef(GARCH)
An $\textrm{ARCH}\left(L_1\right)$ model is the restricted version of $\textrm{GARCH}$. Conditional volatility is modelled as a weighted average of past returns:
$$\sigma_t^2 = \omega + \sum_{i=1}^{L_1}\alpha_{i}Y^2_{t-i}$$We can easily fit our data to an $\textrm{ARCH}\left(L_1\right)$ model by setting the second parameter in the $\textrm{GARCH}$ to $0$.
Models from the GARCH family are fitted using a Maximum Likelihood method, which requires a solver. We recommend using solver = "hybrid"
if the algorithm fails to converge:
# Specification for ARCH(1)
spec2 <- ugarchspec(
variance.model = list(garchOrder = c(1,0)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
# Fit to the data
ARCH <- ugarchfit(spec = spec2, data = y, solver = "hybrid")
coef(ARCH)
To take fat tails into account in our modelling, we can change the assumptions for the conditional distribution of the returns. The Student-t distribution is a natural choice since it shows fatter tails than the Normal distribution. This can be very useful in risk forecasting, but it requires at least 3,000 observations for a correct estimation.
For a Student-t GARCH we will use the assumption:
$$Z_t \sim t_{\left(\nu\right)}$$Where $\nu$ is the degrees of freedom of the Student-t.
To implement it using R we use the distribution.model
in ugarchspec()
:
# Specify tGARCH
spec3 <- ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "std"
)
# Fit the model
tGARCH <- ugarchfit(spec = spec3, data = y)
When calling tGARCH
, we will see that there is a new parameter called shape
, which is the estimation of the degrees of freedom of the distribution:
coef(tGARCH)
To see how the tGARCH
fits our data better than the GARCH
model, we can see Plot 8 and Plot 9 for tGARCH
:
In Plot 8 we can see how the Student-t distribution is a better fit for our data compared to the Normal distribution. It has a higher peak around zero, which matches what we see in financial data. In Plot 9, the qqPlot, we see a better fit for extreme values, due to the fat-tailed nature of the Student-t distribution.
The $\textrm{APARCH}$ model takes into account both leverage effects and power effects. A leverage effect can be present since stock returns are sometimes negatively correlated with changes in volatility. Power models take into consideration that absolute returns might have stronger autocorrelations than squared returns. APARCH takes this into account. Here is the model with one lag:
$$\sigma^2_t=\omega+\alpha \ \left(\mid Y_{t-1}\mid - \ \zeta \ Y_{t-1} \right)^{\delta}+ \beta \ \sigma^{\delta}_{t-1}$$The model allows for leverage effects when $\zeta \neq 0$ and power effects when $\delta \neq 2$. This model is usually difficult to estimate since it requires at least 4,000 observations.
We can specify this model in the variance.model
option in ugarchspec()
:
# Specify APARCH model
spec4 <- ugarchspec(
variance.model = list(model = "apARCH"),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
# Fit to the data
apARCH <- ugarchfit(spec = spec4, data = y)
In the coefficients we will find the estimation for $\gamma$and $\delta$
# View the coefficients
coef(apARCH)
We can include a Student-t distribution in the APARCH model to account for fat tails:
# Specify tapARCH model
spec5 <- ugarchspec(
variance.model = list(model = "apARCH"),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "std"
)
# Fit to the data
tapARCH <- ugarchfit(spec = spec5, data = y)
coef(tapARCH)
The news impact curve measures how new information affects the return volatility in GARCH models. Intuitively, it tells us how a shock to returns in period $t-1$ will affect volatility in period $t$. For standard GARCH models, it has a symmetric U-shape, the reason being that for the estimation we are using returns squared, so it won't matter if the news are positive or negative.
Asymmetric GARCH models like the apARCH react differently if the news are positive or negative. Generally, negative news increase volatility more than positive news, so the U shape will not be symmetrical. The plot below shows the news impact curve for the GARCH, tGARCH, apARCH and tapARCH models we created:
The unconditional volatility for a GARCH(1,1) model is:
$$\sigma^2 = \frac{\omega}{1-\alpha-\beta}$$To ensure positive volatility forecasts, we need:
$$\omega, \alpha, \beta \geq 0$$It is a common question if we should impose $\alpha + \beta < 1$, which determines if the process has variance stationarity or not. This is not advisable except in special circumstances for two reasons:
By default, ugarchfit()
imposes stationarity in the model, but we can relax this assumption using the option: fit.control = list(stationarity = 0)
. For example, we will use a subset of the data to fit a stationary and a non-stationary model:
# Subset the data
subset <- y[4000:5000]
spec <- ugarchspec()
# Stationary GARCH - Default
stationary_GARCH <- ugarchfit(spec = spec, data = subset)
# Non-stationary GARCH - using fit.control = list(stationarity = 0)
non_stationary_GARCH <- ugarchfit(spec = spec, data = subset, fit.control = list(stationarity = 0))
# Compare alpha + beta
cat("Stationary alpha + beta:", sum(coef(stationary_GARCH)[c("alpha1", "beta1")]), "\n",
"Nonstationary alpha + beta:", sum(coef(non_stationary_GARCH)[c("alpha1", "beta1")]))
If we plot the fitted conditional volatilities we will see that they look very similar:
plot(stationary_GARCH@fit$sigma, type = "l", col = "black")
lines(non_stationary_GARCH@fit$sigma, col="red")
legend("topleft", legend = c("Stationary", "Non-stationary"), lty = 1, col = c("black", "red"))
However, they are not exactly the same:
# Create a new variable with the absolute difference between stationary and non-stationary
dif <- abs(stationary_GARCH@fit$sigma - non_stationary_GARCH@fit$sigma)
# Plot it
plot(dif, type = "l", main = "Distance in Stationary and Non-stationary models cond volatility")
The distance between the estimated conditional volatility of the stationary and non-stationary model resembles the shape of the estimated conditional volatility.
The following table summarizes the coefficients for the models we have built:
ARCH | GARCH | tGARCH | apARCH | tapARCH | |
---|---|---|---|---|---|
$\omega$ | 0.00031 | 2.20394e-06 | 2.61938e-06 | 6.59773-05 | 7.09693e-05 |
$\alpha$ | 0.49309 | 0.06364 | 0.07995 | 0.07320 | 0.08147 |
$\beta$ | - | 0.93288 | 0.91809 | 0.93071 | 0.92659 |
$\nu$ | - | - | 6.10894 | - | 6.48193 |
$\gamma$ | - | - | - | 0.47001 | 0.48501 |
$\delta$ | - | - | - | 1.29540 | 1.25297 |
Consider two models, where one is nested inside the other. This can be the case of the $\textrm{ARCH(1)}$, which is a subset of a $\textrm{GARCH(1,1)}$. We can call the nested model restricted (R), and the other unrestricted (U).
By definition: $$\log \mathcal{L}_R \leq \log \mathcal{L}_U$$
Two times the difference between the unrestricted and restricted likelihood is distributed as a $\chi^2$ with degrees of freedom equal to the number of restrictions:
$$\textrm{LR} = 2\left(\log \mathcal{L}_U - \log \mathcal{L}_R\right) \sim \chi^2_{\left(\textrm{number of restrictions}\right)}$$The number of restrictions means how many parameters we are excluding in the restricted model. For example, if we consider the ARCH vs GARCH models:
# Number of restrictions ARCH vs GARCH
coef(ARCH)
coef(GARCH)
The ARCH model, nested inside the GARCH model, has one parameter less. So in this case the number of restrictions which serves as the degrees of freedom in the chi-square distribution, would be one.
In the case of the ARCH vs the tGARCH model:
# Number of restrictions ARCH vs tGARCH
coef(ARCH)
coef(tGARCH)
The ARCH model has two parameters while the tGARCH has four, so we would be dealing with two degrees of freedom.
We can easily implement the LR test using the likelihood
output from the ugarchfit()
function.
Let's perform the test for the ARCH(1) and GARCH(1,1) models we created:
# Creating the LR statistic - using cat() for output display
cat(" Likelihood of ARCH: ", round(likelihood(ARCH),2), "\n",
"Likelihood of GARCH: ", round(likelihood(GARCH),2), "\n",
"2 * (log Lu - log Lr): ", round(2*(likelihood(GARCH)-likelihood(ARCH)),2))
The value of the test statistic is 2341. How do we know if this is above the critical value for a Chi-square with one degree of freedom and a confidence level of 95%?
We can use the qchisq()
function to find the 95% quantile of the Chi-square.
# 95% quantile of chi-square
qchisq(p = 0.95, df = 1)
The critical value is 3.84, which is smaller than 2341. This means we have enough evidence to reject that the two models perform similarly.
We can also use pchisq()
to find the p-value of the test statistic:
# Adding the p-value
p.value <- 1 - pchisq(2*(likelihood(GARCH)-likelihood(ARCH)), df = 1)
cat(" Likelihood of ARCH: ", round(likelihood(ARCH),2), "\n",
"Likelihood of GARCH: ", round(likelihood(GARCH),2), "\n",
"2 * (Lu - Lr): ", round(2*(likelihood(GARCH)-likelihood(ARCH)),2), "\n",
"p-value:", p.value)
We can create a Test()
function that takes two fitted models and gives an output with the likelihoods and p-values:
# Creating a function for Likelihood Tests
Test <- function(restricted, unrestricted) {
# Specifying the degrees of freedom as the number of restrictions
df <- length(unrestricted@fit$coef) - length(restricted@fit$coef)
# Creating the statistic
lr <- 2*(likelihood(unrestricted) - likelihood(restricted))
# Finding its p-value
p.value <- 1 - pchisq(lr, df)
# Output
cat("Degrees of freedom:", df, "\n",
"Likelihood of unrestricted model:", likelihood(unrestricted), "\n",
"Likelihood of restricted model:", likelihood(restricted), "\n",
"LR: 2*(Lu-Lr):", lr, "\n",
"p-value:", p.value
)
}
Using our new Test()
function, let's test the GARCH and tapARCH models:
# Testing the GARCH and tapARCH
Test(GARCH, tapARCH)
We have assumed returns to be conditionally normally distributed. We can test this with a Jarque-Bera test for normality and a Ljung-Box test for autocorrelation of returns and returns-squared, and do a QQ plot to see the fit.
Let's perform this test with the output of our GARCH(1,1) model:
# Performing residual analysis
residuals <- GARCH@fit$residuals
# Autocorrelation of residuals
acf(residuals, main = "Autocorrelation of residuals")
# Autocorrelation of residuals squared
acf(residuals^2, main = "Autocorrelation of residuals squared")
# Jarque-Bera test for Normality
jarque.bera.test(residuals)
# Ljung-Box test for residuals
Box.test(residuals, type = "Ljung-Box")
# Ljung-Box test for residuals squared
Box.test(residuals^2, type = "Ljung-Box")
# qqPlot for residuals under a normal distribution
qqPlot(residuals)
# qqPlot for residuals under a Student-t with 3 degrees of freedom
qqPlot(residuals, distribution = "t", df = 3, envelope = FALSE)
We are interested in how quickly shocks die out to understand the memory present in the data. The half-life is defined as the number of periods, $n^*$, it takes for conditional variance to revert back halfway towards unconfitional variance:
$$\sigma^2_{t+n^*,t} - \sigma^2 = \frac{1}{2}\left(\sigma^2_{t+1,t}-\sigma^2\right)$$
$$\left(\alpha + \beta \right)^{n^*-1} \left(\sigma^2_{t+1,t}-\sigma^2\right) = \frac{1}{2}\left(\sigma^2_{t+1,t}-\sigma^2\right)$$
$$n^* = 1 + \frac{\log\left(\frac{1}{2}\right)}{\log\left(\alpha + \beta \right)}$$
So, as $\left(\alpha + \beta \right) \rightarrow 1$, $n^* \rightarrow \infty$, the memory is infinite. We can easily calculate the half-life from our model:
# Half-life
# We use unname() to remove the names of the objects
alpha <- unname(coef(GARCH)['alpha1'])
beta <- unname(coef(GARCH)['beta1'])
half_life <- 1 + (log(0.5)/log(alpha+beta))
cat("The Half-life for our GARCH(1,1) is", half_life, "time periods.")
The package rugarch
allows us to use the function ugarchsim()
to simulate series using the fitted model. Since simulations involve the use of random numbers, every simulation will be different from each other (unless we specify seeds for replicability).
As a exercise, we can do 1000 simulations using our GARCH(1,1) model and use the matplot()
function to see how our simulated returns and volatility would look like:
# Create 1000 simulations from our fitted model
simulations <- ugarchsim(GARCH, m.sim = 1000)
# Plot the simulated returns
matplot(simulations@simulation$seriesSim, type = "l",
main = "Simulations of returns")
# Plot the simulated volatilities
matplot(simulations@simulation$sigmaSim, type = "l",
main = "Simulations of volatility")
We can take advantage of ugarchsim()
to do a quick Monte Carlo study. We know the coefficients of our model are:
coef(GARCH)
We can fit a GARCH model to every simulation of returns, store the coefficients, and see how they are distributed and the relationship to the true coefficients. The law of large numbers tells us that for a large number of simulations, the distribution should converge to the true values:
# Number of simulations
sim <- 1000
# Create vectors to hold the fitted parameters
omega_sim <- vector(length = sim)
alpha_sim <- vector(length = sim)
beta_sim <- vector(length = sim)
# Specify the GARCH we will use
spec <- ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
# Create the simulations
simulations <- ugarchsim(GARCH, m.sim = sim)
# Fit a GARCH for every simulation and store the parameter values
for (i in 1:sim) {
# Fit the GARCH model
temp <- ugarchfit(spec = spec1, data = simulations@simulation$seriesSim[,i])
# Add parameters to vectors created
omega_sim[i] <- unname(coef(temp)['omega'])
alpha_sim[i] <- unname(coef(temp)['alpha1'])
beta_sim[i] <- unname(coef(temp)['beta1'])
}
# Omega
# Distribution
hist(omega_sim, col = "lightgray", main = "Distribution of simulated omega")
# Adding a vertical line on the true parameter
abline(v = coef(GARCH)['omega'], col = "red", lwd = 2)
# Alpha
# Distribution
hist(alpha_sim, col = "lightgray", main = "Distribution of simulated alpha")
# Adding a vertical line on the true parameter
abline(v = coef(GARCH)['alpha1'], col = "red", lwd = 2)
# Beta
# Distribution
hist(beta_sim, col = "lightgray", main = "Distribution of simulated beta")
# Adding a vertical line on the true parameter
abline(v = coef(GARCH)['beta1'], col = "red", lwd = 2)
In this seminar we have covered:
Some new functions used:
ugarchspec()
ugarchfit()
ugarchsim()
coef()
likelihood()
cat()
function()
For more discussion on the material covered in this seminar, refer to Chapter 2: Univariate volatility modeling on Financial Risk Forecasting by Jon Danielsson.
Acknowledgements: Thanks to Alvaro Aguirre and Yuyang Lin for creating these notebooks
© Jon Danielsson, 2022