# 12  When things go wrong in univariate volatility estimation

The estimation in the previous chapter went through without any problems, but we are not always so lucky. Since we are maximising a non-linear function, there are many things that can go wrong, often with hard to diagnose problems, as the code can fail with an incomprehensible error message. There can be many plausible explanations. They might be errors in data, the algorithms might be buggy, or you code might have mistakes.

These problems can arise because of the trade-off between safe code and speed. If we want to ensure that nothing goes wrong, we need a lot of checks and robust algorithms, which is is slow, even very slow. Consequently, we opt for algorithms that are fast, usually work but then sometimes fail.

## 12.1 Libraries

We use two libraries, the rugarch and nlopt which is a library for nonlinear optimisation.

source('common/functions.r')
suppressPackageStartupMessages(library(rugarch))
library(nloptr)
source('common/common.r')

### 12.1.1 Define rugarch specs

spec.g = ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
spec.t = ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "std"
)

## 12.2 Data

The data we use is returns on the S&P500 index, and for convenience, put them into variable y.

load('data/data.RData')
y=data$sp500$y
y=y-mean(y)

## 12.3 Failure

When I tried to estimate the tGARCH model over a particular 2000 day estimation window, y[720:2719] the estimation failed. Other estimation windows also failed. This happens on the computer I’m using now, and the estimation might be successful on a different computer, especially if it has a different version of R or other key libraries.

Here is what we do:

Res = ugarchfit(spec = spec.t, data = y[720:2719])

When we run this we get the following error message:

Error in robustvcv(fun = f, pars = ipars[estidx, 1], nlag = nlag, hess = fit$hessian, : object 'B' not found Traceback: 1. ugarchfit(spec = spec.t, data = data) 2. ugarchfit(spec = spec.t, data = data) 3. .sgarchfit(spec = spec, data = data, out.sample = out.sample, . solver = solver, solver.control = solver.control, fit.control = fit.control, . numderiv.control = default.numd) 4. .makefitmodel(garchmodel = "sGARCH", f = .sgarchLLH, T = T, m = m, . timer = timer, convergence = convergence, message = sol$message,
.     hess = hess, arglist = arglist, numderiv.control = numderiv.control)
5. robustvcv(fun = f, pars = ipars[estidx, 1], nlag = nlag, hess = fit$hessian, . n = T, arglist = arglist) ### 12.3.1 Why does it fail? This error message might seem incomprehensible, but there are some clues in it. It fails in something called robustvcv, not something we need for our purpose, which is simply one day ahead volatility forecast, but it is required for getting the confidence bounds of the parameters. In particular, vcv relates to variance-covariance-matrix, and robust to its calculation in a way that is not overly sensitive to distributional assumptions. While I can only guess, I think the error has to do with some overflow of numerical values. Some returns are very small, while others are quite large, and when we take them to the power two, the differences amplify. The covariance matrix is obtained from the inverse hessian, which is the matrix of second derivatives at the top of the likelihood function. It is tricky to estimate this hessian, especially if we want it done quickly. It comes down to limited numerical precision. Most calculations are done with what is known as double precision floating point, called float64, a number that uses 64 bits in computer memory. A floating point number looks like $$0.0234=2.34\times 10^{-2}$$ written in code as 2.34e-2. The smallest number is $$2^{-1022}$$ and largest $$2^{1023}$$. There are two parts to a floating point number, the exponent (or power), and mantissa the precision bits of the number, in front of the Power. 52 bits represent the mantissa. This means that if we do calculations with numbers are very different from each other, one can end up with an answer that is very incorrect. For example, if we want to sum up many numbers that are very different in size, you will get a different answer whether we start adding up the smallest numbers or the largest numbers. Also, as we see in Chapter 15 discussing backtests, we can get a different answer depending on if we do a log of a product or the sum of logs. This is what I think happened here, some sort of numerical precision problem. There are some ways to solve it. Some easy, others a bit more complicated. #### 12.3.1.1solver = "hybrid" does not help We could try Res = ugarchfit(spec = spec.t, data = data,solver = "hybrid") but that also fails, for obvious reasons ## 12.4 Re-scaling might work The first thing to try is rescaling the data. ### 12.4.1 De-meaning When we imported the data, we removed the mean, but only from all the observations. If, however, we de-mean the estimation window, the calculation succeeds. data=y[720:2719] data=data-mean(data) Res = ugarchfit(spec = spec.t, data = data,) coef(Res)  omega alpha1 beta1 shape 1.360510e-06 1.013725e-01 8.970915e-01 5.429194e+00  ### 12.4.2 Normalising variance Normalising the variance to be one also works, but then we need to re-scale $$\omega$$ back since $\sigma^2 = \frac{\omega}{1-\alpha-\beta}$ and so if we do data=data/sd(data) then need to do $\hat{\omega}\times var(y)$ data=y[720:2719] scale=1/sd(data) data=data*scale Res = ugarchfit(spec = spec.t, data = data) coef(Res)  omega alpha1 beta1 shape 0.006938875 0.102217111 0.896542052 5.366171288  omega=coef(Res) / scale^2 omega  omega 1.366344e-06  ### 12.4.3 Do both Usually, it is best to do both, both de-mean and normalise ### 12.4.4 Why did re-scaling work? The reason why the de-mean and normalisation works is because it makes most of the numbers be around one which means that when we square them, the difference between the big and the small numbers doesn’t become very large, so we don’t get the numerical overflow we discussed above. ## 12.5 Detecting failures — try() Suppose the rescaling doesn’t work, and the estimation still fails. If doing one estimation, it is easy to spot the problem and do something else. But what if doing backtesting, and we want to generate a large number of one day risk forecasts. Then it becomes annoying to have the code fail. Here a R statement called try() can be particularly useful. try() is designed to catch errors, so you try a calculation, and either you get an answer back, or a message saying the estimation failed. Suppose you want to take a logarithm of x, log(x). the x needs to be positive number, and if you pass a string, the code crashes log("string") Error in log("string"): non-numeric argument to mathematical function But what if you do a try. res1=try(log(1)) res2=try(log("string"),silent = TRUE) Both ran without error, but the second, of course failes. We can see that by res1  0 class(res1)  "numeric" res2  "Error in log(\"string\") : non-numeric argument to mathematical function\n" attr(,"class")  "try-error" attr(,"condition") <simpleError in log("string"): non-numeric argument to mathematical function> class(res2)  "try-error" and detect it by class(res2)=="try-error"  TRUE So after running the try we check the output, and if its class is "try-error" we know the calculation failed and we should deal with that. ## 12.6 Optimisers Recall the discussion in Chapter 11 on how we maximise the likelihood function. The algorithm used to do that maximisation is called an optimiser. The basic R distribution provides a function called optim() for optimising functions. It provides several algorithms, including the very old and robust Nelder-Mead That might not be the fastest algorithm, but does have the advantage of not depending on derivatives, and it will always find the next optimum. Nelder-Mead has at least two disadvantages, it requires more steps to find the optimum than some other algorithms, and it only does local optimisation, meaning it just finds the next peak. There is an alternative library called NLopt that provides a large number of state-of-the-art optimisers. NLopt can be used with almost every programming language, and the R implementation is nloptr. nloptr requires more steps than the built in optim. ### 12.6.1 Optimisers minimise One thing to keep in mind is that by default, optimisers minimise functions instead of maximising as we require here. This means that when we return a functional value, we have to invert the sign before returning. ### 12.6.2 Ensuring positive parameters In our case, the parameters that go into the likelihood function must be positive. There are at least two different ways we can ensure that positivity. The easiest is simply to take the absolute value of the parameters, but we can often also impose constraints when we call the optimising function. We show both below. ### 12.6.3 Optimisaiton in R Suppose we want to minimise some function f(parameters), where we have three parameters. #### 12.6.3.1 Build-in optim The easiest way is to use the built-in optim function, which by default uses the Nelder-Mead algorithm. It returns a list that contains the parameter estimates, the function value at the optimum and some other diagnostics. res = optim(c(0.000001, 0.1, 0.85) ,fn= f) # Optimization #### 12.6.3.2NLopt NLopt is a bit more complicated. We first have to set up control parameters for the estimation, like one of: opts =list("algorithm"="NLOPT_GN_CRS2_LM", "xtol_rel"=1.0e-8, maxeval=1e5) opts =list("algorithm"="NLOPT_GN_DIRECT_L", "xtol_rel"=1.0e-8, maxeval=1e5) opts =list("algorithm"="NLOPT_LN_NELDERMEAD", "xtol_rel"=1.0e-8, maxeval=1e5) Where we see three different algorithm choices. We also set the tolerance xtol_rel and how often the function can be called, maxeval. After that we call the actual optimisation with nloptr. We set the starting values x0, the function name eval_f, the lower and upper bounds on the parameters, lb and ub, and the control list. res = nloptr(x0=c(0.000001, 0.1, 0.85), eval_f=f, lb=c(0,0,0), ub=c(1,1,1), opts=opts) It returns a list that contains the parameter estimates, the function value at the optimum and some other diagnostics. ## 12.7 Coding the likelihood ### 12.7.1 Issues • Value of $$\sigma_1^2$$: We need to assign a value to $$\sigma_1^2$$, the first iteration of the conditional volatility. General, we choose the unconditional variance of the data for this; • Initial values of the parameters: Optimisation works iteratively. You need to assign values to initialise the parameters, making sure these are compliant with the restrictions of the function and the parameter bounds. This will have a small effect on the parameter values, but can have a large effect in computing time. We also need to pick an optimisation method and algorithm. ### 12.7.2 Normal GARCH The likelihood function and its derivation can be found in Chapter 2 of the book and slides $\begin{split} \log\mathcal{L} & = \underbrace{{-\dfrac{T-1}{2}\log(2\pi)}}_{\text{Constant }}\\ & - \dfrac{1}{2}\sum\limits_{t=2}^{T}\left({(\log (\omega+\alpha{y^2_{t-1}+\beta{{\hat\sigma^2_{t-1}})})}} + \dfrac{y^2_t}{\omega+\alpha{y^2_{t-1}}+\beta{{{\hat\sigma^2}_{t-1}}}}\right) \end{split}$ ### 12.7.3 tGARCH The Student-t density is given by $\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}$ We want to make two alterations 1. Make it standardised; 2. Write it in terms of a time changing volatility model, Which gives us $\frac{1}{\sigma_t} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{(\nu-2)\pi\Gamma\left(\frac{\nu}{2}\right)}} \left(1+\frac{y_t^2}{\sigma_t^2}\frac{1}{\nu-2} \right)^{-\frac{\nu+1}{2}}$ and take logs. There are two parts, the constant and time changing part. The constant is $\log\Gamma\left(\frac{\nu+1}{2}\right)- \frac{1}{2}\log(\nu\pi)-\log\Gamma\left(\frac{\nu}{2}\right)$ while the time changing part is $-\log\sigma_t -\frac{\nu+1}{2}\left( \log\left(\sigma_t^2(\nu-2)+y_t^2\right) -\log\left(\sigma_t^2(\nu-2) \right) \right)$ ## 12.8 Implementation ### 12.8.1 Issues There are three issues that arise. The first is that we have to ensure the parameters are positive, which we do below by par=abs(par) but also by setting lb and ub in nloptr. We don’t need both, but since we are comparing two different ways of doing optimisation keep it in here. Second, we don’t want to repeat calculations. We might end up calling the likelihood function large number of times, so we don’t want to be squaring the returns or counting the elements or taking the variance more often than necessary. So we pre-calculate everything we can, It is especially important not to do any uneeded calculations inside the loops. Finally, we need to get the last volatility. While we can write clever code for that, in this case we are just doing the simple way of putting it into global memory with <<- which is not how one is supposed to do things, but can be very convenient. ### 12.8.2 Normal GARCH y2=y^2 sigma2=var(y) T = length(y2) LL.GARCH = function(par) { par=abs(par) omega = par alpha = par beta = par loglikelihood = 0 for(i in 2:T){ sigma2 = omega + alpha*y2[i-1] + beta*sigma2 loglikelihood=loglikelihood+log(sigma2)+y2[i]/sigma2 } loglikelihood = -0.5* loglikelihood -(T-1)/2 * log(2*pi) sigma2Last <<- sigma2 return(-loglikelihood) } res = optim(c(0.000001, 0.1, 0.85) ,fn= LL.GARCH) res$par=abs(res$par) opts =list("algorithm"="NLOPT_LN_NELDERMEAD", "xtol_rel"=1.0e-8, maxeval=1e4) res = nloptr(x0=c(0.000001, 0.1, 0.85), eval_f=LL.GARCH, lb=c(0,0,0), ub=c(1,1,1), opts=opts) spec.2 = ugarchspec( variance.model = list(garchOrder = c(1,1)), mean.model = list(armaOrder = c(0,0), include.mean = FALSE) ) Res = ugarchfit(spec = spec.2, data = y) cat("Omega:", res$solution, coef(Res),"\n",
"Alpha:", res$solution, coef(Res), "\n", "Beta:", res$solution, coef(Res),"\n")
Omega: 2.459155e-06 2.459018e-06
Alpha: 0.1238671 0.1238915
Beta: 0.8557986 0.8557904 

### 12.8.3 tGARCH

y2=y^2
sigma2=var(y)
T = length(y2)

LL.tGARCH = function(par) {
par=abs(par)
omega = par
alpha = par
beta = par
nu = par
n1=nu+1.0;
n2=nu-2.0;
loglikelihood = 0

for(i in 2:T){
sigma2 = omega + alpha*y2[i-1] + beta*sigma2
loglikelihood=loglikelihood+
log(sigma2)+
n1*(log(n2*sigma2+y2[i])-log(n2*sigma2) )
}
loglikelihood = -loglikelihood/2.0;
loglikelihood=loglikelihood+
(T-1)*(log(gamma(n1/2))-
log(gamma(nu/2))-0.5*log(pi*n2))
sigma2Last<<- sigma2
return(-loglikelihood)
}
res = optim(c(0.000001, 0.1, 0.85,6) ,fn= LL.tGARCH) # Optimization
res$par=abs(res$par)

opts =list("algorithm"="NLOPT_GN_CRS2_LM", "xtol_rel"=1.0e-6, maxeval=1e6)

resn = nloptr(x0=c(0.000001, 0.1, 0.85,6),
eval_f=LL.tGARCH,
lb=c(0,0.02,0.6,2),
ub=c(1e-4,0.2,1,1e3),
opts=opts)

cat("sigma2",sigma2Last,"\n")
sigma2 0.0001619243 
spec.3 = ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
distribution.model = "std"
)
Res = ugarchfit(spec = spec.3, data = y)

cat("Omega:", resn$solution, coef(Res),"\n", "Alpha:", resn$solution, coef(Res), "\n",
"Beta:", resn$solution, coef(Res),"\n", "nu:", resn$solution, coef(Res),"\n")
Omega: 1.519132e-06 1.505701e-06
Alpha: 0.1242821 0.1240716
Beta: 0.8703039 0.8706782
nu: 6.067725 6.061799 

## 12.9 Exercise

Apart from rugarch we used in this chapter, there are two popular alternatives in R that estimate times series models, fGarch and tseries. Fit a GARCH(1,1) model to S&P500 return series (y) using these three packages. Are the results identical? If not, What can account for the difference? Discuss.

# Answer
library(fGarch)
library(tseries)

spec1 = ugarchspec(
variance.model = list(garchOrder = c(1,1)),
mean.model = list(armaOrder = c(0,0), include.mean = FALSE))
ru.fit = ugarchfit(spec = spec1, data = y)
fg.fit = garchFit(y~garch(1,1),data=y,include.mean = FALSE)
ts.fit = garch(y)
rst=c()
rst=rbind(c(as.numeric(coef(ru.fit)),likelihood(ru.fit)),
c(as.numeric(coef(fg.fit)),as.numeric(-fg.fit@fit$llh)), c(as.numeric(ts.fit$coef),as.numeric(logLik(ts.fit))))
colnames(rst)=c("Omega", "Alpha", "Beta", "Log Likelihood")
rownames(rst)=c("rugarch", "fGarch", "tseries")
rst

# rugarch and fGarch give almost identical results, with very little difference. The log likelihood of the model estimated by tseries is lower. Possible explanations are:

# Different solvers. rugarch uses solnp by default, but it also provides several alternatives, such as nlminb, lbfgs, etc. fGarch uses nlminb by default, while tseries uses a Quasi-Newton optimizer.
# Variance initialization. The most common choice is the unconditional variance, but tseries documentation did not mention clearly the value it selects.
# Other factors, such initial values of parameters, number of iterations, etc.

# Previous studies show that the performance of tseries is inferior to the other two packages. For more discussion on this topic, see "On The Accuracy of GARCH Estimation in R Packages", by Hill and McCullough (2019).