11  When things go wrong in univariate volatility estimation

The estimation in the previous chapter went through without any problems, but we are not always so lucky. Since we are maximising a non-linear function, many things can go wrong, and the problems are often hard to diagnose because the code fails with an incomprehensible error message. There can be many plausible explanations: there might be errors in the data, the algorithms might be buggy, or your code might have mistakes.

These problems can arise because of the trade-off between safe code and speed. If we want to ensure that nothing goes wrong, we need a lot of checks and robust algorithms, which is slow, even very slow. Consequently, we opt for algorithms that are fast and usually work, but sometimes fail.

11.1 Libraries

We use two libraries: rugarch and nloptr, the R interface to NLopt, a library for nonlinear optimisation.

source('functions.r')
suppressPackageStartupMessages(library(rugarch))
library(nloptr)
source('common.r')

11.1.1 Define rugarch specs

# GARCH(1,1) with normal innovations
spec.g = ugarchspec(
    variance.model = list(garchOrder = c(1,1)),
    mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
# GARCH(1,1) with Student-t innovations
spec.t = ugarchspec(
    variance.model = list(garchOrder = c(1,1)),
    mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
    distribution.model = "std"
)

11.2 Data

The data we use are returns on the S&P500 index, which for convenience we put into the variable y.

load('data/data.RData')
y=data$sp500$y
y=y-mean(y)

11.3 Failure

When I tried to estimate the tGARCH model over a particular 2000-day estimation window, y[720:2719], the estimation failed. Other estimation windows also failed. This happens on the computer I’m using now; the estimation might succeed on a different computer, especially one with a different version of R or of other key libraries.

Here is what we do:

Res = ugarchfit(spec = spec.t, data = y[720:2719])


When we run this we get the following error message:

Error in robustvcv(fun = f, pars = ipars[estidx, 1], nlag = nlag, hess = fit$hessian, : object 'B' not found
Traceback:

1. ugarchfit(spec = spec.t, data = data)
2. ugarchfit(spec = spec.t, data = data)
3. .sgarchfit(spec = spec, data = data, out.sample = out.sample, 
 .     solver = solver, solver.control = solver.control, fit.control = fit.control, 
 .     numderiv.control = default.numd)
4. .makefitmodel(garchmodel = "sGARCH", f = .sgarchLLH, T = T, m = m, 
 .     timer = timer, convergence = convergence, message = sol$message, 
 .     hess = hess, arglist = arglist, numderiv.control = numderiv.control)
5. robustvcv(fun = f, pars = ipars[estidx, 1], nlag = nlag, hess = fit$hessian, 
 .     n = T, arglist = arglist)

11.3.1 Why does it fail?

This error message might seem incomprehensible, but it contains some clues. The failure is in something called robustvcv, which is not something we need for our purpose, a one-day-ahead volatility forecast, but it is required for the confidence bounds of the parameters. In particular, vcv stands for variance-covariance matrix, and robust refers to calculating it in a way that is not overly sensitive to distributional assumptions.

While I can only guess, I think the error has to do with some overflow of numerical values. Some returns are very small, while others are quite large, and when we square them, the differences are amplified. The covariance matrix is obtained from the inverse Hessian, the matrix of second derivatives of the likelihood function at its maximum. It is tricky to estimate this Hessian, especially if we want it done quickly.

It comes down to limited numerical precision. Most calculations are done with what is known as double-precision floating point, or float64, a number that uses 64 bits of computer memory. A floating-point number looks like \(0.0234=2.34\times 10^{-2}\), written in code as 2.34e-2.

The smallest positive number is about \(2^{-1022}\) and the largest about \(2^{1023}\). A floating-point number has two parts: the exponent (or power) and the mantissa, the precision bits that sit in front of the power. The mantissa gets 52 bits.
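We can query these limits directly in R; .Machine is part of base R and reports the properties of the floating-point arithmetic on the machine the code runs on:

.Machine$double.xmin   # smallest positive normalised double, 2^-1022
.Machine$double.xmax   # largest finite double
.Machine$double.eps    # about 2.2e-16, the precision implied by the 52 mantissa bits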

This means that if we do calculations with numbers that are very different from each other, we can end up with an answer that is very wrong. For example, if we sum many numbers of very different sizes, we can get a different answer depending on whether we start adding from the smallest or from the largest. Also, as we see in Chapter 14 discussing backtests, we can get a different answer depending on whether we take the log of a product or the sum of logs.
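A minimal illustration, with made-up numbers chosen only to make the effect visible:

0.1 + 0.2 == 0.3                          # FALSE: neither 0.1 nor 0.2 is exact in float64
(1 + 1e-16) - 1                           # 0: the tiny number is lost when added to 1
1e-300 * 1e-300 * 1e300                   # 0: multiplying the small numbers first underflows
1e300 * 1e-300 * 1e-300                   # about 1e-300: same numbers, different order, different answer
log(1e-300 * 1e-300 * 1e300)              # -Inf: log of a product that underflowed
log(1e-300) + log(1e-300) + log(1e300)    # about -691: the sum of logs avoids the problem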

This is what I think happened here: some sort of numerical precision problem. There are several ways to solve it, some easy, others a bit more complicated.

11.3.1.1 solver = "hybrid" does not help

We could try

Res = ugarchfit(spec = spec.t, data = y[720:2719], solver = "hybrid")

but that also fails, for the same underlying reason.

11.4 Re-scaling might work

The first thing to try is rescaling the data.

11.4.1 De-meaning

When we imported the data, we removed the mean, but only the mean of the full sample. If, however, we de-mean the estimation window itself, the calculation succeeds.

data=y[720:2719]
data=data-mean(data)
Res = ugarchfit(spec = spec.t, data = data)
coef(Res)
omega    1.360510234611e-06
alpha1   0.101372511919534
beta1    0.897091485255765
shape    5.42919447734144

11.4.2 Normalising variance

Normalising the variance to one also works, but then we need to re-scale \(\omega\) back since \[ \sigma^2 = \frac{\omega}{1-\alpha-\beta} \] and so if we do data=data/sd(data) we then need to compute \[ \hat{\omega}\times \mathrm{var}(y) \]

data=y[720:2719]
scale=1/sd(data)
data=data*scale
Res = ugarchfit(spec = spec.t, data = data)
coef(Res)
omega=coef(Res)[1] / scale^2 
omega
omega    0.0069388748226148
alpha1   0.102217110814989
beta1    0.896542051617316
shape    5.36617128835846

omega: 1.36634439986126e-06

11.4.3 Do both

Usually, it is best to do both: de-mean the estimation window and normalise its variance.
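A minimal sketch combining the two steps above on the same estimation window; only \(\hat{\omega}\) needs to be scaled back afterwards, the other parameters are unaffected:

data = y[720:2719]                 # the estimation window that failed before
data = data - mean(data)           # de-mean the window
scale = 1/sd(data)
data = data * scale                # normalise the variance to one
Res = ugarchfit(spec = spec.t, data = data)
omega = coef(Res)[1] / scale^2     # re-scale omega back to the original units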

11.4.4 Why did re-scaling work?

The reason de-meaning and normalising work is that they make most of the numbers be around one, which means that when we square them, the gap between the big and the small numbers does not become very large, so we do not run into the numerical precision problems discussed above.

11.5 Detecting failures — try()

Suppose the rescaling does not work and the estimation still fails. If we are doing a single estimation, it is easy to spot the problem and try something else. But what if we are backtesting and want to generate a large number of one-day risk forecasts? Then it becomes annoying to have the code fail.

Here an R function called try() can be particularly useful. try() is designed to catch errors: you try a calculation, and you either get an answer back or a message saying the calculation failed.

Suppose you want to take the logarithm of x, log(x). Then x needs to be a positive number, and if you pass a string, the code crashes.

log("string")
ERROR: Error in log("string"): non-numeric argument to mathematical function

But what if we wrap the calls in try()?

res1=try(log(1))
res2=try(log("string"),silent = TRUE)

Both ran without error, but the second, of course, failed. We can see that by

res1
class(res1)
0
'numeric'
res2
[1] "Error in log(\"string\") : non-numeric argument to mathematical function\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in log("string"): non-numeric argument to mathematical function>
class(res2)
'try-error'

and detect it by

class(res2)=="try-error"
TRUE

So after running try() we check the output, and if its class is "try-error" we know the calculation failed and have to deal with that.
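As a sketch of how this might look in a backtest, here is a loop that estimates the tGARCH model over a moving window and simply records NA on days where the estimation throws an error. The window length, the variable names and the choice to skip failed days are illustrative, and the forecast is extracted with rugarch's ugarchforecast() and sigma():

WE = 2000                                   # estimation window length (illustrative)
sigma.forecast = rep(NA, length(y))         # one-day-ahead volatility forecasts
for (t in (WE + 1):length(y)) {
    window = y[(t - WE):(t - 1)]
    window = window - mean(window)          # de-mean the window, as above
    fit = try(ugarchfit(spec = spec.t, data = window), silent = TRUE)
    if (!inherits(fit, "try-error")) {
        sigma.forecast[t] = as.numeric(sigma(ugarchforecast(fit, n.ahead = 1)))
    }
}

Running this over the whole sample is slow, but the point is the pattern: wrap the call that can fail in try(), check for "try-error" (inherits() does that check), and decide what to do when it fails.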

11.6 Optimisers

Recall the discussion in Chapter 10 on how we maximise the likelihood function. The algorithm used to do that maximisation is called an optimiser.

The basic R distribution provides a function called optim() for optimising functions. It offers several algorithms, including the very old and robust Nelder-Mead. That might not be the fastest algorithm, but it has the advantage of not depending on derivatives and will reliably converge to a local optimum. Nelder-Mead has at least two disadvantages: it requires more steps to find the optimum than some other algorithms, and it only does local optimisation, meaning it just finds the nearest peak.

There is an alternative library called NLopt that provides a large number of state-of-the-art optimisers. NLopt can be used from almost every programming language, and the R interface is nloptr.

Using nloptr requires a few more setup steps than the built-in optim.

11.6.1 Optimisers minimise

One thing to keep in mind is that, by default, optimisers minimise functions rather than maximise, as we require here. This means that when we return the function value, we have to flip its sign before returning.
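A toy illustration with a made-up function: to maximise g we minimise its negative.

g = function(x) -(x - 2)^2                         # made-up function with a maximum at x = 2
res = optim(par = 0, fn = function(x) -g(x), method = "BFGS")
res$par                                            # about 2, the maximiser of g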

11.6.2 Ensuring positive parameters

In our case, the parameters that go into the likelihood function must be positive. There are at least two ways to ensure positivity: the easiest is simply to take the absolute value of the parameters, but we can often also impose bounds when we call the optimising function. We show both below.

11.6.3 Optimisation in R

Suppose we want to minimise some function f(parameters), where we have three parameters.
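In what follows, f is just a placeholder. As a concrete stand-in for running the snippets, one could use a simple made-up function of three parameters, for example:

f = function(par) sum((par - c(0.5, 0.1, 0.85))^2)   # illustrative stand-in, minimised at (0.5, 0.1, 0.85)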

11.6.3.1 Built-in optim

The easiest way is to use the built-in optim function, which by default uses the Nelder-Mead algorithm. It returns a list that contains the parameter estimates, the function value at the optimum and some other diagnostics.

res = optim(c(0.000001, 0.1, 0.85) ,fn= f) # Optimization

11.6.3.2 NLopt

NLopt is a bit more complicated. We first have to set up the control parameters for the optimisation, for example one of:

opts =list("algorithm"="NLOPT_GN_CRS2_LM", "xtol_rel"=1.0e-8, maxeval=1e5)
opts =list("algorithm"="NLOPT_GN_DIRECT_L", "xtol_rel"=1.0e-8, maxeval=1e5)
opts =list("algorithm"="NLOPT_LN_NELDERMEAD", "xtol_rel"=1.0e-8, maxeval=1e5)

Here we see three different algorithm choices. We also set the relative tolerance xtol_rel and the maximum number of function evaluations, maxeval.

After that we call the actual optimisation with nloptr. We set the starting values x0, the function name eval_f, the lower and upper bounds on the parameters, lb and ub, and the control list.

res = nloptr(x0=c(0.000001, 0.1, 0.85), 
               eval_f=f, 
               lb=c(0,0,0), 
               ub=c(1,1,1),
               opts=opts)

It returns a list that contains the parameter estimates, the function value at the optimum and some other diagnostics.
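The two interfaces store the results under different names. Assuming the illustrative f and the opts list from above, the estimates and the objective value can be read off as follows:

res.optim  = optim(c(0.000001, 0.1, 0.85), fn = f)
res.nloptr = nloptr(x0 = c(0.000001, 0.1, 0.85), eval_f = f,
                    lb = c(0, 0, 0), ub = c(1, 1, 1), opts = opts)

res.optim$par          # parameter estimates from optim
res.optim$value        # function value at the optimum
res.nloptr$solution    # parameter estimates from nloptr
res.nloptr$objective   # function value at the optimum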

11.7 Coding the likelihood

11.7.1 Issues

  • Value of \(\sigma_1^2\): We need to assign a value to \(\sigma_1^2\), the first iteration of the conditional volatility. Generally, we choose the unconditional variance of the data;
  • Initial values of the parameters: Optimisation works iteratively, so we need to assign starting values to the parameters, making sure these comply with the restrictions of the function and the parameter bounds. The starting values have a small effect on the parameter estimates, but can have a large effect on computing time.

We also need to pick an optimisation method and algorithm.

11.7.2 Normal GARCH

The likelihood function and its derivation can be found in Chapter 2 of the book and the slides.

\[ \begin{split} \log\mathcal{L} & = \underbrace{-\dfrac{T-1}{2}\log(2\pi)}_{\text{Constant}} \\ & \quad - \dfrac{1}{2}\sum\limits_{t=2}^{T}\left(\log\left(\omega+\alpha y^2_{t-1}+\beta\hat\sigma^2_{t-1}\right) + \dfrac{y^2_t}{\omega+\alpha y^2_{t-1}+\beta\hat\sigma^2_{t-1}}\right) \end{split} \]

11.7.3 tGARCH

The Student-t density is given by \[ \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}} \]

We want to make two alterations

  1. Make it standardised;
  2. Write it in terms of a time changing volatility model,

Which gives us

\[ \frac{1}{\sigma_t} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{(\nu-2)\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{y_t^2}{\sigma_t^2}\frac{1}{\nu-2} \right)^{-\frac{\nu+1}{2}} \]

and take logs. There are two parts: the constant part and the time-changing part.

The constant is

\[ \log\Gamma\left(\frac{\nu+1}{2}\right)- \frac{1}{2}\log\left((\nu-2)\pi\right)-\log\Gamma\left(\frac{\nu}{2}\right) \] while the time-changing part is

\[ -\log\sigma_t -\frac{\nu+1}{2}\left( \log\left(\sigma_t^2(\nu-2)+y_t^2\right) -\log\left(\sigma_t^2(\nu-2) \right) \right) \]

11.8 Implementation

11.8.1 Issues

There are three issues that arise.

The first is that we have to ensure the parameters are positive, which we do below with par=abs(par) but also by setting lb and ub in nloptr. We do not need both, but since we are comparing two different ways of doing the optimisation, we keep both in.

Second, we do not want to repeat calculations. We may end up calling the likelihood function a large number of times, so we do not want to square the returns, count the elements or take the variance more often than necessary. So we pre-calculate everything we can. It is especially important not to do any unneeded calculations inside the loops.

Finally, we need to get the last volatility out of the likelihood function. While we could write clever code for that, here we simply put it into the global environment with <<-, which is not how one is supposed to do things, but can be very convenient.

11.8.2 Normal GARCH

y2 = y^2          # pre-calculate the squared returns
sigma2 = var(y)   # unconditional variance, used for sigma^2_1
T = length(y2)


LL.GARCH = function(par) {
    par = abs(par)    # force the parameters to be positive
    omega = par[1]
    alpha = par[2]
    beta = par[3]
    loglikelihood = 0
    for(i in 2:T){
        # GARCH(1,1) recursion, starting from the unconditional variance
        sigma2 = omega + alpha*y2[i-1] + beta*sigma2
        loglikelihood = loglikelihood + log(sigma2) + y2[i]/sigma2
    }
    loglikelihood = -0.5*loglikelihood - (T-1)/2 * log(2*pi)
    sigma2Last <<- sigma2    # keep the last conditional variance for forecasting
    return(-loglikelihood)   # negative because the optimiser minimises
}
res = optim(c(0.000001, 0.1, 0.85) ,fn= LL.GARCH) 
res$par=abs(res$par)

opts =list("algorithm"="NLOPT_LN_NELDERMEAD", "xtol_rel"=1.0e-8, maxeval=1e4)

res = nloptr(x0=c(0.000001, 0.1, 0.85), 
               eval_f=LL.GARCH, 
               lb=c(0,0,0), 
               ub=c(1,1,1),
               opts=opts)

spec.2 = ugarchspec(
    variance.model = list(garchOrder = c(1,1)),
    mean.model = list(armaOrder = c(0,0), include.mean = FALSE)
)
Res = ugarchfit(spec = spec.2, data = y)


cat("Omega:", res$solution[1], coef(Res)[1],"\n",
   "Alpha:", res$solution[2], coef(Res)[2], "\n",
   "Beta:", res$solution[3], coef(Res)[3],"\n")
Omega: 2.459155e-06 2.459018e-06 
 Alpha: 0.1238671 0.1238915 
 Beta: 0.8557986 0.8557904 

11.8.3 tGARCH

y2 = y^2          # pre-calculate the squared returns
sigma2 = var(y)   # unconditional variance, used for sigma^2_1
T = length(y2)

LL.tGARCH = function(par) {
    par = abs(par)   # force the parameters to be positive
    omega = par[1]
    alpha = par[2]
    beta = par[3]
    nu = par[4]      # Student-t degrees of freedom
    n1 = nu + 1.0
    n2 = nu - 2.0
    loglikelihood = 0

    for(i in 2:T){
        sigma2 = omega + alpha*y2[i-1] + beta*sigma2   # GARCH(1,1) recursion
        loglikelihood = loglikelihood +
            log(sigma2) +
            n1*(log(n2*sigma2 + y2[i]) - log(n2*sigma2))
    }
    loglikelihood = -loglikelihood/2.0
    # add the constant part of the log-likelihood
    loglikelihood = loglikelihood +
        (T-1)*(log(gamma(n1/2)) -
        log(gamma(nu/2)) - 0.5*log(pi*n2))
    sigma2Last <<- sigma2    # keep the last conditional variance
    return(-loglikelihood)   # negative because the optimiser minimises
}
res = optim(c(0.000001, 0.1, 0.85,6) ,fn= LL.tGARCH) # Optimization
res$par=abs(res$par)



# two possible algorithm choices; only the last opts assignment is used below
opts = list("algorithm"="NLOPT_GN_CRS2_LM", "xtol_rel"=1.0e-6, maxeval=1e6)
opts = list("algorithm"="NLOPT_LN_NELDERMEAD", "xtol_rel"=1.0e-8, maxeval=1e5)

resn = nloptr(x0=c(0.000001, 0.1, 0.85,6), 
               eval_f=LL.tGARCH, 
               lb=c(0,0.02,0.6,2), 
               ub=c(1e-4,0.2,1,1e3),
               opts=opts)


cat("sigma2",sigma2Last,"\n")
 
spec.3 = ugarchspec(
    variance.model = list(garchOrder = c(1,1)),
    mean.model = list(armaOrder = c(0,0), include.mean = FALSE),
    distribution.model = "std"
)
Res = ugarchfit(spec = spec.3, data = y)

cat("Omega:", resn$solution[1], coef(Res)[1],"\n",
   "Alpha:", resn$solution[2], coef(Res)[2], "\n",
   "Beta:", resn$solution[3], coef(Res)[3],"\n",
   "nu:", resn$solution[4], coef(Res)[4],"\n")
sigma2 0.0001619243 
Omega: 1.519132e-06 1.505701e-06 
 Alpha: 0.1242821 0.1240716 
 Beta: 0.8703039 0.8706782 
 nu: 6.067725 6.061799