18 Backtesting
When forecasting risk, it is important to evaluate the quality of the forecasts. Unlike many other applications, such as price or weather forecasting, we cannot compare a forecast directly to an eventual realised outcome because risk is a latent variable. It cannot be measured directly, only inferred, so we need indirect ways of evaluating the forecasts, and that is what backtesting provides.
18.1 Libraries
library(rugarch)
library(zoo)
library(lubridate)
library(microbenchmark)
library(parallel)
library(doParallel)
library(foreach)
source("common/functions.r",chdir=TRUE)
18.2 Data
data=ProcessRawData()
sp500=data$sp500
18.3 Intermediate files
We argued in Section 12.2 that we should normally avoid intermediate files unless absolutely necessary. Backtesting might be the exception. The reason is that the time to estimate some of the models we use is not trivial, and when we have to repeat the estimation for every day in the testing window, the estimation time can become excessive. Since we then have to process the backtesting results, it may be more efficient to run the estimation in a separate file, save the results, and code up the processing separately.
That said, while intermediate files can be sensible when developing the code that reports on the backtest, it is usually best not to rely on them for the final run.
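A minimal sketch of that workflow, using base R's saveRDS() and readRDS() with an illustrative file name (the chapter does not prescribe one):
# In the script that runs the backtest; res is the output of Backtesting() below
saveRDS(res, file = "backtest_results.rds")
# In the script that processes and reports the results
res = readRDS("backtest_results.rds")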
18.4 Setup
Specify the VaR probability, portfolio value, estimation window size and the EWMA weight lambda; the sample length T and testing window length WT follow from them:
par = list()
par$p = 0.05
par$value = 1000
par$WE = 2000
par$T = length(sp500$y)
par$WT = par$T - par$WE
par$lambda = 0.9
18.5 Backtesting functions
We divided the backtesting code into two functions, RunOneDay() and Backtesting(). While not strictly necessary, this separation of functionality is helpful in our case because we want to measure how long the backtesting takes.
18.5.1 RunOneDay()
Note the lines
x=do.call(
  what=paste0("Risk.",Methods[i]),
  args=list(y=y[t1:t2],par=par,Print=Print)
)
in the function RunOneDay(). Here, paste0("Risk.",Methods[i]) takes an estimation method and prefixes it with the string "Risk.". For example, if Methods[i] is HS, we get Risk.HS, which is one of the risk functions we use. do.call() then calls the function named in the what argument, in this case Risk.HS(), with the arguments y=sp500$y[t1:t2], par=par and Print=Print.
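As a minimal base-R illustration of the mechanism, outside the chapter's own code:
do.call(what = "mean", args = list(x = 1:10))   # exactly the same as mean(1:10)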
Note that the vector res holds both the VaR and the ES results. We do it that way to facilitate the multi-core implementation below.
RunOneDay=function(t,y=y,Methods,Print=FALSE) {
  t1=t-1
  t2=t1-par$WE+1
  if(Print) cat(t,t2,t1,"\n")
  res=rep(0,2*length(Methods))
  for(i in 1:length(Methods)){
    x=do.call(what=paste0("Risk.",Methods[i]),args=list(y=y[t1:t2],par=par,Print=Print))
    res[i]=x$VaR
    res[i+length(Methods)]=x$ES
  }
  return(res)
}
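As a quick illustration (using the sp500 data loaded above), calling the function for the last day in the sample returns a vector whose first length(Methods) entries are the VaR forecasts and whose remaining entries are the ES forecasts:
RunOneDay(par$T, y = sp500$y, Methods = c("HS", "EWMA"))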
18.5.2 Backtesting()
We call RunOneDay() from inside the Backtesting() function, which takes four arguments: the returns y, the control parameters par, the estimation Methods and the diagnostic flag Print.
We start by creating a matrix, res, to hold the output and then loop over the testing window, putting each day's results into the corresponding row of res.
We then split the res matrix into the VaR and ES matrices and give them column names.
Finally, we define a profit-and-loss vector, PL, and use it to create the matrices of violations, VaR.V and ES.V.
Backtesting=function(y,par,Methods,Print=FALSE){
  res=matrix(ncol=length(Methods)*2,nrow=par$T)
  for(t in (par$WE+1):par$T){
    res[t,]=RunOneDay(t=t,y=y,Methods=Methods,Print=Print)
  }
  VaR=res[,1:length(Methods)]
  ES=res[,(length(Methods)+1):(2*length(Methods))]
  colnames(ES)=Methods
  colnames(VaR)=Methods
  PL=y*par$value
  VaR.V=-(PL+VaR)
  ES.V=-(PL+ES)
  VaR.V[VaR.V<0]=0
  VaR.V[VaR.V>0]=1
  ES.V[ES.V<0]=0
  ES.V[ES.V>0]=1
  return(list(ES=ES,VaR=VaR,VaR.V=VaR.V,ES.V=ES.V,PL=PL))
}
18.6 Run backtest
We only run the backtest here for HS and EWMA since these are quite quick to run, while the other methods take considerably longer. We run them all below.
res=Backtesting(y=sp500$y,
  par=par,
  Methods=c("HS","EWMA")
)
matplot(sp500$date.ts,cbind(res$PL,-res$VaR),type='l',lty=1)
What you see from the plot is that even though the returns start on day one, we only get backtest results from day par$WE+1 onwards:
print((par$WE -2):(par$WE +2))
[1] 1998 1999 2000 2001 2002
print(res$VaR[(par$WE -2):(par$WE +2),])
HS EWMA
[1,] NA NA
[2,] NA NA
[3,] NA NA
[4,] 19.73436 22.93636
[5,] 19.73436 20.88139
One way to deal with this is to create new variables that start on day par$WE+1, or 2001:
VaR=res$VaR[(par$WE+1):par$T,]
ES=res$ES[(par$WE+1):par$T,]
VaR.V=res$VaR.V[(par$WE+1):par$T,]
ES.V=res$ES.V[(par$WE+1):par$T,]
PL=res$PL[(par$WE+1):par$T]
date.ts=sp500$date.ts[(par$WE+1):par$T]
Then
matplot(date.ts,cbind(PL,-VaR),type='l',lty=1)
And we can then analyse the number of violations.
cat("Expected number of violations",par$p*par$WT,"\n")
Expected number of violations 151.7
print(colSums(VaR.V))
HS EWMA
144 236
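A natural next step, not shown in the chapter's code above, is to express these counts as violation ratios, the observed number of violations divided by the expected number; values close to one suggest a well-calibrated VaR forecast.
print(colSums(VaR.V) / (par$p * par$WT))   # violation ratios for HS and EWMA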
18.7 How long does it take to run?
Backtesting is slow, except for the fastest methods. There are several ways to measure the time it takes to run an R command. The easiest is to take note of Sys.time(), which records the current clock time. We can then run some code and compare the change in Sys.time(). This is sometimes known as wall time because it measures the time that would pass on a wall clock. The benefit is that it is simple.
The downside is that many factors can affect how long the code takes. The computer might be doing something else at the same time, which affects the results. Often, if we call the same function repeatedly, the first call takes the longest, as we see here.
A more accurate approach is to repeat the run several times and use the minimum or median value. The easiest way to do that is with microbenchmark() from the microbenchmark package.
18.7.1 Wall time
y=sp500$y
time1=Sys.time()
res=RunOneDay(par$T,y=y,Methods=c("HS"))
print(Sys.time()-time1)
Time difference of 0.0006859303 secs
time1=Sys.time()
res=RunOneDay(par$T,y=y,Methods=c("EWMA"))
print(Sys.time()-time1)
Time difference of 0.0008900166 secs
time1=Sys.time()
res=RunOneDay(par$T,y=y,Methods=c("GARCH"))
print(Sys.time()-time1)
Time difference of 0.06194687 secs
time1=Sys.time()
res=RunOneDay(par$T,y=y,Methods=c("tGARCH"))
print(Sys.time()-time1)
Time difference of 0.2529449 secs
18.7.2 With microbenchmark
Start by running microbenchmark() for a GARCH model. In this case, it reports the results in milliseconds, or thousandths of a second.
m=microbenchmark(
  RunOneDay(par$T,y=y,Methods=c("GARCH"))
)
m$expr="GARCH"
print(m)
Unit: milliseconds
expr min lq mean median uq max neval
GARCH 34.89965 35.55042 42.15387 36.03652 38.93754 200.4906 100
If we want to compare several methods, it is best to collect the output from microbenchmark in a vector. Note that the raw output is measured in nanoseconds, or billionths of a second.
=c("HS","EWMA","GARCH","tGARCH")
Methods=vector(length=length(Methods))
resfor(i in 1:length(Methods)){
=microbenchmark(
xRunOneDay(par$T,y=y,Methods=Methods[i])
)=median(x$time)
res[i]
}print(res)
[1] 61643.5 372464.5 42252632.0 95180598.0
We can then plot them together, in this case with a bar plot. Since the numbers are very different, a log scale helps.
x=res/1e6
a=barplot(round(x,4),
  log='y',
  las=1,
  border=FALSE,
  ylab="Milliseconds",
  names.arg=Methods,
  col="lightblue"
)
text(a,x*1.14,x,xpd=TRUE)
While 0.088 seconds for a single tGARCH estimation does not seem long, repeated 3034 times over the testing window it adds up to about 4.4 minutes.
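The arithmetic behind that figure, as a quick check:
0.088 * par$WT / 60   # per-call seconds times the number of days in the testing window, in minutes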
18.8 Do it faster
It can be very annoying when a backtest takes a long time. There are several ways to speed it up, and one of the most obvious is to take advantage of the multiple cores in your computer's central processing unit (CPU). By default, R uses a single core, but most computers have at least four cores and some laptops have 16. If you can spread the job across all cores, a laptop with ten cores can, in the best case, run it close to ten times faster.
There are two problems with running jobs on multiple cores.
The first is that it is inevitably more complex. You need to write your code in a way that lets it run on multiple cores, and then call a special function to actually do so.
The second is that running on multiple cores has some overhead, so small jobs can take longer to run on multiple cores than on a single one.
18.8.1 Apply family
R comes with a family of functions, usually called the apply family because they are all variants of a function called apply. There are many pages on the Internet describing the apply family. The idea is that an apply function takes every element of a vector or matrix and applies a function to it. The output is usually a list containing the result of each function call. Since we ultimately need to collapse that list into a data frame or matrix, it is easiest if the function returns a vector rather than a list. This is why we coded RunOneDay() to return a vector.
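A toy illustration of the point, outside the chapter's code:
# Each call returns a named numeric vector, so the list of results
# collapses cleanly into a matrix with one row per input value.
out = lapply(1:3, function(t) c(VaR = t, ES = 2 * t))
do.call(rbind, out)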
18.8.2 Windows, Linux and Macs
An issue arises because of differences between Windows and UNIX-like systems, which include Macs and Linux. The way R does multi-core processing on UNIX-like systems is more efficient, but it does not work on Windows; the Windows approach also runs on Macs and Linux.
We will use mclapply() on Mac and Linux and foreach() on Windows; both approaches work on Mac and Linux.
mclapply() needs library(parallel), while foreach() needs library(foreach) and library(doParallel).
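If you want a single script that works everywhere, one option (a sketch, not something the chapter itself does) is to branch on the operating system:
if (.Platform$OS.type == "unix") {
  # Mac/Linux: use parallel::mclapply() as in Section 18.8.5
} else {
  # Windows: use foreach() with a doParallel cluster as in Section 18.8.6
}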
18.8.3 Number of cores
We need to identify the number of cores in the computer we are using. You might know the value, but different computers have different numbers, so it makes sense to identify the number automatically.
cores=detectCores()
print(cores)
[1] 12
The computer we are now using has 12 cores.
18.8.4 Sequence
When using these functions to do parallel computing, we are not guaranteed to get the answers back in the order of the input vector. The reason is that the scheduler distributes the jobs across the cores in whatever way it finds efficient rather than simply splitting them up in input order. For many applications, such as Monte Carlo simulations, that does not matter. Here it does, since we have to match each risk forecast to a particular day. We therefore take special care below to ensure the results come back in the right order.
18.8.5 Mac/Linux — mclapply()
The mclapply() function takes the input vector, (par$WE+1):par$T, the function we want to run, RunOneDay, and the arguments that function needs, y=y and Methods=Methods. As a benchmark, we first drop the mc prefix and use lapply(), which runs the same job on a single core.
=c("HS","EWMA","GARCH","tGARCH") Methods
time1=proc.time()
res1=lapply(
  (par$WE+1):par$T,
  RunOneDay,
  y=y,
  Methods=Methods
)
time1=proc.time()-time1
print(time1)
user system elapsed
408.516 22.907 431.617
print(res1[1:2])
[[1]]
[1] 19.73436 22.93636 21.85700 21.42719 33.80597 28.76312 27.40956 28.99466
[[2]]
[1] 19.73436 20.88139 22.78245 22.37896 33.80597 26.18611 28.57012 30.28393
The output goes into a list, res1, where each element holds the risk forecasts for a particular day. We print the first two elements so we can check that the different multi-core implementations below give the same answers.
By setting mc.cores=cores, we tell mclapply() to use as many cores as the variable cores holds, 12 in our case, and mc.preschedule=TRUE ensures we get the estimates back in the right order. We use proc.time() to measure execution time.
time2=proc.time()
res2=mclapply(
  (par$WE+1):par$T,
  RunOneDay,
  y=y,
  Methods=Methods,
  mc.cores=cores,
  mc.preschedule=TRUE
)
time2=proc.time()-time2
print(time2)
user system elapsed
522.167 21.531 59.526
print(res2[1:2])
[[1]]
[1] 19.73436 22.93636 21.85700 21.42719 33.80597 28.76312 27.40956 28.99466
[[2]]
[1] 19.73436 20.88139 22.78245 22.37896 33.80597 26.18611 28.57012 30.28393
We see that the multi-core run is not quite 12 times faster; there is considerable overhead. Note also that the reported user time now includes the CPU time spent across all the cores, which is why it exceeds the elapsed wall-clock time.
18.8.6 Windows (and Mac/Linux) — foreach()
mclapply() only runs on UNIX-like systems, so we need something different for Windows. Unfortunately, that is quite a bit more complicated: we have to both create a cluster and give each core in the cluster a copy of all the data needed to run the jobs.
The variable cl is the cluster we create:
cl <- makeCluster(cores)
registerDoParallel(cl)
We then have to give each core a copy of all the packages and functions with clusterEvalQ():
x=clusterEvalQ(cl, {
  library(reshape2)
  library(rugarch)
  library(lubridate)
  library(zoo)
  source("common/functions.r",chdir=TRUE)
})
And we use clusterExport() to give it our particular data:
clusterExport(cl,c("par", "y","Methods"))
We run the code on the cluster with
res3=foreach(i = (par$WE+1):par$T) %dopar% {
  RunOneDay(i,y=y,Methods=Methods)
}
And we need to stop the cluster when we are done:
stopCluster(cl)
time3=proc.time()
cl <- makeCluster(cores[1]-1)
registerDoParallel(cl)
x=clusterEvalQ(cl, {
  library(reshape2)
  suppressPackageStartupMessages(library(rugarch))
  suppressPackageStartupMessages(library(lubridate))
  suppressPackageStartupMessages(library(zoo))
  source("common/functions.r",chdir=TRUE)
})
clusterExport(cl,c("par", "y","Methods"))
r=(par$WE+1):par$T
res3=foreach(i = (par$WE+1):par$T) %dopar% {
  RunOneDay(i,y=y,Methods=Methods)
}
stopCluster(cl)
time3=proc.time()-time3
print(time3)
user system elapsed
0.686 0.170 59.773
print(res3[1:2])
[[1]]
[1] 19.73436 22.93636 21.85700 21.42719 33.80597 28.76312 27.40956 28.99466
[[2]]
[1] 19.73436 20.88139 22.78245 22.37896 33.80597 26.18611 28.57012 30.28393
18.8.7 Compare
df=data.frame(cbind(
  c("lapply","mclapply","foreach"),
  c(time1["elapsed"],time2["elapsed"],time3["elapsed"])
))
rownames(df)=NULL
names(df)=c("Approach","time")
print(df)
Approach time
1 lapply 431.617
2 mclapply 59.526
3 foreach 59.773
Are the answers identical?
identical(res1,res2)
[1] TRUE
identical(res2,res3)
[1] TRUE
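To use the parallel output for the violation analysis of Section 18.6, the list has to be turned back into matrices. A minimal sketch, assuming res2 from the mclapply() run above; the names VaR.mc and ES.mc are just illustrative:
m = do.call(rbind, res2)                                # one row per day in the testing window
VaR.mc = m[, 1:length(Methods)]                         # first block of columns: VaR forecasts
ES.mc = m[, (length(Methods)+1):(2*length(Methods))]    # second block: ES forecasts
colnames(VaR.mc) = colnames(ES.mc) = Methods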