10  Simulations

Simulations are a very powerful technique widely used in risk forecasting. For example, we use simulations to evaluate the accuracy of various models, to provide intuition on how models work, and to perform analysis that would otherwise require complicated mathematics.

When referring to simulations, we often use the term Monte Carlo, after the casino district of Monaco, the statelet on the Mediterranean coast surrounded by France that is famous for its casinos. We will use the terms simulation, Monte Carlo, and MC interchangeably.

When doing simulations, we ask a computer algorithm to replicate the real world. It is important to recognise the limitations of such an exercise.

To begin with, we are asking a computer to do something random, but computers are deterministic machines and cannot create something that is fully random. Instead, they produce something called a pseudorandom number, and they do that using an algorithm called a pseudorandom number generator. That is clumsy, and going forward, we will simply use the term random number generator or RNG.

Second, the world is much more complicated than any simulated representation of it. It is, therefore, important to recognise that a simulation is only a very crude approximation of what actually happens. Even if a simulation exercise appears to validate or reject something, that conclusion may not hold in the real world.

A random number generator creates a sequence of numbers, and when it has gone through the entire sequence, it starts again from the beginning. Therefore, if we know the random number x_i and know the random number generator, we also know x_{i+1} and indeed all the random numbers that follow. This matters in cryptography: if hackers can figure out one actual random number and know the generator, they can maliciously decrypt all the documents encrypted with it.
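To see just how deterministic such generators are, here is a minimal sketch of a linear congruential generator, one of the simplest RNG algorithms. The constants are the classic Park-Miller choices, used purely for illustration; R's default generator is far more sophisticated.

# minimal linear congruential generator: x_{i+1} = (a * x_i) mod m
# Park-Miller constants, for illustration only; not R's algorithm
lcg=function(n,seed=1){
 a=16807
 m=2^31-1
 x=numeric(n)
 x[1]=seed
 for(i in 2:n) x[i]=(a*x[i-1])%%m
 return(x/m)
}
lcg(5)
lcg(5)

Calling lcg(5) twice returns exactly the same numbers, and anyone who knows one number in the sequence and the constants can reproduce every draw that follows.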

The number of different random numbers before the sequence repeats is known as the period. The default random number generator in R, the Mersenne Twister, has a period of 2^19937 - 1, which is more than enough for almost every possible application. And on the odd chance it is not sufficient, there are random number generators with even longer periods.

The term seed refers to a particular starting point in the sequence of random numbers, and the command for setting the seed in R is set.seed(). Therefore, if one sets the seed to 1, one always gets the same random numbers. However, if one sets the seed to 2 instead, one does not simply get the sequence shifted by one number; the seed selects a different starting point entirely.

10.1 Libraries
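The skewed Student-t random numbers used later in this chapter come from the rskt() function, which is not part of base R; a minimal setup, assuming the function comes from the skewt package:

library(skewt)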

10.2 Random numbers

10.2.1 Basics

In an earlier chapter, we discussed the various distributions built into R and the four forms of distribution functions. The first letter of the function name indicates which of the four types: d for the density, p for the probability (the distribution function), q for the quantile and r for random numbers. The remainder of the name identifies the distribution, for example, norm for the normal and t for the Student-t.
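As a quick illustration, here are the four forms for the normal distribution:

dnorm(0)   # density at zero
pnorm(0)   # probability of a draw below zero
qnorm(0.5) # quantile, the inverse of pnorm
rnorm(1)   # one random number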

To obtain one simulated standard normal, we call rnorm(n), where n is the number of random numbers to generate. For example:

rnorm(n=1)
[1] 1.775232
rnorm(n=5)
[1]  1.1726694  0.8080757  0.3147070 -0.9437919 -0.6289969

Every time we call the function, we get a different random number:

rnorm(n=1)
[1] -0.2797756
rnorm(n=1)
[1] -2.000671

10.2.2 Seed

Generally, but not always, we want the random numbers to stay the same every time we run an algorithm. We are running some analysis, perhaps a risk calculation, and we want to get the same answer every time we run the program.

There are exceptions to this, and you may want the random numbers to change every time you call the algorithm, for example, if you use R to make a decision. Suppose you cannot decide whether to go to a movie. Then you can run:

rnorm(1) 
[1] -0.9619334
rnorm(1)<0 
[1] TRUE
ifelse(rnorm(1)<0,"stay at home","go to a movie")
[1] "go to a movie"
ifelse(rnorm(1)<0,"stay at home","go to a movie")
[1] "stay at home"
ifelse(rnorm(1)<0,"stay at home","go to a movie")
[1] "go to a movie"

Since random number generators create a sequence of numbers that repeats, if we fix a place in the sequence, we always get the same random numbers. The way to do that in R is to set the seed with set.seed():

rnorm(3)
[1] 0.03012394 0.08541773 1.11661021
rnorm(3)
[1] -1.2188574  1.2673687 -0.7447816
set.seed(666)
rnorm(3)
[1]  0.7533110  2.0143547 -0.3551345
set.seed(666)
rnorm(3)
[1]  0.7533110  2.0143547 -0.3551345

10.3 Distributions

We will use several distributions: the normal (rnorm(n)), the Student-t (rt(n,df)), the skewed Student-t (rskt(n,df,gamma), from the skewt package), the chi-square (rchisq(n,df)), the binomial (rbinom(n,size,prob)) and the Bernoulli, which is just rbinom(n, size = 1, prob = prob).

rnorm(n=1)
[1] 2.028168
rt(n=1,df=3)
[1] -1.968612
rskt(n=1, df=3, gamma=1)
[1] -1.680305
rchisq(n=1,df=2)
[1] 0.05901815
rbinom(n=1,size=1,prob=0.5)
[1] 0

We can also plot some random numbers.

plot(rnorm(n=100),pch=16,col='blue')

plot(rt(n=100,df=2),pch=16,col='blue')

plot(rchisq(n=100,df=3),pch=16,col='blue')

plot(rbinom(n=100,size=1,prob=0.5),pch=16,col='blue')

It might be better to see them all in the same plot.

x=cbind(
 rnorm(n=100),
 rt(n=100,df=3),
 rchisq(n=100,df=2),
 rbinom(n=100,size=1,prob=0.5)
)
head(x,5)
           [,1]       [,2]       [,3] [,4]
[1,]  1.2654884 -2.8133631 1.10121691    0
[2,]  0.9010432  0.5166151 2.57504365    0
[3,] -1.6617320  0.2861781 3.43316778    1
[4,] -1.0291693 -0.8284511 0.05988823    1
[5,]  0.1268667  6.5405712 0.93775470    1
matplot(x,
 type='p',
 pch=16,
 col=c("red","blue","green","black"),
 bty='l',
 xlab='',
 ylab='',
 las=1
)
legend("topright",
 legend=c("norm","t","chi2","bern"),
 col=c("red","blue","green","black"),
 bty='n',
 pch=16,
 horiz=TRUE
)

10.4 Number of simulations

A key problem in simulations is determining how many simulations are necessary. Pick too few, and one gets an inaccurate simulation estimate; pick too many, and one wastes resources. The former problem is much more common. In a very special case, one over the square root of the number of simulations gives an estimate of the simulation accuracy, but that is only exactly correct in a linear model, and in that case, simulation is superfluous anyway.

Below, we pick six simulation sizes, c(5,25,100,1e3,1e4,1e5), draw from a normal distribution with mean 1 and standard deviation 2, and see how well the sample mean and standard deviation compare to the known true values.

S=c(5,25,100,1e3,1e4,1e5)
for(n in S){
 x=rnorm(n=n,mean=1,sd=2)
 # first iteration creates the results matrix, later iterations append rows
 if(n==S[1]){
  df=t(as.matrix(c(n,mean(x),sd(x))))
 }else{
  df=rbind(df,c(n,mean(x),sd(x)))
 }
}
df=as.data.frame(df)
names(df)=c("n","mean","sd")
df$meanError=df$mean-1
df$sdError=df$sd-2
print(df)
       n      mean       sd    meanError       sdError
1      5 3.4254490 2.210857  2.425448982  0.2108567841
2     25 0.4149429 2.144837 -0.585057133  0.1448373110
3    100 1.0025227 1.876900  0.002522730 -0.1230996715
4   1000 1.0818521 1.979812  0.081852123 -0.0201875335
5  10000 1.0222385 1.999544  0.022238455 -0.0004559622
6 100000 1.0057566 1.995098  0.005756615 -0.0049017984
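
The errors shrink at roughly the rate predicted by the central limit theorem: the standard error of the sample mean is sd/sqrt(n), here 2/sqrt(n). As a quick check, we can compare the theoretical standard errors with the meanError column above:

# theoretical standard error of the sample mean, sigma/sqrt(n)
data.frame(n=S,se=2/sqrt(S))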

10.5 Random walk

It is easy to simulate a random walk. In the example below, we set the mean to 0 and the standard deviation to 1%, which is approximately correct for the daily returns of most stocks. We also use the IID normal distribution, which is not correct for stock returns but keeps the example simple.

N=1000
x=rnorm(N,mean=0,sd=0.01)
# compound the simulated returns into a price path, normalised to start at one
p=cumprod(x+1)
p=p/p[1]
plot(p,type='l')

Running the same code again produces a different walk:

x=rnorm(N,mean=0,sd=0.01)
p=cumprod(x+1)
p=p/p[1]
plot(p,type='l')

We can also do several random walks and plot them on the same figure.

N=1000
R=4
for(i in 1:R){
 # mean=1 gives gross returns directly, so there is no need to add 1
 p=cumprod(rnorm(N,mean=1,sd=0.01))
 p=p/p[1]
 if(i==1){
  mat=p
 }else{
  mat=cbind(mat,p)
 }
}
matplot(mat,type='l',lty=1)

10.6 Pricing options

It is very common to use simulations to price options. Below, we use simulation to price a standard Black-Scholes option. Of course, we know the true option price in that case, but that allows us to compare the accuracy of the simulation to the true value.

Start with a function for the Black-Scholes price.

bs = function(X, P, r, sigma, T){
 # Black-Scholes prices and Greeks for European options
 # X strike, P spot price, r risk-free rate, sigma volatility, T maturity
 d1 = (log(P/X) + (r + 0.5*sigma^2)*T)/(sigma*sqrt(T))
 d2 = d1 - sigma*sqrt(T)
 Call = P*pnorm(d1) - X*exp(-r*T)*pnorm(d2)
 Put = X*exp(-r*T)*pnorm(-d2) - P*pnorm(-d1)
 Delta.Call = pnorm(d1)
 Delta.Put = Delta.Call - 1
 Gamma = dnorm(d1)/(P*sigma*sqrt(T))
 return(list(Call=Call,Put=Put,Delta.Call=Delta.Call,Delta.Put=Delta.Put,Gamma=Gamma))
}

And set some parameters for the option.

P0=50
sigma=0.2
r=0.05
T=0.5
X=40
bsprice=bs(X,P0,r,sigma,T)
print(bsprice)
$Call
[1] 11.08728

$Put
[1] 0.09967718

$Delta.Call
[1] 0.9660259

$Delta.Put
[1] -0.03397407

$Gamma
[1] 0.01066378

We then create a function sim() that takes as arguments the number of simulations, S, and the seed. It draws the log of the terminal price from the risk-neutral distribution, where the -0.5*sigma^2*T term adjusts the mean so that the expected price grows at the risk-free rate, computes the call payoff and discounts it back to today.

sim=function(S=1e6,seed=666){
 set.seed(seed)
 F=P0*exp(r*T) # forward price of the underlying
 ysim=rnorm(S,-0.5*sigma*sigma*T,sigma*sqrt(T)) # risk-neutral log returns
 F=F*exp(ysim) # simulated terminal prices
 SP=F-X
 SP[SP<0]=0 # call payoff, max(F-X,0)
 fsim=SP*exp(-r*T) # discount the payoffs back to today
 return(fsim)
}

Print ten simulated prices and plot 100 of them.

print(sim(S=10))
 [1] 16.054970 26.805763  8.065303 26.934463  0.000000 16.094585  2.140755
 [8]  5.178977  0.000000 10.196711
plot(sim(S=100))

If we let the number of simulations vary over c(1e1,5e1,1e2,1e3,1e4), we can see how the accuracy of the simulated price improves as the number of simulations increases:

cat("bs",bsprice$Call,"\n")
bs 11.08728 
for(S in c(1e1,5e1,1e2,1e3,1e4)){
 cat(S,mean(sim(S=S)),"\n")
}
10 11.14715 
50 10.75441 
100 10.65253 
1000 10.93522 
10000 11.19661 

You could also use simulations to get the confidence bounds for the simulation estimates. We leave that as an exercise.

Below, we run many more simulations and plot their histogram with the true Black-Scholes price superimposed.

fsim=sim()
hist(fsim, 
 nclass=100, 
 probability=TRUE, 
 xlim=c(0,35), 
 ylim=c(0,0.12),
 col='red', 
 las=1, 
 main="", 
 xlab="Option prices", 
 ylab="Density"
)
segments(bsprice$Call,0,bsprice$Call,0.1, col='red')
text(bsprice$Call, 0.11, "Black-Scholes\nprice")

10.7 Multivariate
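
A natural extension is to simulate correlated random variables. As a minimal sketch, assuming the MASS package is available, we can draw correlated normals with mvrnorm(); the mean vector and covariance matrix below are illustrative choices:

library(MASS)
# two variables with unit variances and correlation 0.5 (illustrative values)
mu=c(0,0)
Sigma=matrix(c(1,0.5,0.5,1),2,2)
y=mvrnorm(n=1000,mu=mu,Sigma=Sigma)
cor(y) # should be close to the target correlation matrix
plot(y,pch=16,col='blue',xlab='variable 1',ylab='variable 2')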

10.8 Exercise

Use simulations to get the confidence bounds for the simulation estimates.