rnorm(n=1)
[1] 1.775232
rnorm(n=5)
[1] 1.1726694 0.8080757 0.3147070 -0.9437919 -0.6289969
Simulations are a very powerful technique widely used in risk forecasting. For example, we use simulations to evaluate the accuracy of various models, provide intuition on how models work, and provide analysis that otherwise would require complicated mathematics.
When referring to simulations, we often use the term, Monte Carlo, after the statelet on the southern coast of France famous for casinos. We will use the terms simulation, Monte Carlo, and MC interchangeably.
When doing simulations, we ask a computer algorithm to replicate the real world. It is important to recognise the limitations of such an exercise.
To begin with, we are asking a computer to do something random, but computers are deterministic machines againnd cannot create something that is fully random. Instead, they produce something called a pseudorandom number, and they do that using an algorithm called a pseudorandom number generator. That is clumsy, and going forward, we will simply use the term random number generator or RNG.
Second, the world is much more complicated than any simulated representation of it. It is, therefore, important to recognise that a simulation is only a very crude approximation of what actually happens. Even if a simulation exercise appears to validate or reject something, that might not actually be true.
A random number generator creates a sequence of numbers, and when it has gone through the sequence, it starts again from the beginning. Therefore, if we know the random number \(x_i\) and know the random number generator, we also know the number \(x_{i+1}\) and indeed all the random numbers. In cryptography, if hackers can figure out an actual random number and knows the generator, they can maliciously decrypt all documents.
The number of different random numbers is known as the period. The default random number generator in R has a period of \(2^{19937}−1\), which is more than enough for almost every possible application. And on the odd chance it’s not sufficient, there are random number generators that can even create more.
The term seed refers to a particular starting point in the sequence of random numbers, and the command for setting the seed is set.seed()
. Therefore, if one sets the seed to 1, one always gets the same random numbers. However, if one picks two instead, that is not simply the second number in the sequence.
In Section 9.4, we discussed the various distributions built into R and the four forms of distribution: functions. The first letter of the function name indicates which of these four types, d
for distribution, p
for probability, q
for quantile and r
for random. The remainder of the name is the distribution, for example, for the normal and student-t:
To obtain one simulated standard normal, we call it rnorm(n)
, where n
is the number of different random numbers. Therefore get:
Every time we call the function, we get a different random number:
Generally, but not always, we want the random numbers to stay the same every time we run an algorithm. We are running some analysis, perhaps a risk calculation, and we want to get the same answer every time we run the program.
There are exceptions to this, and you may want the random numbers two chains every time you call the algorithm. For example, if you use R to make a decision. Suppose you can’t decide if you want to go to a movie. Then, you can run.
[1] -0.9619334
[1] TRUE
[1] "go to a movie"
[1] "stay at home"
[1] "go to a movie"
Since random number generators create a sequence of that repeats, if we fix a place in the sequence, we always get the same random numbers. The way to do that is to set the seed in R set.seed()
We will use several distributions, the normal (rnorm(n)
), student-t (rt(n,df)
), skewed student-t (rskt(n, df, gamma)
), chi-square (rchisq(n,df)
), the binomial (rbinom(n,size,prob)
) and the Bernoulli, which is just rbinom(n, size = 1, prob = prob)
.
[1] 2.028168
[1] -1.968612
[1] -1.680305
[1] 0.05901815
[1] 0
We can also plot some random numbers.
It might be better to see them all in the same plot.
x=cbind(
rnorm(n=100),
rt(n=100,df=3),
rchisq(n=100,df=2),
rbinom(n=100,size=1,prob=0.5)
)
head(x,5)
[,1] [,2] [,3] [,4]
[1,] 1.2654884 -2.8133631 1.10121691 0
[2,] 0.9010432 0.5166151 2.57504365 0
[3,] -1.6617320 0.2861781 3.43316778 1
[4,] -1.0291693 -0.8284511 0.05988823 1
[5,] 0.1268667 6.5405712 0.93775470 1
matplot(x,
type='p',
pch=16,
col=c("red","blue","green","black"),
bty='l',
xlab='',
ylab='',
las=1
)
legend("topright",
legend=c("norm","t","chi2","bern"),
col=c("red","blue","green","black"),
bty='n',
pch=16,
horiz=TRUE
)
A key problem in simulations is to determine the number of necessary simulations. Pick too few, and one gets an inaccurate simulation estimate, and one wastes resources. The former problem is much more common. In a very special case, one over the number of simulations gives an estimate of the simulation accuracy, but that is only correct in a linear model. In that case, simulation is superfluous.
Below, we pick six simulation sizes, c(5,25,100,1e3,1e4,1e5)
, and see how well their mean and standard deviation compare to the known true value.
S=c(5,25,100,1e3,1e4,1e5)
for(n in S ){
x=rnorm(n=n,mean=1,sd=2)
if(n==S[1]){
df=t(as.matrix(c(n,mean(x),sd(x))))
}else{
df=rbind(df,c(n,mean(x),sd(x)))
}
}
df=as.data.frame(df)
names(df)=c("n","mean","sd")
df$meanError=df$mean-1
df$sdError=df$sd-2
#rownames(df)=NULL
print(df)
n mean sd meanError sdError
1 5 3.4254490 2.210857 2.425448982 0.2108567841
2 25 0.4149429 2.144837 -0.585057133 0.1448373110
3 100 1.0025227 1.876900 0.002522730 -0.1230996715
4 1000 1.0818521 1.979812 0.081852123 -0.0201875335
5 10000 1.0222385 1.999544 0.022238455 -0.0004559622
6 100000 1.0057566 1.995098 0.005756615 -0.0049017984
It is easy to simulate a random walk. In the example below, we set the mean to 0 and the standard deviation to 1%, which is approximately correct for most stock returns. We also use the IID normal distribution, which is not correct for stock returns.
We can also do several random walks and plot them on the same figure.
It is very common to use simulations to price options. Below, we use it to price a standard Black-Sholes option. Of course, we know the true option price in that case, but that allows us to compare the accuracy of the simulation to the true value.
Start with a function for the Black-Sholes.
bs = function(X, P, r, sigma, T){
d1 = (log(P/X) + (r + 0.5*sigma^2)*(T))/(sigma*sqrt(T))
d2 = d1 - sigma*sqrt(T)
Call = P*pnorm(d1, mean = 0, sd = 1) - X*exp(-r*(T))*pnorm(d2, mean = 0, sd = 1)
Put = X*exp(-r*(T))*pnorm(-d2, mean = 0, sd = 1) - P*pnorm(-d1, mean = 0, sd = 1)
Delta.Call = pnorm(d1, mean = 0, sd = 1)
Delta.Put = Delta.Call - 1
Gamma = dnorm(d1, mean = 0, sd = 1)/(P*sigma*sqrt(T))
return(list(Call=Call,Put=Put,Delta.Call=Delta.Call,Delta.Put=Delta.Put,Gamma=Gamma))
}
And set some parameters for the option.
$Call
[1] 11.08728
$Put
[1] 0.09967718
$Delta.Call
[1] 0.9660259
$Delta.Put
[1] -0.03397407
$Gamma
[1] 0.01066378
We then create a function sim()
that takes as arguments for the number of simulations, S
and the seed.
Print ten simulations and plot a 100.
[1] 16.054970 26.805763 8.065303 26.934463 0.000000 16.094585 2.140755
[8] 5.178977 0.000000 10.196711
If we let the number of simulations vary over c(1e1,5e1,1e2,1e3,1e4)
, we can get a measurement of the accuracy of the number of simulations as their number increases:
bs 11.08728
10 11.14715
50 10.75441
100 10.65253
1000 10.93522
10000 11.19661
You could also use simulations to get the confidence bounds for the simulation estimates. We leave that as an exercise.
Below, we do a lot more simulations and plot their histogram with the superimposed true value.
Use simulations to get the confidence bounds for the simulation estimates.