FM442: Quantitative Methods for Finance and Risk Analysis

Jon Danielsson
Michaelmas Term 2020


Loading packages

We first load the packages for this seminar:

In [39]:
library(tseries)
library(car)
library(lubridate)

Statistical distributions

We need to be comfortable working with Probability Distributions in order to build financial risk forecasting models. In particular, we are going to work with the Normal Distribution, the Student-T, and the Chi-Square distribution. The names of these distributions in R, respectively, are: norm, t, and chisq.

Every distribution in R has four functions:

  • p for "probability, or the cumulative distribution function (cdf)
  • q for "quantile", the inverse of the cdf
  • d for density, the probability density function (pdf)
  • r for "random", a random variable drawn from the specified distribution

To apply any of these, you just need to add them before the distribution's name.

For example, we can use pnorm to calculate the cdf: $$F(x) = P(X≤x)$$ Where $X$ is a normally distributed random variable.

In [40]:
# CDF of 0 under a Standard Normal
pnorm(0)
0.5
In [41]:
# CDF of 8 under a Normal with mean 10 and standard deviation 2
pnorm(8, mean = 10, sd = 2)

# Rounding it
round(pnorm(8, mean = 10, sd = 2),3)
0.158655253931457
0.159
In [42]:
# Sequence from -2 to 2, with increments of 0.5
sequence <- seq(-2, 2, 0.5)
round(pnorm(sequence),3)
  1. 0.023
  2. 0.067
  3. 0.159
  4. 0.309
  5. 0.5
  6. 0.691
  7. 0.841
  8. 0.933
  9. 0.977

For many applications, we will be interested in looking for the inverse of the cdf. For example, if we want to find the 95-th quantile of a Student-t with 3 degrees of freedom:

In [43]:
# Inverse CDF of 0.95 under a Student-t with 3 degrees of freedom
qt(0.95, df = 3)
2.35336343480182

Plotting a distribution

We can read probability-quantile combinations from the CDF plot of a distribution. Let's find the 5% quantile of a Standard Normal:

In [44]:
# Creating a sequence
x <- seq(-3,3,0.1)

# Vector with the CDF
cdf <- pnorm(x)

# Plotting it
plot(x, cdf, type = "l", main = "CDF of a Standard Normal",
    lwd = 3, las = 1, col = "blue")

# Find 5% quantile using qnorm
q5 <- qnorm(0.05)

# Add lines using the segments() function
segments(x0 = q5, y0 = -1, y1 = 0.05, lty = "dashed", lwd = 2, col ="red")
segments(x0 = -4, y0 = 0.05, x1 = q5, lty = "dashed", lwd = 2, col ="red")

# Add tick marks in the plot
axis(1, at = q5, label = round(q5,2))
axis(2, at = 0.05, label = 0.05, las = 1)

Comparing the Normal Distribution with the Student-t

The Student-t distribution has fatter tails than the Normal, which can be convenient for some applications. The lower the degrees of freedom of a Student-t, the fatter the tails. By definition, a Student-t will converge to a Normal distribution as the degrees of freedom go to infinity.

In [45]:
# Start by creating a sequence
x <- seq(-3, 3, 0.1)

# Normal density
normal <- dnorm(x)

# Student-t with 2 df
st2 <- dt(x, df = 2)

# Student-t with 3 df
st3 <- dt(x, df = 3)

# Student-t with 10 df
st10 <- dt(x, df = 10)

plot(x, normal, type = "l", main = "Comparing distributions", col = 1, xlab = "x", ylab = "f(x)")
lines(x, st2, col = 2)
lines(x, st3, col = 3)
lines(x, st10, col = 4)

legend("topright", legend = c("Normal", "T - 2 df", "T - 3 df", "T - 10 df"), col = c(1:4), fill = c(1:4))

Random numbers and Monte Carlo

One of the advantages of working with statistical software is being able to do simulations using random (or better said, pseudorandom) numbers. We will go into more details on random numbers generation in Chapter 7.

We can get random numbers drawn from a specific distribution by using the prefix r before the distribution name:

In [46]:
# One random number from a Standard Normal
rnorm(1)

# Ten random numbers from a Normal(10, 0.5)
rnorm(10, mean = 10, sd = 0.5)

# One random number from a Student-t with 5 degrees of freedom
rt(1, df = 5)
0.937012408830822
  1. 9.27746310496962
  2. 10.031635897445
  3. 10.1746320467716
  4. 10.7681963935134
  5. 9.74040713059832
  6. 9.92440986730903
  7. 8.75117935351729
  8. 9.76837828386729
  9. 10.8255383105505
  10. 9.58157505138012
-0.31485745875495

Everytime we run a function that outputs random numbers, we will get a different value. Usually, to allow for replication, we set a seed:

In [47]:
set.seed(442)
rnorm(5)
  1. -0.894891360378309
  2. -2.6921679536767
  3. 0.764099073417597
  4. -0.0172356169957856
  5. -1.62406009659155
In [48]:
set.seed(442)
rnorm(5)
  1. -0.894891360378309
  2. -2.6921679536767
  3. 0.764099073417597
  4. -0.0172356169957856
  5. -1.62406009659155

Comparing distributions using random numbers

To show the Fat Tails of the Student-t compared to the Normal distribution, we will draw 1000 points from each distribution:

In [49]:
# Normal distribution 
rnd_normal <- rnorm(1000)

# Student-t
rnd_t <- rt(1000, df = 3)

# Plotting
plot(rnd_t, col = "red", pch = 16, main = "Random points from a Normal and Student-t")
points(rnd_normal, col = "blue",pch = 16)

# Adding a legend
legend("bottomright", legend = c("Student-t", "Normal"), pch = 16, col = c("red", "blue"))

We see that the points from a Student-t distribution take on more extreme values compared to the Normal. This is a consequence of fat tails.

Assigning your grades with a random number generator

Let's say I want to assign your grades for FM442 using a random number generator. Grades go from 0 to 100 and have to be integers. Drawing numbers from a distribution like the Normal or the Uniform will give us non-integers, so we can either:

  • Round the number to the nearest integer
  • Use a different function, like sample(), which only outputs integers

We will use the latter. sample(x,size) takes a vector x, and a size, and returns a vector of the given size of random draws from x:

In [50]:
sample(1:100, 1)
sample(1:100, 3)
sample(1:100, 5)
44
  1. 53
  2. 10
  3. 37
  1. 9
  2. 31
  3. 20
  4. 18
  5. 86

Let's say there are 60 students in the class, we can get the grades:

In [51]:
# Getting the grades
grades <- sample(1:100, 60)

# Creating a histogram
hist(grades, col = "lightgray")

How many people got a Distinction:

In [52]:
# We can use a condition in the square brackets to subset the vector:
length(grades[grades >= 70])
18

Small and Large sample properties

Let's explore the distribution of random samples of different sizes, drawn from a standard normal distribution, compared to the distribution:

In [53]:
# Sample size: 100
norm1 <- rnorm(100)

# Sample size: 1000
norm2 <- rnorm(1000)

# Sample size: 100000
norm3 <- rnorm(100000)

# Create a sequence
x <- seq(-3,3,0.1)

# Plot the histograms and overlay a normal distribution
hist(norm1, freq = FALSE, breaks = 20, main = "Sample size 100", col = "lightgrey", ylim = c(0, 0.5))
lines(x,dnorm(x), lwd = 3, col = "red")
hist(norm2, freq = FALSE, breaks = 20, main = "Sample size 1000", col = "lightgrey", ylim = c(0, 0.5))
lines(x,dnorm(x), lwd = 3, col = "red")
hist(norm3, freq = FALSE, breaks = 20, main = "Sample size 100000", col = "lightgrey", ylim = c(0, 0.5))
lines(x,dnorm(x), lwd = 3, col = "red")

Monte Carlo

A Monte Carlo method is a class of computational algorithm that uses repeated random sampling to obtain numerical results. It is commonly used in quantitative finance to simulate uncertainty and analyze instruments, portfolios or investments.

We are going to do a simple exercise to simulate the profit and loss distribution from holding a stock.

Assume we have a stock that has normally distributed returns with mean $0$ and variance $\sigma^{2}$, and a risk-free interest rate $r$. We can then simulate $S$ one-day returns:

$$ y_{t+1,i} \sim \mathcal{N}\left(0,\sigma^{2}\right), i = 1, ..., S $$

And with each simulated return, calculate the one-day future price following the formula. The last term is the lognormal correction. We will discuss this in detail in chapter 7:

$$P_{t+1,i} = P_{t}e^{r(1/365)} \times e^{y_{t+1,i}} \times e^{-0.5\sigma^{2}}$$

Consider a stock $X$ with price $100$, and daily volatility $0.01$. The annual risk free rate is $5\%$. We will find the distribution of profits and loss for tomorrow's price.

In [54]:
# Initial price
X_0 <- 100

# Volatility
sigma <- 0.01

# Interest rate
r <- 0.05

# Initialize a vector that will contain all realizations
X_1 <- numeric()

# Perform one million realizations
for (i in 1:10^6) {
    X_1[i] <- X_0*exp(r/365)*exp(rnorm(1,0,sigma)-sigma^2/2)
}

# Plot a histogram
hist(X_1, col = "lightgray", main = "Distribution of expected prices",
    xlab = "Price")

We don't know what tomorrow's price will be, but based on the assumptions we made, this experiment gives us the possible values.

We can order the values and find the quantile of this distribution using the quantile() function.

Let's find what is the 5% quantile of our expected price:

In [55]:
quantile(X_1, 0.05)
5%: 98.3791034901985

Adding the 5% quantile as a vertical line in the plot:

In [56]:
hist(X_1, col = "lightgray", main = "Distribution of expected prices",
    xlab = "Price")

abline(v = quantile(X_1, 0.05), lwd = 2, col = "red")

Random Walk

A random walk is a stochastic process that describe the path of a variable that moves randomly through time. For example, consider the process:

$$Y_{t} = Y_{t-1} + \epsilon_{t}$$

Where $Y_{0}$ is given and $\epsilon \sim \mathcal{N}\left(0,\sigma^2\right)$.

We can consider a stock, $Y$, that at time $0$ has price of 500, and follows a random walk with $\sigma^2 = 2$ . If we want to see how the price evolves over time, we can use R to create a large number of simulations. Let's see one thousand different simulations on how the price of the stock would move in one hundred days:

In [4]:
# Create a matrix to hold the paths. One column for every simulation and one row for each day, starting from 0
mtx <- matrix(NA, nrow = 101, ncol = 1000)

# Since at time 0 the price is 500, we fill the first row with this value
mtx[1,] <- 500

# Use a loop to draw the path of the price for each simulation

# i represents the simulation, from 1 to 1000
for (i in 1:1000) { 
    
    # j represents the row in our matrix, one for each day
    for (j in 2:101) {
        
        # We proceed to fill the matrix following the random walk
        mtx[j,i] <- mtx[j-1,i] + rnorm(1, mean = 0, sd = 2.5)
    }
}

We have run 1000 simulations, each starts with a price of 100 and follows a random walk for 100 time periods. Let's see the head of the matrix:

In [5]:
head(mtx)
500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000500.0000
497.4730500.6141500.7086501.2396498.2074502.2359499.5943504.6308500.8661499.2514497.0631501.6005497.0751502.6585501.0505495.3562502.5410500.0095501.8358500.9092
500.4940503.9456501.2562502.5058505.0726502.3084501.0884501.9273503.6714497.7203500.8220500.3946496.3692502.4713505.9253498.3714504.2104500.5267504.8215500.3047
496.3450503.3618502.1960501.8873506.1756501.2923498.6686496.9498498.9511493.3669500.2172500.5884493.6703495.2946503.8849498.8223504.3627497.8238508.2044497.8299
500.1308505.9729504.0924499.4888505.7583504.5040498.2979498.3585495.1515496.6513505.4030498.5791490.4436495.3772502.2266500.9999503.4992503.4886507.2814494.1244
496.6878507.5609505.2794508.4140509.6336500.8331497.8810498.5668495.4327500.3786508.8021496.9854492.2669495.2150495.7652488.3693501.9535501.7482505.9730493.2095

We can plot it using matplot() to see the random walk visually:

In [6]:
matplot(mtx, main = "Random Walk", xlab = "Time", ylab = "Price", type = "l", lty = 1)