library(moments)
suppressPackageStartupMessages(library(tseries))
suppressPackageStartupMessages(library(car))
9 Descriptive statistics
Below we do some descriptive statistics on our data, mostly for dependence and tail shape. We opted for the SP-500. ## Libraries
9.1 Data
load('data/data.RData')
=data$sp500 sp500
9.2 Statistical analysis
9.2.1 Sample stats
mean(sp500$y)
sd(sp500$y)
skewness(sp500$y)
kurtosis(sp500$y)
We can present these results a bit nicer by putting them into a data frame.
cat('Sample statistics for the SP-500 from',min(sp500$date),'to',max(sp500$date),'\n')
=data.frame()
stats=c('mean:',paste0(round(mean(sp500$y)*100,3),'%'))
stats=rbind(stats,c('sd:',paste0(round(sd(sp500$y)*100,2),'%')))
stats=rbind(stats,c('skewness:',round(skewness(sp500$y),1)))
stats=rbind(stats,c('kurtosis:',round(kurtosis(sp500$y),1)))
stats stats
Sample statistics for the SP-500 from 20030103 to 20221230
stats | mean: | 0.029% |
---|---|---|
sd: | 1.22% | |
skewness: | -0.5 | |
kurtosis: | 15.8 |
Or printing them with formatting.
for(i in 1:dim(stats)[1])
cat(stats[i,1],rep("",10-nchar(stats[i,1])),stats[i,2],"\n")
mean: 0.029%
sd: 1.22%
skewness: -0.5
kurtosis: 15.8
9.2.2 Jarque-Bera
The Jarque-Bera (JB) test in the tseries
library uses the third and fourth central moments of the sample data to check if it has a skewness and kurtosis matching a normal distribution.
jarque.bera.test(sp500$y)
Jarque Bera Test
data: sp500$y
X-squared = 34398, df = 2, p-value < 2.2e-16
The p-value presented is the inverse of the JB statistic under the CDF of the asymptotic distribution. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.
9.2.3 Ljung-Box
The Ljung-Box (LB) test checks if the presented data exhibit serial correlation. We can implement the test using the Box.test()
function and specifying type = "Ljung-Box"
:
Box.test(sp500$y, type = "Ljung-Box")
Box.test(sp500$y^2, type = "Ljung-Box")
Box-Ljung test
data: sp500$y
X-squared = 81.253, df = 1, p-value < 2.2e-16
Box-Ljung test
data: sp500$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16
9.2.4 Autocorrelation
The autocorrelation function shows us the linear correlation between a value in our time series with its different lags. The acf
function plots the autocorrelation of an ordered vector. The horizontal lines are confidence intervals, meaning that if a value is outside of the interval, it is considered significantly different from zero.
par(mar=c(4,4,1,0))
acf(sp500$y, main = "Autocorrelation of returns")
The plots made by the acf function don’t not look all the nice. While there are other versions available for R, including one for ggplot
, for now we will simply modify the builtin acf()
. Call that myACF()
and put it into functions.r
.
=function (x, n = 50, lwd = 2, col1 = co2[2], col2 = co2[1], main = NULL)
myACF
{= acf(x, n, plot = FALSE)
a = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
significance_level barplot(a$acf[2:n, , 1], las = 1, col = col1, border = FALSE,
las = 1, xlab = "lag", ylab = "ACF", main = main)
abline(significance_level, 0, col = col2, lwd = lwd)
abline(-significance_level, 0, col = col2, lwd = lwd)
axis(side = 1)
}par(mar=c(4,4,1,0))
myACF(sp500$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500$y^2, main = "Autocorrelation of squared SP-500 returns")
9.2.5 QQ-Plots
We can use the quantile-quantile (QQ) plot to see how our data fits particular distributions. To make the QQ plots, we will use the qqPlot()
function from the car
package:
Start with the normal plot, indicated by distribution = "norm"
which is the default.
par(mar=c(4,4,1,0))
=qqPlot(sp500$y, distribution = "norm", envelope = FALSE) x
We can also use QQ-Plots to see how our data fits tthe t distributions with various degrees of freedom.
par(mfrow=c(2,2), mar=c(4,4,1,0))
=qqPlot(sp500$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)") x