Below we do some descriptive statistics on our data, mostly for dependence and tail shape. We opted for the SP-500. ## Libraries

``````library(moments)
suppressPackageStartupMessages(library(tseries))
suppressPackageStartupMessages(library(car))``````

## 9.1 Data

``````load('data/data.RData')
sp500=data\$sp500``````

## 9.2 Statistical analysis

### 9.2.1 Sample stats

``````mean(sp500\$y)
sd(sp500\$y)
skewness(sp500\$y)
kurtosis(sp500\$y)``````
0.000286197721858836
0.0121610363140039
-0.511112007341173
15.765198795919

We can present these results a bit nicer by putting them into a data frame.

``````cat('Sample statistics for the SP-500 from',min(sp500\$date),'to',max(sp500\$date),'\n')

stats=data.frame()
stats=c('mean:',paste0(round(mean(sp500\$y)*100,3),'%'))
stats=rbind(stats,c('sd:',paste0(round(sd(sp500\$y)*100,2),'%')))
stats=rbind(stats,c('skewness:',round(skewness(sp500\$y),1)))
stats=rbind(stats,c('kurtosis:',round(kurtosis(sp500\$y),1)))
stats``````
``Sample statistics for the SP-500 from 20030103 to 20221230 ``
stats mean: 0.029% sd: 1.22% skewness: -0.5 kurtosis: 15.8

Or printing them with formatting.

``````for(i in 1:dim(stats)[1])
cat(stats[i,1],rep("",10-nchar(stats[i,1])),stats[i,2],"\n")``````
``````mean:      0.029%
sd:        1.22%
skewness:  -0.5
kurtosis:  15.8 ``````

### 9.2.2 Jarque-Bera

The Jarque-Bera (JB) test in the `tseries` library uses the third and fourth central moments of the sample data to check if it has a skewness and kurtosis matching a normal distribution.

``jarque.bera.test(sp500\$y)``
``````
Jarque Bera Test

data:  sp500\$y
X-squared = 34398, df = 2, p-value < 2.2e-16``````

The p-value presented is the inverse of the JB statistic under the CDF of the asymptotic distribution. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.

### 9.2.3 Ljung-Box

The Ljung-Box (LB) test checks if the presented data exhibit serial correlation. We can implement the test using the `Box.test()` function and specifying `type = "Ljung-Box"`:

``````Box.test(sp500\$y, type = "Ljung-Box")
Box.test(sp500\$y^2, type = "Ljung-Box")``````
``````
Box-Ljung test

data:  sp500\$y
X-squared = 81.253, df = 1, p-value < 2.2e-16``````
``````
Box-Ljung test

data:  sp500\$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16``````

### 9.2.4 Autocorrelation

The autocorrelation function shows us the linear correlation between a value in our time series with its different lags. The `acf` function plots the autocorrelation of an ordered vector. The horizontal lines are confidence intervals, meaning that if a value is outside of the interval, it is considered significantly different from zero.

``````par(mar=c(4,4,1,0))
acf(sp500\$y, main = "Autocorrelation of returns")``````

The plots made by the acf function don’t not look all the nice. While there are other versions available for R, including one for `ggplot`, for now we will simply modify the builtin `acf()`. Call that `myACF()` and put it into `functions.r`.

``````myACF=function (x, n = 50, lwd = 2, col1 = co2[2], col2 = co2[1], main = NULL)
{
a = acf(x, n, plot = FALSE)
significance_level = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
barplot(a\$acf[2:n, , 1], las = 1, col = col1, border = FALSE,
las = 1, xlab = "lag", ylab = "ACF", main = main)
abline(significance_level, 0, col = col2, lwd = lwd)
abline(-significance_level, 0, col = col2, lwd = lwd)
axis(side = 1)
}
par(mar=c(4,4,1,0))
myACF(sp500\$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500\$y^2, main = "Autocorrelation of squared SP-500 returns")``````

### 9.2.5 QQ-Plots

We can use the quantile-quantile (QQ) plot to see how our data fits particular distributions. To make the QQ plots, we will use the `qqPlot()` function from the `car` package:

Start with the normal plot, indicated by `distribution = "norm"` which is the default.

``````par(mar=c(4,4,1,0))
x=qqPlot(sp500\$y, distribution = "norm", envelope = FALSE)``````

We can also use QQ-Plots to see how our data fits tthe t distributions with various degrees of freedom.

``````par(mfrow=c(2,2), mar=c(4,4,1,0))
x=qqPlot(sp500\$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500\$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500\$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500\$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)")``````