# 10Descriptive statistics

Below we do some descriptive statistics on our data, mostly for dependence and tail shape. We opted for the SP-500.

## 10.1 Libraries

``````library(moments)
suppressPackageStartupMessages(library(tseries))
suppressPackageStartupMessages(library(car))``````

## 10.2 Data

``````load('data/data.RData')
sp500=data\$sp500``````

## 10.3 Statistical analysis

### 10.3.1 Sample stats

``````mean(sp500\$y)
sd(sp500\$y)
skewness(sp500\$y)
kurtosis(sp500\$y)``````
0.000286197721858836
0.0121610363140039
-0.511112007341173
15.765198795919

We can present these results a bit nicer by putting them into a data frame.

``````cat('Sample statistics for the SP-500 from',min(sp500\$date),'to',max(sp500\$date),'\n')

stats=data.frame()
stats=c('mean:',paste0(round(mean(sp500\$y)*100,3),'%'))
stats=rbind(stats,c('sd:',paste0(round(sd(sp500\$y)*100,2),'%')))
stats=rbind(stats,c('skewness:',round(skewness(sp500\$y),1)))
stats=rbind(stats,c('kurtosis:',round(kurtosis(sp500\$y),1)))
stats``````
``Sample statistics for the SP-500 from 20030103 to 20221230 ``
 stats mean: 0.029% sd: 1.22% skewness: -0.5 kurtosis: 15.8

Or printing them with formatting.

``````for(i in 1:dim(stats))
cat(stats[i,1],rep("",10-nchar(stats[i,1])),stats[i,2],"\n")``````
``````mean:      0.029%
sd:        1.22%
skewness:  -0.5
kurtosis:  15.8 ``````

### 10.3.2 Jarque-Bera

The Jarque-Bera (JB) test in the `tseries` library uses the third and fourth central moments of the sample data to check if it has a skewness and kurtosis matching a normal distribution.

``jarque.bera.test(sp500\$y)``
``````
Jarque Bera Test

data:  sp500\$y
X-squared = 34398, df = 2, p-value < 2.2e-16``````

The p-value presented is the inverse of the JB statistic under the CDF of the asymptotic distribution. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.

### 10.3.3 Ljung-Box

The Ljung-Box (LB) test checks if the presented data exhibit serial correlation. We can implement the test using the `Box.test()` function and specifying `type = "Ljung-Box"`:

``````Box.test(sp500\$y, type = "Ljung-Box")
Box.test(sp500\$y^2, type = "Ljung-Box")``````
``````
Box-Ljung test

data:  sp500\$y
X-squared = 81.253, df = 1, p-value < 2.2e-16``````
``````
Box-Ljung test

data:  sp500\$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16``````

### 10.3.4 Autocorrelation

The autocorrelation function shows us the linear correlation between a value in our time series with its different lags. The `acf` function plots the autocorrelation of an ordered vector. The horizontal lines are confidence intervals, meaning that if a value is outside of the interval, it is considered significantly different from zero.

``````par(mar=c(4,4,3,0))
acf(sp500\$y, main = "Autocorrelation of returns")`````` The plots made by the acf function don’t not look all the nice. While there are other versions available for R, including one for `ggplot`, for now we will simply modify the builtin `acf()`. Call that `myACF()` and put it into `functions.r`.

``````myACF=function (x, n = 50, lwd = 2, col1 = co2, col2 = co2, main = NULL)
{
a = acf(x, n, plot = FALSE)
significance_level = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
barplot(a\$acf[2:n, , 1], las = 1, col = col1, border = FALSE,
las = 1, xlab = "lag", ylab = "ACF", main = main)
abline(significance_level, 0, col = col2, lwd = lwd)
abline(-significance_level, 0, col = col2, lwd = lwd)
axis(side = 1)
}
par(mar=c(4,4,1,0))
myACF(sp500\$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500\$y^2, main = "Autocorrelation of squared SP-500 returns")``````  ### 10.3.5 QQ-Plots

We can use the quantile-quantile (QQ) plot to see how our data fits particular distributions. To make the QQ plots, we will use the `qqPlot()` function from the `car` package:

Start with the normal plot, indicated by `distribution = "norm"` which is the default.

``````par(mar=c(4,4,1,0))
x=qqPlot(sp500\$y, distribution = "norm", envelope = FALSE)`````` We can also use QQ-Plots to see how our data fits tthe t distributions with various degrees of freedom.

``````par(mfrow=c(2,2), mar=c(4,4,1,0))
x=qqPlot(sp500\$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500\$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500\$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500\$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)")`````` ## 10.4 Exercise

Using `data.Rdata`, compute descriptive statistics for all stocks in the `Return` data frame. Summarize and present the results in a new data frame. It should include mean, standard deviation, skewness, kurtosis, and the p-value of Jarque-Bera test. Keep five decimal places.

``````# Answer
rst=c()
for (i in 2:ncol(data\$Return)){
tmp=data\$Return[,i]
jbtest=jarque.bera.test(tmp)
temp.rst=c(mean(tmp), sd(tmp), skewness(tmp), kurtosis(tmp), jbtest\$p.value)
rst=rbind(rst, temp.rst)
}
rst=round(rst, digits=5)
colnames(rst)=c("Mean", "Std Dev", "Skewness", "Kurtosis", "JB Test")
rownames(rst)=colnames(data\$Return)[-1]
rst``````
A matrix: 6 × 5 of type dbl
Mean Std Dev Skewness Kurtosis JB Test
AAPL 0.00126 0.02122 -0.10078 8.33528 0
DIS 0.00037 0.01756 0.21869 12.56880 0
GE -0.00006 0.02043 -0.07300 12.06136 0
INTC 0.00019 0.01977 -0.44631 11.96508 0
JPM 0.00044 0.02285 0.26633 20.63585 0
MCD 0.00066 0.01333 0.11484 19.03846 0