10  Descriptive statistics

Below we do some descriptive statistics on our data, mostly for dependence and tail shape. We opted for the SP-500.

10.1 Libraries


10.2 Data


10.3 Statistical analysis

10.3.1 Sample stats


We can present these results a bit nicer by putting them into a data frame.

cat('Sample statistics for the SP-500 from',min(sp500$date),'to',max(sp500$date),'\n')

Sample statistics for the SP-500 from 20030103 to 20221230 
A matrix: 4 × 2 of type chr
stats mean: 0.029%
sd: 1.22%
skewness: -0.5
kurtosis: 15.8

Or printing them with formatting.

for(i in 1:dim(stats)[1])
mean:      0.029% 
sd:        1.22% 
skewness:  -0.5 
kurtosis:  15.8 

10.3.2 Jarque-Bera

The Jarque-Bera (JB) test in the tseries library uses the third and fourth central moments of the sample data to check if it has a skewness and kurtosis matching a normal distribution.


    Jarque Bera Test

data:  sp500$y
X-squared = 34398, df = 2, p-value < 2.2e-16

The p-value presented is the inverse of the JB statistic under the CDF of the asymptotic distribution. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.

10.3.3 Ljung-Box

The Ljung-Box (LB) test checks if the presented data exhibit serial correlation. We can implement the test using the Box.test() function and specifying type = "Ljung-Box":

Box.test(sp500$y, type = "Ljung-Box")
Box.test(sp500$y^2, type = "Ljung-Box")

    Box-Ljung test

data:  sp500$y
X-squared = 81.253, df = 1, p-value < 2.2e-16

    Box-Ljung test

data:  sp500$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16

10.3.4 Autocorrelation

The autocorrelation function shows us the linear correlation between a value in our time series with its different lags. The acf function plots the autocorrelation of an ordered vector. The horizontal lines are confidence intervals, meaning that if a value is outside of the interval, it is considered significantly different from zero.

acf(sp500$y, main = "Autocorrelation of returns")

The plots made by the acf function don’t not look all the nice. While there are other versions available for R, including one for ggplot, for now we will simply modify the builtin acf(). Call that myACF() and put it into functions.r.

myACF=function (x, n = 50, lwd = 2, col1 = co2[2], col2 = co2[1], main = NULL) 
    a = acf(x, n, plot = FALSE)
    significance_level = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
    barplot(a$acf[2:n, , 1], las = 1, col = col1, border = FALSE, 
        las = 1, xlab = "lag", ylab = "ACF", main = main)
    abline(significance_level, 0, col = col2, lwd = lwd)
    abline(-significance_level, 0, col = col2, lwd = lwd)
    axis(side = 1)
myACF(sp500$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500$y^2, main = "Autocorrelation of squared SP-500 returns")

10.3.5 QQ-Plots

We can use the quantile-quantile (QQ) plot to see how our data fits particular distributions. To make the QQ plots, we will use the qqPlot() function from the car package:

Start with the normal plot, indicated by distribution = "norm" which is the default.

x=qqPlot(sp500$y, distribution = "norm", envelope = FALSE)

We can also use QQ-Plots to see how our data fits tthe t distributions with various degrees of freedom.

par(mfrow=c(2,2), mar=c(4,4,1,0))
x=qqPlot(sp500$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)")

10.4 Exercise

Using data.Rdata, compute descriptive statistics for all stocks in the Return data frame. Summarize and present the results in a new data frame. It should include mean, standard deviation, skewness, kurtosis, and the p-value of Jarque-Bera test. Keep five decimal places.

# Answer
for (i in 2:ncol(data$Return)){
  temp.rst=c(mean(tmp), sd(tmp), skewness(tmp), kurtosis(tmp), jbtest$p.value)
  rst=rbind(rst, temp.rst)
rst=round(rst, digits=5)
colnames(rst)=c("Mean", "Std Dev", "Skewness", "Kurtosis", "JB Test")
A matrix: 6 × 5 of type dbl
Mean Std Dev Skewness Kurtosis JB Test
AAPL 0.00126 0.02122 -0.10078 8.33528 0
DIS 0.00037 0.01756 0.21869 12.56880 0
GE -0.00006 0.02043 -0.07300 12.06136 0
INTC 0.00019 0.01977 -0.44631 11.96508 0
JPM 0.00044 0.02285 0.26633 20.63585 0
MCD 0.00066 0.01333 0.11484 19.03846 0