library(moments)
suppressPackageStartupMessages(library(tseries))
suppressPackageStartupMessages(library(car))
10 Descriptive statistics
Below we do some descriptive statistics on our data, mostly for dependence and tail shape. We opted for the SP-500.
10.1 Libraries
10.2 Data
load('data/data.RData')
=data$sp500 sp500
10.3 Statistical analysis
10.3.1 Sample stats
mean(sp500$y)
sd(sp500$y)
skewness(sp500$y)
kurtosis(sp500$y)
We can present these results a bit nicer by putting them into a data frame.
cat('Sample statistics for the SP-500 from',min(sp500$date),'to',max(sp500$date),'\n')
=data.frame()
stats=c('mean:',paste0(round(mean(sp500$y)*100,3),'%'))
stats=rbind(stats,c('sd:',paste0(round(sd(sp500$y)*100,2),'%')))
stats=rbind(stats,c('skewness:',round(skewness(sp500$y),1)))
stats=rbind(stats,c('kurtosis:',round(kurtosis(sp500$y),1)))
stats stats
Sample statistics for the SP-500 from 20030103 to 20221230
stats | mean: | 0.029% |
sd: | 1.22% | |
skewness: | -0.5 | |
kurtosis: | 15.8 |
Or printing them with formatting.
for(i in 1:dim(stats)[1])
cat(stats[i,1],rep("",10-nchar(stats[i,1])),stats[i,2],"\n")
mean: 0.029%
sd: 1.22%
skewness: -0.5
kurtosis: 15.8
10.3.2 Jarque-Bera
The Jarque-Bera (JB) test in the tseries
library uses the third and fourth central moments of the sample data to check if it has a skewness and kurtosis matching a normal distribution.
jarque.bera.test(sp500$y)
Jarque Bera Test
data: sp500$y
X-squared = 34398, df = 2, p-value < 2.2e-16
The p-value presented is the inverse of the JB statistic under the CDF of the asymptotic distribution. The p-value of this test is virtually zero, which means that we have enough evidence to reject the null hypothesis that our data has been drawn from a normal distribution.
10.3.3 Ljung-Box
The Ljung-Box (LB) test checks if the presented data exhibit serial correlation. We can implement the test using the Box.test()
function and specifying type = "Ljung-Box"
:
Box.test(sp500$y, type = "Ljung-Box")
Box.test(sp500$y^2, type = "Ljung-Box")
Box-Ljung test
data: sp500$y
X-squared = 81.253, df = 1, p-value < 2.2e-16
Box-Ljung test
data: sp500$y^2
X-squared = 551.08, df = 1, p-value < 2.2e-16
10.3.4 Autocorrelation
The autocorrelation function shows us the linear correlation between a value in our time series with its different lags. The acf
function plots the autocorrelation of an ordered vector. The horizontal lines are confidence intervals, meaning that if a value is outside of the interval, it is considered significantly different from zero.
par(mar=c(4,4,3,0))
acf(sp500$y, main = "Autocorrelation of returns")
The plots made by the acf function don’t not look all the nice. While there are other versions available for R, including one for ggplot
, for now we will simply modify the builtin acf()
. Call that myACF()
and put it into functions.r
.
=function (x, n = 50, lwd = 2, col1 = co2[2], col2 = co2[1], main = NULL)
myACF
{= acf(x, n, plot = FALSE)
a = qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
significance_level barplot(a$acf[2:n, , 1], las = 1, col = col1, border = FALSE,
las = 1, xlab = "lag", ylab = "ACF", main = main)
abline(significance_level, 0, col = col2, lwd = lwd)
abline(-significance_level, 0, col = col2, lwd = lwd)
axis(side = 1)
}par(mar=c(4,4,1,0))
myACF(sp500$y, main = "Autocorrelation of SP-500 returns")
myACF(sp500$y^2, main = "Autocorrelation of squared SP-500 returns")
10.3.5 QQ-Plots
We can use the quantile-quantile (QQ) plot to see how our data fits particular distributions. To make the QQ plots, we will use the qqPlot()
function from the car
package:
Start with the normal plot, indicated by distribution = "norm"
which is the default.
par(mar=c(4,4,1,0))
=qqPlot(sp500$y, distribution = "norm", envelope = FALSE) x
We can also use QQ-Plots to see how our data fits tthe t distributions with various degrees of freedom.
par(mfrow=c(2,2), mar=c(4,4,1,0))
=qqPlot(sp500$y, distribution = "norm", envelope = FALSE,xlab="normal")
x=qqPlot(sp500$y, distribution = "t", df = 4, envelope = FALSE,xlab="t(4)")
x=qqPlot(sp500$y, distribution = "t", df = 3.5, envelope = FALSE,xlab="t(3.5)")
x=qqPlot(sp500$y, distribution = "t", df = 3, envelope = FALSE,xlab="t(3)") x
10.4 Exercise
Using data.Rdata
, compute descriptive statistics for all stocks in the Return
data frame. Summarize and present the results in a new data frame. It should include mean, standard deviation, skewness, kurtosis, and the p-value of Jarque-Bera test. Keep five decimal places.
# Answer
=c()
rstfor (i in 2:ncol(data$Return)){
=data$Return[,i]
tmp=jarque.bera.test(tmp)
jbtest=c(mean(tmp), sd(tmp), skewness(tmp), kurtosis(tmp), jbtest$p.value)
temp.rst=rbind(rst, temp.rst)
rst
}=round(rst, digits=5)
rstcolnames(rst)=c("Mean", "Std Dev", "Skewness", "Kurtosis", "JB Test")
rownames(rst)=colnames(data$Return)[-1]
rst
Mean | Std Dev | Skewness | Kurtosis | JB Test | |
---|---|---|---|---|---|
AAPL | 0.00126 | 0.02122 | -0.10078 | 8.33528 | 0 |
DIS | 0.00037 | 0.01756 | 0.21869 | 12.56880 | 0 |
GE | -0.00006 | 0.02043 | -0.07300 | 12.06136 | 0 |
INTC | 0.00019 | 0.01977 | -0.44631 | 11.96508 | 0 |
JPM | 0.00044 | 0.02285 | 0.26633 | 20.63585 | 0 |
MCD | 0.00066 | 0.01333 | 0.11484 | 19.03846 | 0 |