2 Financial and economic data
While financial markets generate a vast amount of data (sometimes with new observations every nanosecond), it is generally difficult and costly to get this data. Most data providers are commercial with complicated interfaces, sometimes very expensive, and the few free data providers tend to have erratic access and errors in data.
2.1 Symbols and names and identifiers
There are many categories of financial data one might use, like stocks, bonds, futures, options, commodities and foreign exchange. For each of these, we have a large number of individual assets and types of assets. Identifying the asset we need can be quite complex.
2.1.1 Names and tickers
All stocks that are trading in the market are associated with a ticker symbol, serving as an identifier for the specific security. These are often specific to the particular exchange or country of listing. This can often lead to confusion or ambiguity when a company has cross-listings. For instance, the Japanese car manufacturer Toyota is listed as 7203 on the Tokyo Stock Exchange, TYT on the London Stock Exchange, and TM on the New York Stock Exchange.
Some identical securities have different names. Depending on the source, searching for data on the S&P 500 (a major American stock market index) might require the label SPX (Bloomberg), ^GSPC (Yahoo Finance), INX (Google Finance), GSPC.INDX (EOD), and so on.
Furthermore, tickers change over time, for example after a merger or a change of company name, and old tickers can be recycled. The same ticker can therefore refer to two or more firms at different times, which causes problems if it is not taken into account when querying financial data.
When searching for a particular security from the data source, it is good practice to verify that the ticker symbol used indeed corresponds to the correct data series by checking the data description.
2.1.2 Permanent asset identifiers: tickers, ISIN and PERMNO
To make finding and using data more reliable, data vendors generally provide what are known as permanent asset identifiers: labels that stay with the same asset no matter what else might change about it. Of course, there are multiple data vendors, each with its own nomenclature, and the labelling is usually inconsistent across them.
The ISIN is the International Securities Identification Number, an internationally recognised 12-character alphanumeric code unique to each security. Unlike tickers, the ISIN for a stock is the same regardless of the market.
The PERMNO is the permanent issue identifier of the CRSP dataset.
When downloading data, it is best not to refer to stocks by their ticker symbol but rather by one of the permanent asset identifiers.
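Because an ISIN has a fixed structure (a two-letter country code, nine alphanumeric characters and a final check digit computed with the Luhn algorithm), a basic sanity check can catch mistyped identifiers before a query is sent. The following is a minimal sketch in base R; the function name `is_valid_isin` is our own and not part of any package, and passing the check only means the code is well formed, not that the security exists.

```r
# Check the format and check digit of an ISIN (illustrative helper, base R only)
is_valid_isin <- function(isin) {
  isin <- toupper(isin)
  # 2-letter country code, 9 alphanumeric characters, 1 numeric check digit
  if (!grepl("^[A-Z]{2}[A-Z0-9]{9}[0-9]$", isin)) return(FALSE)
  chars <- strsplit(isin, "")[[1]]
  # Map 0-9 to themselves and A-Z to 10-35, then split into single digits
  values <- match(chars, c(0:9, LETTERS)) - 1
  digits <- as.integer(strsplit(paste(values, collapse = ""), "")[[1]])
  # Luhn algorithm: from the right, double every second digit
  rev_digits <- rev(digits)
  idx <- seq_along(rev_digits) %% 2 == 0
  rev_digits[idx] <- rev_digits[idx] * 2
  # A doubled digit above 9 contributes the sum of its two digits
  rev_digits[rev_digits > 9] <- rev_digits[rev_digits > 9] - 9
  sum(rev_digits) %% 10 == 0
}

is_valid_isin("US0378331005")  # Apple
is_valid_isin("JP3633400001")  # Toyota
is_valid_isin("US0378331006")  # last digit altered
```

The first two calls return TRUE and the third returns FALSE, since changing the final digit breaks the checksum.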
2.1.3 Adjusted and unadjusted prices
If you download raw stock prices, they will often be misleading because the number of shares outstanding changes over time, usually through stock splits.
For example, Amazon carried out a 20-for-1 stock split in 2022. Every Amazon share became 20 shares and, correspondingly, the price fell by a factor of 20, from about $2,450 to about $122.
If you load such unadjusted prices into R and run an analysis, you will see a very large one-day price drop that had no effect on investors' wealth and does not reflect any real risk.
We therefore work with adjusted prices, that is, prices adjusted for stock splits and similar corporate actions.
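To see what the adjustment does, the sketch below applies a 20-for-1 split adjustment by hand to a short series. The prices are illustrative round numbers rather than actual quotes, and the column names and split date are assumptions made for the example. In practice, one would normally use the adjusted price supplied by the data vendor.

```r
# Illustrative closing prices around a 20-for-1 split (not actual quotes)
prices <- data.frame(
  date  = as.Date(c("2022-06-01", "2022-06-02", "2022-06-03",
                    "2022-06-06", "2022-06-07", "2022-06-08")),
  close = c(2400, 2440, 2460, 123, 124, 122)
)
split_date  <- as.Date("2022-06-06")  # first day traded at the post-split price
split_ratio <- 20

# Divide all pre-split prices by the split ratio; leave later prices unchanged
prices$adj_close <- ifelse(prices$date < split_date,
                           prices$close / split_ratio,
                           prices$close)

# Log returns from raw and adjusted prices
raw_ret <- diff(log(prices$close))
adj_ret <- diff(log(prices$adj_close))
round(cbind(raw_ret, adj_ret), 4)
```

The raw series shows a log return of about -3 on the split date, while the adjusted series shows a return of zero on that day.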
2.1.4 Asynchronous prices
Problems arise when data comes from different markets and countries due to:
- Holidays;
- Time zones.
Public holidays, days when the markets are closed, often differ across countries. Examples include national days, like Independence Day on 4 July in the United States, and religious holidays. Exchanges are usually open Monday through Friday, but the Saudi Stock Exchange is open Sunday through Thursday. Some exchanges close for a lunch break. Some countries observe daylight saving time (summertime) and others do not, and the clocks often change on different dates, as in the US and Europe.
| Name | Time Zone | Trading Hours (local time) | Lunch Break |
|---|---|---|---|
| New York Stock Exchange | ET (EST/EDT) | 9:30 a.m. to 4:00 p.m. | No |
| Shanghai Stock Exchange | CST (China) | 9:30 a.m. to 11:30 a.m., 1:00 p.m. to 3:00 p.m. | 11:30 a.m. to 1:00 p.m. |
| Tokyo Stock Exchange | JST | 9:00 a.m. to 3:00 p.m. | 11:30 a.m. to 12:30 p.m. |
| London Stock Exchange | GMT/BST | 8:00 a.m. to 4:30 p.m. | 12:00 p.m. to 12:02 p.m. |
| Frankfurt Stock Exchange | CET/CEST | 9:00 a.m. to 5:30 p.m. | No |
New York trading hours overlap with the last few hours of trading in London and other European markets. Tokyo is 8 or 9 hours ahead of London, depending on the time of year, and closes before London opens, so the two markets have no overlap in trading hours.
This means that any research comparing prices across countries at a daily frequency needs to consider these issues. They can be mitigated by using weekly or monthly data.
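The overlap between two markets on a given date can be computed by converting their local opening and closing times to a common clock. The sketch below uses base R and the trading hours from the table above. The helper names `to_utc` and `overlap_hours` are our own, lunch breaks and holidays are ignored, and the time zone names assume the standard time zone database is available on the system.

```r
# Convert a local date and time to seconds since the epoch (UTC)
to_utc <- function(date, time, tz) {
  as.numeric(as.POSIXct(paste(date, time), format = "%Y-%m-%d %H:%M", tz = tz))
}

# Hours during which two markets are open at the same time on a given date
overlap_hours <- function(date, a, b) {
  start <- max(to_utc(date, a$open, a$tz), to_utc(date, b$open, b$tz))
  end   <- min(to_utc(date, a$close, a$tz), to_utc(date, b$close, b$tz))
  max(0, end - start) / 3600
}

nyse   <- list(open = "09:30", close = "16:00", tz = "America/New_York")
london <- list(open = "08:00", close = "16:30", tz = "Europe/London")
tokyo  <- list(open = "09:00", close = "15:00", tz = "Asia/Tokyo")

overlap_hours("2024-01-16", nyse, london)   # both countries on winter time
overlap_hours("2024-03-15", nyse, london)   # US clocks changed, UK clocks not yet
overlap_hours("2024-01-16", tokyo, london)
```

The New York and London overlap is two hours in January but three hours in mid-March 2024, when the US had already moved its clocks forward and the UK had not. Tokyo and London do not overlap at all.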
2.2 Data quality and validation
Data quality is fundamental for accurate risk measurement. Poor quality data can lead to incorrect risk calculations, failed regulatory compliance, and substantial financial losses. Common data quality issues include:
2.2.1 Missing values and gaps
Financial markets sometimes have trading halts, holidays, or technical issues that create gaps in price series. Missing data points can distort volatility calculations and risk metrics. Always check for:
- Consecutive missing dates
- Unusual gaps in trading volumes
- Zero or negative prices (often indicating data errors)
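The checks listed above can be sketched in a few lines of base R. The data frame below is hypothetical, and the threshold of four calendar days for a suspicious gap is an assumption (a weekend plus one holiday), which should be adapted to the market in question.

```r
# Hypothetical daily prices with a missing value, a long gap and a zero price
prices <- data.frame(
  date  = as.Date(c("2024-01-02", "2024-01-03", "2024-01-04",
                    "2024-01-10", "2024-01-11", "2024-01-12")),
  close = c(101.2, 100.8, NA, 102.5, 0, 103.1)
)

# Calendar days since the previous observation
gap_days <- c(NA, as.numeric(diff(prices$date)))

checks <- data.frame(
  date          = prices$date,
  missing_price = is.na(prices$close),
  non_positive  = !is.na(prices$close) & prices$close <= 0,
  long_gap      = !is.na(gap_days) & gap_days > 4  # assumed threshold
)

# Show only the rows that fail at least one check
checks[rowSums(checks[, -1]) > 0, ]
```

Flagged rows should be investigated rather than silently dropped, since a gap may reflect a genuine trading halt rather than a data error.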
2.2.2 Corporate action errors
Stock splits, dividends, and spin-offs should be properly reflected in adjusted prices. Unadjusted data often contains artificial price jumps that can severely distort risk calculations. Signs of corporate action errors include:
- Sudden large price drops (often missed stock splits)
- Price series that do not align with market indices during the same period
- Unrealistic volatility spikes on specific dates
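One simple screen for missed stock splits is to look for days where the price falls by a factor close to a common split ratio. The sketch below is a heuristic under stated assumptions: the price series is hypothetical, and the list of candidate ratios and the 3% tolerance are arbitrary choices. A flagged date is only a candidate and should be confirmed against the company's announced corporate actions.

```r
# Hypothetical unadjusted closing prices containing a 2-for-1 split
close <- c(300, 303, 299, 151, 150, 152)

# Day-on-day price ratio and the split ratio it would imply
ratio   <- close[-1] / close[-length(close)]
implied <- 1 / ratio

# Candidate split ratios and tolerance (both assumptions)
split_ratios <- c(2, 3, 4, 5, 10, 20)
tol <- 0.03

# Nearest candidate ratio for each day, flagged if within the tolerance
nearest <- sapply(implied, function(x) split_ratios[which.min(abs(split_ratios - x))])
suspect <- abs(implied / nearest - 1) < tol

data.frame(day = 2:length(close),
           ratio = round(ratio, 3),
           candidate_split = ifelse(suspect, nearest, NA))
```

Only the fourth observation is flagged here, with a candidate ratio of 2.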
2.2.3 Outliers and data entry errors
Erroneous data points can dramatically affect risk measurements. Common issues include:
- Prices quoted in wrong currency units (pence vs pounds)
- Decimal point errors (prices off by factors of 10 or 100)
- Typographical errors in manual data entry
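A unit error such as pounds reported instead of pence shows up as a price that is roughly 100 times smaller than its neighbours. The sketch below flags such observations by comparing each price with the sample median. The data are hypothetical and the 10% tolerance is an assumption. Using the full-sample median is only reasonable for short windows in which the price level is fairly stable.

```r
# Hypothetical UK closing prices in pence, with two days reported in pounds
close <- c(452.1, 455.0, 4.53, 4.56, 451.8, 449.9)

# Compare each price with a robust reference level
ref   <- median(close)
ratio <- close / ref

# Flag prices that are close to one hundredth of the reference level
likely_pounds <- abs(ratio * 100 - 1) < 0.1

# Convert the flagged observations back to pence
fixed <- ifelse(likely_pounds, close * 100, close)
data.frame(close, likely_pounds, fixed)
```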
2.2.4 Basic validation procedures
Before using data for risk calculations:
- Visual inspection: Plot price series to identify obvious anomalies
- Return distribution analysis: Check for unusually large returns that may indicate data errors
- Cross-validation: Compare data across multiple sources when possible
- Corporate action verification: Ensure adjusted prices align with known stock splits and dividends
- Volume consistency: Verify that zero prices do not coincide with normal trading volumes
Poor data quality has led to significant losses in quantitative finance. Always implement systematic data validation procedures before conducting risk analysis.
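As an illustration of the cross-validation step above, the sketch below merges the same price series from two sources and flags dates where they disagree. Both data frames are hypothetical, and the 1% threshold is an assumption, to be chosen according to how closely the sources are expected to agree.

```r
# Hypothetical closing prices for the same asset from two sources
source_a <- data.frame(date  = as.Date("2024-01-02") + 0:3,
                       close = c(100.0, 101.0, 102.0, 103.0))
source_b <- data.frame(date  = as.Date("2024-01-02") + 0:3,
                       close = c(100.0, 101.0, 120.0, 103.0))

# Align the two series on date
both <- merge(source_a, source_b, by = "date", suffixes = c("_a", "_b"))

# Relative difference between the sources
both$rel_diff <- abs(both$close_a - both$close_b) / both$close_a

# Dates where the sources disagree by more than 1% (assumed threshold)
both[both$rel_diff > 0.01, ]
```

A disagreement does not say which source is wrong, only that the observation needs to be checked.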
2.3 Data frequency considerations
Financial data is available at different frequencies, each serving specific risk management purposes. The choice of data frequency significantly affects risk calculations and computational requirements.
Tick data captures every trade or quote update, potentially generating millions of observations per day. This ultra-high frequency data is useful for high-frequency trading algorithms, market microstructure analysis, intraday risk monitoring and transaction cost analysis. However, tick data presents substantial challenges in terms of storage requirements, which can reach terabytes for a single year of data.
Minute and hourly data represent aggregated versions of tick data and serve as a practical middle ground. This frequency is commonly used for intraday volatility estimation, real-time risk monitoring, day trading strategies and market timing models. The reduced data volume makes it more manageable while still capturing important intraday patterns.
Daily data, consisting of end-of-day closing prices, remains the most widely used frequency for portfolio risk assessment, regulatory reporting, academic research and long-term strategy development. Daily data strikes an optimal balance between capturing meaningful price movements and maintaining reasonable computational requirements.
Weekly and monthly data provide further aggregation that is particularly useful for long-term analysis and helps avoid asynchronous trading issues that arise when comparing markets across different time zones and holiday schedules.
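Lower-frequency series are usually built from daily data by keeping the last observation in each period. The sketch below does this for monthly data in base R. The daily series is synthetic and ignores public holidays, and the variable names are our own.

```r
# Synthetic daily prices for the first quarter of 2024, Monday to Friday only
dates <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")
dates <- dates[as.POSIXlt(dates)$wday %in% 1:5]
close <- 100 + 0.1 * seq_along(dates)

# Label each observation with its month and keep the last price in each month
month <- format(dates, "%Y-%m")
monthly_close <- vapply(split(close, month),
                        function(x) x[length(x)], numeric(1))

# Monthly log returns
monthly_ret <- diff(log(monthly_close))
monthly_close
monthly_ret
```

Because each market contributes its own last price of the month, differences in holidays and trading hours matter much less than they do at a daily frequency.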
The practical considerations of higher frequency data extend beyond storage to include processing power requirements for real-time analysis, substantially higher costs for data acquisition, and increased complexity in data cleaning and synchronisation across multiple markets and instruments.
Data frequency has a direct impact on risk measurements. Higher frequency data captures intraday volatility patterns that daily data might miss, potentially revealing important risk events that occur within trading sessions. However, very high frequency data can introduce noise that obscures genuine risk signals. Daily data may miss important intraday risk events but provides a cleaner signal for longer-term risk assessment.
The optimal frequency depends on the investment horizon and specific risk management objectives. For most portfolio risk management applications, daily data provides an appropriate balance between accuracy and practicality, while institutions engaged in intraday trading require higher frequency data despite the associated challenges.
Derivatives data, including options and futures, represents an important component of comprehensive risk management that extends beyond traditional equity analysis. Most major data vendors provide derivatives data alongside equity data, though this information requires additional considerations such as expiration dates, strike prices, contract specifications and roll dates for futures. Options data often includes calculated Greeks and implied volatility measures that are helpful for portfolio risk assessment. The complexity of derivatives data means that proper risk management systems must account for these additional dimensions when calculating portfolio exposures and hedging requirements.
Foreign exchange data is important for portfolio risk management when dealing with international investments, as currency movements can affect returns. Most data vendors provide FX coverage including spot rates, forward rates and cross-currency pairs across major and emerging market currencies. FX risk affects both direct currency exposures and the valuation of foreign assets. Currency hedging strategies require access to both spot and derivatives data, while multi-currency portfolios need consistent base currency conversion methodologies to aggregate risk measures across different markets.
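The effect of currency movements on returns can be illustrated with a small base currency conversion. In the sketch below, a sterling-based investor holds an asset priced in US dollars. All numbers are hypothetical, and the exchange rate is quoted as pounds per dollar.

```r
# Hypothetical USD asset prices and exchange rates (GBP per 1 USD)
price_usd   <- c(100, 102, 101)
gbp_per_usd <- c(0.80, 0.79, 0.81)

# Simple returns on the asset in local currency and on the exchange rate
ret_local <- price_usd[-1] / head(price_usd, -1) - 1
ret_fx    <- gbp_per_usd[-1] / head(gbp_per_usd, -1) - 1

# Return in the base currency compounds the two
ret_base <- (1 + ret_local) * (1 + ret_fx) - 1

# Equivalent calculation: convert prices first, then compute returns
price_gbp <- price_usd * gbp_per_usd
ret_check <- price_gbp[-1] / head(price_gbp, -1) - 1

round(cbind(ret_local, ret_fx, ret_base, ret_check), 4)
```

In the first period the asset gains 2% in dollars, but the dollar weakens against sterling, so the sterling return is smaller.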
Alternative data sources have become more common in risk assessment, supplementing traditional financial metrics with information from news sentiment, social media, satellite imagery and ESG ratings. These data types can sometimes provide early signals of potential risk events, though they require careful validation and present challenges in terms of standardisation and reliability. Many vendors now offer alternative data products alongside their core financial data services, though the integration of such information into systematic risk models remains an area of ongoing development.
2.4 Common sources of financial and economics data
The type of data we use here can only be obtained from a commercial vendor, either for free or by paying. Your university might have a subscription to a commercial vendor that you can use for free.
2.4.1 What we usually use
2.4.1.1 EOD historical data
Our primary source of financial prices is EODH, which provides a fundamental data API as well as live and end-of-day historical prices for stocks, ETFs and mutual funds from exchanges worldwide. While not free, it is not very expensive and comes with an academic discount.
For R users, we recommend using the eodhdR2 package, which provides a simple interface to access stock prices, dividends and splits directly from R. More information can be found at the EOD R library documentation.
2.4.1.1.1 Installation and Setup
The eodhdR2 package relies on several supporting packages that need to be installed first:
```r
# Install supporting packages
install.packages(
  c("httr", "jsonlite", "lubridate", "data.table", "readr"),
  repos = "https://cloud.r-project.org/"
)

# Install the EOD Historical Data R package
install.packages("eodhdR2", repos = "https://cloud.r-project.org/")

# Load the package
library(eodhdR2)
```
2.4.1.1.2 API Token Setup
An API token is like a digital key that identifies you to the EODH service. The token is a unique string of characters that tells the server who you are and what data you are allowed to access.
Course students will receive an invitation to register. Once registered, your plan will be activated automatically and an API key will be generated in your account. Others can register at eodhd.com for a free or paid account.
There are two main options for API access:
- Personal API key: Full access to all functions including splits data
- Demo key: Limited to basic price and dividend data only
```r
# Option 1: if you have a personal API key, enter it here
token = "YOUR_API_KEY_HERE"

# Option 2: if using the demo key (limited functionality),
# uncomment the next line instead
# token = "demo"

# Set your token
set_token(token)

# Open the documentation for get_prices()
help(get_prices)
```
2.4.1.1.3 Downloading Data
With EODH, we can download data directly in R using ticker symbols. The get_prices() function takes two main parameters: the stock ticker symbol and the exchange code. For US stocks, we use “US” as the exchange:
```r
# Download Apple stock data
prices = get_prices("AAPL", "US")

# Explore the data structure
dim(prices)
head(prices)
names(prices)

# The data includes: date, open, high, low, close, adjusted_close, volume
summary(prices$adjusted_close)
```
EODH also provides dividend and split information:
```r
# Get dividend data for Apple
dividends = get_dividends("AAPL", "US")

# Get stock splits (requires personal token, not available with demo)
splits = get_splits("AAPL", "US")
```
The data we use in these notes was obtained with permission from EOD.
2.4.1.2 DBnomics
Our main source of economic data used to be DBnomics, which aggregates free data feeds from various sources, including the World Bank, IMF, BIS and OECD.
However, it does not appear to be updated anymore.
It comes with both a browser and an API interface. For R, we can use rdbnomics.
```r
library(rdbnomics)
# Download one AMECO series by its DBnomics identifier
df1 = rdb(ids = "AMECO/ZUTN/EA19.1.0.0.0.ZUTN")
```
2.4.2 Some other vendors
2.4.2.1 Bloomberg
One of the most ubiquitous data sources in finance is the Bloomberg Terminal. Due to its pervasiveness throughout the industry, there are numerous packages in practically every language that allow access to its APIs. We can download Bloomberg data directly into R.
LSE students have access to a number of Bloomberg terminals in the library and the master students’ common rooms.
2.4.2.2 Wind
The Wind Financial Terminal (WFT) also provides market data like the Bloomberg Terminal but with a specific focus on the Chinese financial markets. It supports APIs for MATLAB, R, C++ and Python, among others. LSE has access to Wind.
2.4.2.3 WRDS
The Wharton School at the University of Pennsylvania provides a service called Wharton Research Data Services (WRDS) that many universities subscribe to. This provides a common interface to several databases, including CRSP and TAQ high-frequency data. WRDS and many of its databases are available to LSE students and staff.
2.4.2.4 Yahoo Finance
The go-to place for many researchers requiring financial data has been finance.yahoo.com. This data can be automatically downloaded for free into many software packages, including MATLAB, R and Python.
There are three problems with Yahoo Finance:
- Yahoo occasionally changes how the API works, requiring updates to software;
- It often is unavailable for days or weeks;
- There are errors in the data. For example, UK prices, quoted in pence by convention, sometimes appear in pounds for one or two days before reverting to pence. On other occasions, individual values are simply wrong and need to be corrected.
2.4.2.5 Federal Reserve Economic Data (FRED)
FRED, maintained by the Federal Reserve Bank of St. Louis, is a good source of macroeconomic data, including unemployment, GDP, interest rates and the money supply. It can also be accessed through DBnomics.
2.4.2.6 IEX
IEX provides access to US equity data via https://iextrading.com/developer/.
2.4.2.7 ECB FX
The European Central Bank Statistical Data Warehouse and its corresponding SDMX interface allow for retrieval of daily Euro FX data.
The entire dataset is here http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip, and it can be accessed using
wget http://www.ecb.europa.eu/stats/eurofxref/eurofxref-hist.zip -O eurofxref-hist.zip
EODH also has this data.
2.4.2.8 Alpha Vantage
Alpha Vantage provides free daily and real-time stock price data and API access is available in R and Python. Its data source appears to be the same as Yahoo Finance and therefore is subject to the same errors.
2.4.2.9 Quandl
Quandl, now part of Nasdaq and rebranded as Nasdaq Data Link, provides a common API, with R and Python interfaces, to a large number of commercial databases, some of which are free. While it is comprehensive, one may need to subscribe to data from several providers.
2.4.2.10 Fama-French Data Library
The Fama-French Data Library provides a large amount of historical data compiled by Eugene Fama and Kenneth French. The data is updated regularly, and the Fama-French 3-factor data is especially useful for analysing fund and portfolio performance.
2.4.3 Other useful databases
Some useful databases that can be accessed in different ways include:
2.4.3.1 CRSP
One of the major sources of historical US stock market data is the Center for Research in Security Prices (CRSP, pronounced “crisp”), headquartered at the University of Chicago.
CRSP can be accessed via WRDS.
2.4.3.2 BIS
The BIS provides a very useful database on credit and banking statistics. You can access it directly at stats.bis.org.
2.4.3.3 World Bank
The World Bank provides extensive economic development data.
2.4.3.4 OECD
The OECD provides extensive data on member states.