1.2 Organizing, Visualizing, and Describing Data Flashcards
Numerical data (a.k.a. quantitative data)
Values that represent measured/counted quantities
two types of numerical data
continuous data and discrete data
continuous data
Data may take on any numerical value in a specified range of values
ex: Any value between 0 and 1 (Infinite number of possibilities)
discrete data
Data may only take on a countable number of values
ex: 0, 0.5, and 1 (Only 3 possible values)
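The two examples above can be sketched as validity checks. This is only an illustration: the function names, the default range, and the tuple of allowed values are assumptions chosen to mirror the examples, not part of the original cards.

```python
# Hypothetical helpers mirroring the two flashcard examples.

def is_valid_continuous(x, low=0.0, high=1.0):
    """Continuous: any value inside the range [low, high] is allowed."""
    return low <= x <= high


# Discrete: only a countable set of values is allowed (here, 3 values).
ALLOWED_VALUES = (0.0, 0.5, 1.0)


def is_valid_discrete(x, allowed=ALLOWED_VALUES):
    """Discrete: the value must be one of the listed possibilities."""
    return x in allowed
```

A value such as 0.37 passes the continuous check but fails the discrete one, while 0.5 passes both.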
Categorical data (a.k.a. qualitative data)
Values that describe a characteristic or quality of a group of observations
For example, companies can be classified into bankrupt vs. not bankrupt
two types of categorical data
nominal data and ordinal data
What is a variable
Characteristic/quantity that can be measured and is subject to change (e.g., stock price)
What is an observation
A value of the variable that is collected (e.g., stock price yesterday was $30)
Cross-sectional data
observations that capture characteristics of different units at a specific point in time
An example of this is a list that shows the current dividend yields of different FTSE 100 companies.
Time-series data
observations of the same unit at different points in time
An example of this is a list that shows the dividend yield of an FTSE 100 company over the past 10 years
Panel data
a mix of time-series and cross-sectional data: observations of multiple units at multiple points in time
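The relationship between the three data types can be sketched with a small panel keyed by (unit, time). The company names and dividend yields below are made-up illustrative values, and the helper names are hypothetical. Fixing the time gives a cross-section; fixing the unit gives a time series.

```python
# Hypothetical panel: (company, year) -> dividend yield in percent.
# Names and numbers are made up for illustration; they are not real data.
panel = {
    ("Company A", 2021): 3.1,
    ("Company A", 2022): 3.4,
    ("Company A", 2023): 3.0,
    ("Company B", 2021): 4.2,
    ("Company B", 2022): 4.0,
    ("Company B", 2023): 4.5,
}


def cross_section(panel, year):
    """Different units at one specific point in time."""
    return {unit: value for (unit, t), value in panel.items() if t == year}


def time_series(panel, unit):
    """The same unit at different points in time, in chronological order."""
    return [(t, value) for (u, t), value in sorted(panel.items()) if u == unit]
```

Calling `cross_section(panel, 2022)` returns one yield per company for that year, while `time_series(panel, "Company B")` returns that company's yields year by year.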
Structured data
highly organized in a pre-defined manner with repeating patterns
They are relatively easy to store, search, and analyze
Common examples of structured data include market data and fundamental data stored in Excel spreadsheets or databases
unstructured data
do not follow any conventionally organized forms
They typically require manual processing prior to being analyzed by financial models
Common examples of unstructured data include text (from financial news), audio, video, and photo
Their availability is helped by alternative data
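The contrast between structured and unstructured data can be sketched as follows. The records, the news sentence, and the helper name are all made-up assumptions for illustration. Structured records can be filtered directly by field, while free text needs a processing step (here, a simple regular expression) before a number can be used.

```python
import re

# Structured: hypothetical records with a fixed, pre-defined set of fields.
records = [
    {"company": "Company A", "dividend_yield_pct": 3.1},
    {"company": "Company B", "dividend_yield_pct": 4.2},
]

# Easy to search: filter directly on a named field.
high_yield = [r["company"] for r in records if r["dividend_yield_pct"] > 4.0]

# Unstructured: a made-up news sentence with no pre-defined fields.
headline = "Company B lifts its dividend yield to 4.2% after strong results"


def extract_percentages(text):
    """Pull every number followed by a percent sign out of free text."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)%", text)]
```

A regular expression is only the simplest possible processing step; real text, audio, or image data would need considerably more work before a model could use it.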
alternative data
Data generated through unconventional sources (e.g., individual social media posts, satellite imagery); it drives the availability and importance of unstructured data
Raw data
data available in the original form as collected
They normally cannot be used directly to extract information