1.2 Organizing, Visualizing, and Describing Data Flashcards
Numerical data (a.k.a. quantitative data)
Values that represent measured/counted quantities
two types of numerical data
continuous data and discrete data
continuous data
Data may take on any numerical value in a specified range of values
ex: Any value between 0 and 1 (Infinite number of possibilities)
discrete data
Data may only take on a countable number of values
ex: 0, 0.5, and 1 (Only 3 possible values)
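A minimal Python sketch of the difference, assuming a seeded random generator for reproducibility: `draw_continuous` can return any value between 0 and 1, while `draw_discrete` can only return one of the three values in the hypothetical `DISCRETE_VALUES` tuple. Both function names are illustrative and not from any library.

```python
import random

# Continuous: any value in the range [0, 1] can occur.
# (Strictly, a float only approximates a continuous range because of finite precision.)
def draw_continuous(rng: random.Random) -> float:
    return rng.uniform(0.0, 1.0)

# Discrete: only a countable set of values can occur (here exactly three).
DISCRETE_VALUES = (0.0, 0.5, 1.0)

def draw_discrete(rng: random.Random) -> float:
    return rng.choice(DISCRETE_VALUES)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same draws
    print([round(draw_continuous(rng), 4) for _ in range(3)])  # any values in [0, 1]
    print([draw_discrete(rng) for _ in range(3)])              # only 0.0, 0.5 or 1.0
```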
Categorical data (a.k.a. qualitative data)
Values that describe a quality or characteristic of a group of observations; they act as labels rather than measured or counted amounts
For example, companies can be classified into bankrupt vs. not bankrupt
two types of categorical data
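nominal data (categories with no logical order, e.g., industry sectors) and ordinal data (categories with a logical order or rank, e.g., credit ratings)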
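One way to see the distinction is with Python's `enum` module: nominal categories are unordered labels, so only equality checks make sense, while ordinal categories carry a rank that supports ordering. The `Sector` and `CreditRating` classes below are hypothetical, simplified examples, not a real classification scheme.

```python
from enum import Enum, IntEnum

# Nominal: labels with no natural order; only equality comparisons are meaningful.
# A plain Enum raises TypeError if you try Sector.ENERGY < Sector.TECHNOLOGY.
class Sector(Enum):
    ENERGY = "energy"
    FINANCIALS = "financials"
    TECHNOLOGY = "technology"

# Ordinal: labels with a natural rank. Ordering is meaningful, but the
# gaps between ranks are not (HIGH - MEDIUM need not "equal" MEDIUM - LOW).
class CreditRating(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

if __name__ == "__main__":
    print(Sector.ENERGY == Sector.TECHNOLOGY)       # False
    print(CreditRating.HIGH > CreditRating.MEDIUM)  # True
    print(sorted([CreditRating.HIGH, CreditRating.LOW, CreditRating.MEDIUM]))
```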
what is a variable
Characteristic/quantity that can be measured and is subject to change (e.g., stock price)
what is an observation
A value of the variable that is collected (e.g., stock price yesterday was $30)
Cross-sectional data
observations that capture characteristics of different units at a specific point in time
An example of this is a list that shows the current dividend yields of different FTSE 100 companies.
Time-series data
observations of the same unit at different points in time
An example of this is a list that shows the dividend yield of a FTSE 100 company over the past 10 years.
Panel data
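a mix of time-series and cross-sectional data: observations of multiple units at multiple points in time (e.g., the dividend yields of several FTSE 100 companies over the past 10 years)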
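The three data layouts can be sketched in Python by keying each panel observation on a (unit, time) pair: fixing the time gives a cross-section, and fixing the unit gives a time series. The company names and yields are hypothetical placeholders, not real FTSE 100 data, and `cross_section` and `time_series` are illustrative helper names.

```python
# Hypothetical dividend yields (%) for two placeholder companies; NOT real FTSE 100 data.
# Panel data: every observation is keyed by (unit, time) = (company, year).
panel = {
    ("Company A", 2022): 3.1,
    ("Company A", 2023): 3.4,
    ("Company B", 2022): 4.0,
    ("Company B", 2023): 3.8,
}


def cross_section(data: dict, year: int) -> dict:
    """Slice the panel at one point in time: different units, same year."""
    return {company: value for (company, yr), value in data.items() if yr == year}


def time_series(data: dict, company: str) -> dict:
    """Slice the panel for one unit: same company, different years, in order."""
    rows = sorted((yr, value) for (c, yr), value in data.items() if c == company)
    return dict(rows)


print(cross_section(panel, 2023))       # {'Company A': 3.4, 'Company B': 3.8}
print(time_series(panel, "Company B"))  # {2022: 4.0, 2023: 3.8}
```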
Structured data
highly organized in a pre-defined manner with repeating patterns
They are relatively easy to store, search, and analyze
Common examples of structured data include market data and fundamental data stored in spreadsheets (e.g., Excel) or databases
unstructured data
do not follow any conventionally organized forms
They typically require manual processing prior to being analyzed by financial models
Common examples of unstructured data include text (from financial news), audio, video, and photos
Their growth has been helped by alternative data
alternative data
Data generated through unconventional sources (e.g., individual social media posts, satellite imagery); it drives the availability and importance of unstructured data
Raw data
data available in the original form as collected
They normally cannot be used directly to extract information
the first step in using raw data
usually organizing it into a one-dimensional array or a two-dimensional array
one-dimensional array
suitable for a single variable
For example, a one-dimensional array can be built to show the annual return of the S&P 500 index for the past 10 years.
–> This is appropriate because the annual return is the only variable that needs to be evaluated
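A minimal sketch of a one-dimensional array in Python, using a plain list where each position holds one year's observation of the single variable. The ten returns are hypothetical placeholders, not actual S&P 500 data.

```python
from statistics import mean

# Hypothetical placeholder returns (%), one per year for ten years;
# NOT actual S&P 500 data. A single variable fits in a one-dimensional array.
annual_returns = [12.0, -4.5, 20.1, 8.3, 15.2, -10.0, 25.4, 6.7, 9.9, 1.2]

print(len(annual_returns))  # 10 observations of one variable
print(f"Average annual return: {mean(annual_returns):.2f}%")
print(f"Best year: {max(annual_returns)}%, worst year: {min(annual_returns)}%")
```

Because only one variable is stored, every summary (average, best, worst) is computed over the same list.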
A two-dimensional rectangular array (or data table)
used to analyze multiple variables
For example, in addition to the annual return, a data table can also show the dividend yield and earnings yield of the S&P 500 index for the 10-year period
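A data table can be sketched as a list of rows, where each row is one year (observation) and each column is one variable. The values are hypothetical placeholders rather than actual S&P 500 figures, only three years are shown for brevity, and `column` is an illustrative helper name.

```python
# Hypothetical placeholder values (%); NOT actual S&P 500 figures.
# Each row is one year (observation); each column is one variable.
COLUMNS = ("annual_return", "dividend_yield", "earnings_yield")
data_table = [
    [12.0, 1.9, 4.8],  # year 1
    [-4.5, 2.1, 5.6],  # year 2
    [20.1, 1.8, 4.5],  # year 3 (remaining years omitted for brevity)
]


def column(table: list, name: str) -> list:
    """Return every observation of one variable (a column of the table)."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in table]


print(data_table[1])                         # [-4.5, 2.1, 5.6]: all variables for year 2
print(column(data_table, "dividend_yield"))  # [1.9, 2.1, 1.8]: one variable across years
```

Reading a row gives one period across all variables; reading a column gives one variable across all periods, which is what makes the table suitable for multiple variables.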
An analyst uses a software program to analyze unstructured data—specifically, management’s earnings call transcript for one of the companies in her research coverage. The program scans the words in each sentence of the transcript and then classifies the sentences as having negative, neutral, or positive sentiment. The resulting set of sentiment data would most likely be characterized as:
A. ordinal data.
B. discrete data.
C. nominal data.
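A. ordinal data. The sentiment labels are categories with a natural ranking (negative < neutral < positive), so they are ordinal rather than nominal; they are labels, not counted or measured values, so they are not discrete numerical data.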
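To make the question concrete, here is a toy keyword-counting classifier in Python that turns transcript sentences into ordered sentiment labels. The word lists, the sample sentences, and the `classify` function are all hypothetical; real sentiment tools use far larger lexicons or trained models. The point is the output type: an ordered set of categories, i.e., ordinal data.

```python
import re
from enum import IntEnum


class Sentiment(IntEnum):
    # Integer codes only encode the ORDER of the categories (ordinal data);
    # the gaps between them carry no numerical meaning.
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


# Hypothetical word lists for illustration only.
POSITIVE_WORDS = {"growth", "strong", "record", "improved", "beat"}
NEGATIVE_WORDS = {"decline", "weak", "loss", "impairment", "miss"}


def classify(sentence: str) -> Sentiment:
    """Score = (# positive words) - (# negative words); map the sign to a label."""
    words = re.findall(r"[a-z]+", sentence.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return Sentiment.POSITIVE
    if score < 0:
        return Sentiment.NEGATIVE
    return Sentiment.NEUTRAL


transcript = [
    "We delivered record revenue and strong margins.",
    "Guidance for next year is unchanged.",
    "Europe saw a decline in volumes and a weak pricing environment.",
]
labels = [classify(s) for s in transcript]
print([label.name for label in labels])  # ['POSITIVE', 'NEUTRAL', 'NEGATIVE']
```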