Organizing, Visualizing, and Describing Data Flashcards
Continuous data
- can take on any numerical value in a specified range of values
- ex. future value
Discrete data
- can take on only a limited (countable) number of values
- ex. periods per year: monthly = 12, quarterly = 4, etc.
Numerical data (2)
AKA quantitative data
- continuous
- discrete
Categorical data (2)
aka qualitative data
- describe a quality or characteristic of a group of observations
- nominal data
- ordinal data
Nominal data
- grouping names
- cannot be organized in a logical order
- ex. classifying stocks into different sectors, such as energy, information tech, etc
Ordinal data
- can be organized in logical order or ranked
- ex. ranking mutual funds from worst to best performance: there is an order, but the ranks do not show the magnitude of the differences between funds
Time-series data
- observations of one subject taken at specific and equally spaced intervals of time
- ex. quarterly returns of Apple, 2019-2020
Cross-sectional data
- observations of multiple subjects taken at a single point in time
- ex. 2019 Q1 quarterly returns of a group of similar stocks
Panel data
- presented as a table
- groups observations through time on one or more variables for multiple subjects
- ex. quarterly returns for MSFT, Oracle, and HP from 2019 - 2020
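The time-series, cross-sectional, and panel cards above differ only in which dimension is held fixed. A minimal Python sketch, assuming a hypothetical `panel` dictionary keyed by ticker and quarter; the return figures are invented for illustration and are not real data:

```python
# Hypothetical quarterly returns (in %); the numbers are invented for illustration.
panel = {
    "MSFT":   {"2019Q1": 1.2, "2019Q2": 0.8, "2019Q3": -0.4, "2019Q4": 2.1},
    "Oracle": {"2019Q1": 0.5, "2019Q2": 1.1, "2019Q3": 0.3, "2019Q4": -0.7},
    "HP":     {"2019Q1": -1.0, "2019Q2": 0.6, "2019Q3": 0.9, "2019Q4": 1.4},
}

# Time series: one subject, many periods.
time_series = panel["MSFT"]

# Cross-section: many subjects, one period.
cross_section = {ticker: returns["2019Q1"] for ticker, returns in panel.items()}

print(time_series)
print(cross_section)
```

Panel data is the full table; a time series is one row of it and a cross-section is one column.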
One-Dimensional array
- one row of data
- a single variable, ex. the closing prices of a stock over a series of days
Two-dimensional array
- consists of columns and rows to hold multiple variables and multiple observations
- ex. a firm’s quarterly revenue, EPS, and DPS for the past two years
Tree-map
- graphical tool to display categorical data
Arithmetic Mean
- simple mean
- the center of gravity of a data set
- sensitive to extreme values (outliers)
- appropriate for forecasting single period returns and expected returns
Sample mean
- arithmetic mean of a sample
- ^x (x-bar): sample mean
- μ (mu): population mean
Winsorized mean
- a way of dealing with outliers
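- a 95% winsorized mean replaces the bottom 2.5% of values with the 2.5th percentile value and the top 2.5% of values with the 97.5th percentile value (a trimmed mean would drop them instead)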
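Winsorizing replaces extreme values, while trimming discards them. A minimal sketch of both, assuming a simple count-based rule (the `k` most extreme observations in each tail, with `k` rounded from the tail fraction); the function names and the sample data are hypothetical:

```python
def trimmed_mean(values, tail_fraction):
    """Drop the k lowest and k highest values, then average the rest."""
    data = sorted(values)
    k = round(len(data) * tail_fraction)  # observations affected in each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, tail_fraction):
    """Replace the k lowest and k highest values with their nearest kept neighbor."""
    data = sorted(values)
    n = len(data)
    k = round(n * tail_fraction)
    if k > 0:
        data[:k] = [data[k]] * k              # pull the low tail up
        data[n - k:] = [data[n - k - 1]] * k  # pull the high tail down
    return sum(data) / n


# Hypothetical data with one extreme value in each tail.
sample = [-50, 1, 2, 3, 4, 5, 6, 8, 11, 80]

plain = sum(sample) / len(sample)           # 70 / 10 = 7.0
trimmed = trimmed_mean(sample, 0.10)        # 40 / 8 = 5.0
winsorized = winsorized_mean(sample, 0.10)  # 52 / 10 = 5.2
```

With 10% in each tail this is an 80% trimmed or winsorized mean; the 95% version on the card uses 2.5% per tail and needs a larger sample to affect any observation.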
Median of even number of observations
- n = 4
- 3, 9, 10, 20: take the 2nd and 3rd values, add them, and divide by 2
- (9 + 10) / 2 = 9.5
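The even-count rule above can be sketched in Python as follows. The `median` function is a hypothetical helper written for illustration; the standard library's `statistics.median` is used only as a cross-check:

```python
import statistics


def median(values):
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # even n: average the two middle values


result = median([3, 9, 10, 20])  # (9 + 10) / 2 = 9.5

# Cross-check against the standard library.
assert result == statistics.median([3, 9, 10, 20])
```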
Geometric Mean
- used to calculate the average return of an investment
- represents the growth rate of an investment
- represents the compound rate of return of an investment
- appropriate to measure past performance over multiple periods
= [(1 + r1)(1 + r2)...(1 + rn)]^(1/n) - 1
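A minimal sketch of the geometric mean return, assuming returns are given as decimals and that Python 3.8+ is available for `math.prod`; the three annual returns are hypothetical:

```python
import math


def geometric_mean_return(returns):
    """Compound rate of return per period for a list of decimal returns."""
    growth = math.prod(1 + r for r in returns)  # total growth factor
    return growth ** (1 / len(returns)) - 1


# Hypothetical annual returns: +10%, -5%, +20%.
returns = [0.10, -0.05, 0.20]
g = geometric_mean_return(returns)        # roughly 7.8% per year
arithmetic = sum(returns) / len(returns)  # roughly 8.3% per year

# Compounding at g for 3 periods reproduces the actual ending wealth.
ending_wealth = math.prod(1 + r for r in returns)
```

Compounding at `g` for three periods gives the same ending wealth as the actual return sequence, which is why this mean suits multi-period past performance.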
Harmonic mean
- used to find average purchase price for equal periodic investments
= n / sum of (1 / xi)
- ex. 3 periods: 3 / [(1/$10) + (1/$15) + (1/$20)] = $13.85
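The harmonic mean example above can be checked in Python. The cross-check assumes a hypothetical $300 invested at each of the three prices, which is not in the original card:

```python
def harmonic_mean(values):
    return len(values) / sum(1 / x for x in values)


prices = [10, 15, 20]              # purchase prices from the card above
avg_price = harmonic_mean(prices)  # about 13.85

# Cross-check: invest the same amount each period (the $300 figure is hypothetical).
amount = 300
shares = sum(amount / p for p in prices)  # 30 + 20 + 15 = 65 shares
avg_cost = amount * len(prices) / shares  # 900 / 65
```

Total dollars invested divided by total shares bought gives the same figure as the harmonic mean of the prices.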
Relationship of Geometric mean to the arithmetic mean
- the geometric mean is always less than or equal to the arithmetic mean; the two are equal only when every observation is the same
Quantiles:
quartiles, quintiles, deciles, percentiles
- divide the ordered data into quarters (4), fifths (5), tenths (10), and hundredths (100)
Formula for the position of a percentile in a data set
- arrange data in ascending order (low to high)
= Ly = (n + 1) * (y / 100)
- Ly is the position of the yth percentile among n observations
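A minimal sketch of the position formula. When the position is not a whole number, this sketch interpolates linearly between the two neighboring observations and clamps positions outside 1..n; both steps are assumptions not stated on the card, and the function names are hypothetical:

```python
def percentile_position(n, y):
    """Position Ly of the y-th percentile among n sorted observations (1-based)."""
    return (n + 1) * y / 100


def percentile_value(values, y):
    data = sorted(values)  # ascending order, low to high
    n = len(data)
    pos = percentile_position(n, y)
    pos = min(max(pos, 1), n)  # assumption: clamp positions outside 1..n
    lower = int(pos)           # whole part of the position
    frac = pos - lower         # fractional part of the position
    if lower >= n:
        return data[-1]
    # Assumption: linear interpolation between neighboring observations.
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])


data = [3, 9, 10, 20]
q1 = percentile_value(data, 25)      # position 1.25 -> 4.5
median = percentile_value(data, 50)  # position 2.5  -> 9.5
q3 = percentile_value(data, 75)      # position 3.75 -> 17.5
iqr = q3 - q1                        # 13.0
```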
When to use each mean:
a. Arithmetic mean: with single-period or cross-sectional data
b. Geometric mean: with time-series data
c. Weighted mean: when different observations have different weights
d. Harmonic mean: to find the average purchase price for equal periodic investments
e. Trimmed mean: when the data has extreme outliers
f. Winsorized mean: when the data has extreme outliers
Interquartile range:
- the difference between the third and first quartiles
List the Measures of Dispersion
- range
- Mean Absolute Deviation
- Variance (population, sample)
- Standard Deviation (population, sample)
Range formula
= max value - min value
Mean Absolute Deviation formula
= sum of |xi - ^x| / n
- calculate the mean, subtract it from each value, total the absolute deviations, and divide by n
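The steps above can be sketched in Python; the function name and the four-value data set are chosen for illustration:

```python
def mean_absolute_deviation(values):
    mean = sum(values) / len(values)
    # Total the absolute deviations from the mean, then divide by n.
    return sum(abs(x - mean) for x in values) / len(values)


data = [3, 9, 10, 20]                # mean is 10.5
mad = mean_absolute_deviation(data)  # (7.5 + 1.5 + 0.5 + 9.5) / 4 = 4.75
```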
Variance
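- Population: σ^2
- Sample: s^2
- “the average of the squared deviations around the mean”
- the population variance divides by N; the sample variance divides by n - 1
- use calculator function: 2nd, DATA, to solve
- the calculator reports standard deviations, so square the sample or population standard deviation to get the variance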
Standard deviation
- “the positive square root of the variance”
- Population: σ
- Sample: s
- use calculator function: 2nd, DATA, to solve
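As an alternative to the calculator, variance and standard deviation can be computed by hand in Python. This is a minimal sketch on an illustrative data set, with the standard library's `statistics` functions used as a cross-check:

```python
import math
import statistics

data = [3, 9, 10, 20]
n = len(data)
mean = sum(data) / n
squared_devs = sum((x - mean) ** 2 for x in data)  # 149.0

population_variance = squared_devs / n    # divide by N
sample_variance = squared_devs / (n - 1)  # divide by n - 1
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# The standard library gives the same results.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

The sample figures are larger than the population figures because the divisor is n - 1 rather than n.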
Target deviation / target semideviation def
- the risk of being below a given target
- only includes the observations below the target (B)
Target deviation / target semideviation formula
= sqrt( [sum of (xi - B)^2 over all xi <= B] / (n - 1) )
- n is the total number of observations, not just those below the target
- ex. if the target is 4%: find all observations below 4%, subtract 4% from each and square the result, sum the squared deviations, divide by n - 1, then take the square root
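A minimal sketch of the target semideviation, assuming the divisor uses the full sample size (n - 1) rather than only the count of observations below the target; the return series is hypothetical:

```python
import math


def target_semideviation(values, target):
    # Only observations at or below the target contribute to the sum...
    downside = sum((x - target) ** 2 for x in values if x <= target)
    # ...but the divisor uses the full sample size.
    return math.sqrt(downside / (len(values) - 1))


returns = [2, 5, -1, 6, 3]                  # hypothetical returns in %
semidev = target_semideviation(returns, 4)  # sqrt((4 + 25 + 1) / 4)
```

Here 2, -1, and 3 fall below the 4% target, so the squared deviations are 4, 25, and 1.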
Coefficient of Variation def
- expresses how much dispersion exists relative to the mean
- used in investment analysis to compare relative risk
- a lower value means less risk per unit of return
Coefficient of Variation formula
= CV = S / ^x
- sample standard deviation / sample mean
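A short sketch comparing relative risk with the coefficient of variation. The two return series are hypothetical, and `statistics.stdev` computes the sample standard deviation:

```python
import statistics


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


fund_a = [4, 6, 8, 10, 12]   # mean 8, sample variance 10
fund_b = [2, 6, 10, 14, 18]  # mean 10, sample variance 40

cv_a = coefficient_of_variation(fund_a)  # sqrt(10) / 8
cv_b = coefficient_of_variation(fund_b)  # sqrt(40) / 10
```

Fund B has the higher mean but also the higher CV, so it carries more dispersion per unit of mean return.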
Properties of a Normal Distribution
- “symmetrical distribution”
- mean = median = mode
- completely described by the mean and variance
- skewness = 0
- Kurtosis = 3 (excess kurtosis = 0)
Properties of a Positively skewed distribution
- has a long tail on the right side, peak to the left
- limited but frequent downside returns and unlimited but less frequent upside returns
- ex. buying calls
- mean > median > mode
- positive skewness (>0)
- visually, the Mode is at the peak, the median is to its right and down the slope, the mean is further down the slope to the right
Properties of a Negatively skewed distribution
- has a long tail on the left side, peak to the right
- limited but frequent upside returns and unlimited but less frequent downside returns
- ex. selling puts
- mean < median < mode
- negative skewness (<0)
- visually, the Mode is at the peak, the median is to its left and down the slope, the mean is further down the slope to the left
List the different types of kurtosis
- Leptokurtic
- Platykurtic
- Mesokurtic
Properties of Leptokurtic
- fatter tails
- more peaked
- excess kurtosis > 0
- K > 3
- probability of extreme outcomes, including large losses, is higher than for a normal distribution
- visually, the peak is higher, the sides come down more steeply, and the tails extend further out, so there is more data in the tails (fatter)
Properties of Platykurtic
- thinner tails
- less peaked
- excess kurtosis < 0
- K < 3
- visually the peak is lower
Properties of Mesokurtic
- same kurtosis as a normal distribution
- K = 3
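The skewness and kurtosis properties above can be illustrated numerically. This sketch uses simple population-moment definitions (third and fourth central moments scaled by the variance), which is an assumption: the cards give no formula, and sample-adjusted versions differ slightly. All data sets are hypothetical:

```python
def central_moment(values, k):
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]        # long right tail
fat_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # symmetric, heavy tails
two_point = [-1, 1, -1, 1]                      # symmetric, no tails at all

print(skewness(right_skewed) > 0)       # positive skew
print(excess_kurtosis(fat_tailed) > 0)  # leptokurtic
print(excess_kurtosis(two_point) < 0)   # platykurtic
```

In `right_skewed` the mode and median are both 3 while the mean is 4.75, so the single large value pulls the mean to the right of the median.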
Covariance def
- a measure of how two variables move together
- if positive, the 2 variables move up/down together
- if negative, the 2 variables move in opposite directions
Sample Covariance formula
= Cov = sum of [(xi - ^x) * (yi - ^y)] / (n - 1)
Correlation def
- a standardized measure of the linear relationship between two variables, with values ranging between -1 and +1
- i.e., the strength of the relationship
Sample Correlation formula
= r = Cov / (sx * sy)
- covariance of x and y / (sample standard deviation of x * sample standard deviation of y)
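A minimal sketch of sample covariance and sample correlation, both using the n - 1 divisor; the function names and the two short series are hypothetical:

```python
import math


def sample_covariance(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    s_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the sample variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)  # 6 / 4 = 1.5
r_xy = sample_correlation(x, y)   # 1.5 / sqrt(2.5 * 1.5), roughly 0.77
```

Covariance depends on the units of x and y, while the correlation is unit-free and always lies between -1 and +1.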
Spurious correlation
- the correlation between two variables arising from their relation to a third variable
- ex. shoe size and vocabulary of school children
- the third variable is age