Statistics and Distributions Flashcards
Distributions
- Representation of the way values tend to vary across a single attribute
- Usually presented as a histogram
- Where is the data concentrated? Which values are less likely? Which is most likely?
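A histogram groups the values of one attribute into bins and counts how many values land in each bin. The sketch below builds a text histogram with the standard library only. The bin width of 10 and the ages list are made-up illustration values, and bins with no values are simply left out.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin; bins are [start, start + bin_width), keyed by start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def print_histogram(values, bin_width):
    # One row per non-empty bin, with one '#' per value in that bin
    for start, count in histogram_counts(values, bin_width).items():
        print(f"[{start}, {start + bin_width}): {'#' * count}")

ages = [22, 25, 27, 31, 33, 34, 35, 38, 41, 45, 52]  # hypothetical sample
print_histogram(ages, 10)
# [20, 30): ###
# [30, 40): #####
# [40, 50): ##
# [50, 60): #
```

The tallest bar, [30, 40), shows where the data is concentrated. The short bar for [50, 60) shows which values are less likely.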
Central Tendency
Which single value best represents the data?
Context dependent: which measure works best depends on the data and the question being asked
- On a histogram: determines where the distribution sits on the x-axis
Mean
arithmetic mean:
sum of values/number of values
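The definition above translates directly into code. This minimal sketch divides the sum by the count and checks the result against Python's `statistics.mean`. The data list is an arbitrary example.

```python
from statistics import mean

def arithmetic_mean(values):
    # Sum of values divided by the number of values
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(arithmetic_mean(data))  # 5.0
assert arithmetic_mean(data) == mean(data)  # agrees with the standard library
```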
Median
Middle value of sorted data
- Resistant to outliers and skew
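One way to see the median's resistance to outliers is to add a single extreme value to a small dataset and compare how far each measure moves. The sketch below does this with hypothetical salary figures. For an even number of values it uses the common convention of averaging the two middle values.

```python
def sorted_median(values):
    # Middle value of the sorted data; average the two middle values when n is even
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def arithmetic_mean(values):
    return sum(values) / len(values)

salaries = [40, 42, 45, 47, 50]   # hypothetical, in thousands
with_outlier = salaries + [1000]  # one extreme value added

print(sorted_median(salaries), arithmetic_mean(salaries))          # 45 44.8
print(sorted_median(with_outlier), arithmetic_mean(with_outlier))  # 46.0 204.0
```

The median moves from 45 to 46. The mean jumps from 44.8 to 204.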
Variability
How far does the data spread away from the mean?
Affects the width of the histogram
Standard Deviation
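Roughly the typical distance of a value from the mean (strictly, the square root of the average squared distance from the mean)
If we pick a random value from the data, how far should we expect it to be from the mean?
sd = sqrt(sum((x - mu)^2) / N), where mu is the mean and N is the number of values (population standard deviation)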
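The formula can be computed step by step: find the mean, square each deviation, average the squares, and take the square root. The sketch below does this and cross-checks the result against `statistics.pstdev`, the standard library's population standard deviation.

```python
import math
from statistics import pstdev

def population_sd(values):
    # sd = sqrt(sum((x - mu)^2) / N)
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation of empty data is undefined")
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))                              # 2.0
print(math.isclose(population_sd(data), pstdev(data)))  # True
```

Dividing by N treats the data as the whole population. When the data is a sample from a larger population, dividing by N - 1 is common instead (`statistics.stdev`).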
Percentiles and Quartiles
25th Percentile : 1st Quartile
50th Percentile : 2nd Quartile
75th Percentile : 3rd Quartile
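Several conventions exist for computing percentiles, and they can give slightly different quartiles on the same data. The sketch below uses one of them: linear interpolation at position (n - 1) * p / 100 in the sorted data. It compares the result with `statistics.quantiles(..., method="inclusive")`, which follows the same convention.

```python
from statistics import quantiles

def percentile(values, p):
    # Linear interpolation at position (n - 1) * p / 100 in the sorted data
    # (one of several common conventions)
    s = sorted(values)
    if not s:
        raise ValueError("percentile of empty data is undefined")
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)                                # 5.0 9.0 13.0
print(quantiles(data, n=4, method="inclusive"))  # [5.0, 9.0, 13.0]
```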
IQR and Outliers
Interquartile Range : Q3-Q1
Lower/Upper Fences: [Q1 - (3/2) * IQR, Q3 + (3/2) * IQR]
Outlier: A value that falls outside of the fences.
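The fence rule takes only a few lines to apply. This sketch gets the quartiles from `statistics.quantiles` with the "inclusive" method, which is one convention among several, so other tools may place the fences slightly differently. The value 60 is an artificial outlier added for illustration.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    # Quartiles via the "inclusive" method (one convention of several)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # fences: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    outliers = [x for x in values if x < lower or x > upper]
    return (lower, upper), outliers

data = [1, 3, 5, 7, 9, 11, 13, 15, 17, 60]
fences, outliers = iqr_outliers(data)
print(fences, outliers)  # (-8.0, 28.0) [60]
```

Here Q1 = 5.5 and Q3 = 14.5, so IQR = 9 and the fences sit 13.5 beyond each quartile.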
Boxplots
Excellent tool to display and compare measures of variability
They display:
- Median
- IQR
- Whiskers (usually extending to the most extreme values still inside the fences)
- Outliers
- Range
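Plotting libraries compute the quantities below before drawing a boxplot. This sketch gathers them in a dictionary so each item in the list above can be inspected directly. It assumes Tukey-style whiskers (1.5 * IQR fences) and "inclusive" quartiles. Real plotting tools may use other quartile methods or whisker rules.

```python
from statistics import median, quantiles

def boxplot_summary(values, k=1.5):
    # Quantities a boxplot typically draws (Tukey-style whiskers assumed)
    s = sorted(values)
    q1, _, q3 = quantiles(s, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in s if lower_fence <= x <= upper_fence]
    return {
        "median": median(s),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the fences
        "outliers": [x for x in s if x < lower_fence or x > upper_fence],
        "range": s[-1] - s[0],
    }

summary = boxplot_summary([1, 3, 5, 7, 9, 11, 13, 15, 17, 60])
print(summary["median"], summary["whiskers"], summary["outliers"], summary["range"])
# 10.0 (1, 17) [60] 59
```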
Normal Distribution
- Also called the Gaussian distribution or the bell curve
Fundamental to statistics
Approximates many naturally occurring measurements
Has a number of useful properties
Normal Distribution Properties
- Symmetric
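- Mean = Median = Mode
- 68-95-99.7 Rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
- Foundation of the Central Limit Theorem: means of sufficiently large samples are approximately normally distributed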
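Python's `statistics.NormalDist` can check two of these properties numerically. The sketch below confirms that mean, median, and mode coincide. It also uses the cumulative distribution function to measure the probability within 1, 2, and 3 standard deviations of the mean. The parameters mu=100 and sigma=15 are arbitrary, and any normal distribution gives the same shares.

```python
from statistics import NormalDist

def share_within(dist, k):
    # Probability mass between mu - k*sigma and mu + k*sigma
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

scores = NormalDist(mu=100, sigma=15)  # hypothetical parameters
print(scores.mean, scores.median, scores.mode)  # 100.0 100.0 100.0
for k in (1, 2, 3):
    print(k, round(share_within(scores, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```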
Random Experiment
A process whose outcome cannot be predicted with certainty in advance
Outcome
The result of a single run of the experiment
Sample Space
The set of all possible outcomes for an experiment
Event
A subset of the sample space
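The four definitions above can be tied together with a single die roll, which is a hypothetical example chosen for illustration. The sample space is a set, an event is a subset of it, and each simulated roll is one outcome. The event "occurs" when that outcome belongs to the subset. The random generator is seeded only so reruns are repeatable.

```python
import random

def roll_die(rng):
    # One run of the random experiment; randint is inclusive on both ends
    return rng.randint(1, 6)

sample_space = set(range(1, 7))                 # all possible outcomes: {1, ..., 6}
even = {x for x in sample_space if x % 2 == 0}  # event "roll is even" = {2, 4, 6}
assert even <= sample_space                     # an event is a subset of the sample space

rng = random.Random(0)
for _ in range(5):
    outcome = roll_die(rng)  # the result of a single experiment
    print(outcome, "even occurred" if outcome in even else "even did not occur")
```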