550 Flashcards
LOM - Nominal
Characteristics: Categorical
Math: Equality (=, !=)
Central Tendency: Mode
Variability: None
LOM - Ordinal
Characteristics: Categorical, Rank Order
Math: Equality (=, !=), Comparison (>,<)
Central Tendency: Mode, Median
Variability: Range, Interquartile Range
LOM - Interval
Characteristics: Categorical, Rank Order, Equal Spacing
Math: Equality (=, !=), Comparison (>,<), Add/Subtract (+/-)
Central Tendency: Mode, Median, Arithmetic Mean
Variability: Range, Interquartile Range, Standard Deviation, Variance
LOM - Ratio
Characteristics: Categorical, Rank Order, Equal Spacing, True Zero
Math: Equality (=, !=), Comparison (>,<), Add/Subtract (+/-), Mult/Div (x /)
Central Tendency: Mode, Median, Arithmetic Mean, Geometric Mean
Variability: Range, Interquartile Range, Standard Deviation, Variance, Relative Standard Deviation
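The two ratio-only statistics above can be sketched in Python as follows. This is a minimal illustration, not a fixed definition: the helper name `relative_sd` and the sample data are made up for the example, and using the population standard deviation (rather than the sample one) is an assumption.

```python
import statistics

def relative_sd(values):
    """Relative standard deviation: SD as a percentage of the mean.

    Assumption: uses the population SD; some courses use the sample SD.
    Only meaningful for ratio data, where the mean has a true zero.
    """
    return 100 * statistics.pstdev(values) / statistics.mean(values)

growth_factors = [1, 4, 16]          # hypothetical ratio-level data
geo_mean = statistics.geometric_mean(growth_factors)  # about 4

weights = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population SD 2
rsd = relative_sd(weights)           # about 40 (percent)
```

Both statistics rely on multiplication and division, which is why they appear only at the ratio level.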
LOM - Nominal Numeric
Numeric codes assigned to non-numeric categories are labels, not real numbers, and have no quantitative meaning, e.g. T = 0, F = 1
5 Number Summary
Minimum
1st Quartile - 25%
Median
3rd Quartile - 75%
Maximum
Displayed using Boxplots
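A minimal sketch of computing the five values in Python. The function name `five_number_summary` and the data are illustrative, and the "inclusive" quartile method is an assumption: textbooks and software use several slightly different quartile conventions, so results may differ a little from a hand calculation.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

scores = [2, 4, 4, 5, 7, 9, 10]      # hypothetical data
summary = five_number_summary(scores)
```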
Range vs IQR
Range is highly influenced by outliers
Interquartile Range (IQR) is resistant to outliers
IQR = Q3 - Q1, the spread of the middle 50% of the data
High Outlier > Q3 + 1.5 x IQR
Low Outlier < Q1 - 1.5 x IQR
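The 1.5 x IQR rule above can be sketched as a small Python helper. The name `iqr_outliers`, the parameter `k`, and the data are hypothetical, and the "inclusive" quartile method is an assumption (other quartile conventions can move the fences slightly).

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Q1 - k*IQR / Q3 + k*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 4, 4, 5, 7, 9, 10, 40]    # hypothetical data with one extreme value
outliers = iqr_outliers(data)        # the value 40 lies above the high fence
data_range = max(data) - min(data)   # 38, inflated by the single outlier
```

Dropping the 40 shrinks the range from 38 to 8, while the quartiles barely move, which is the sense in which the IQR is resistant.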
Standard Deviation
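Roughly the typical distance between the values in a data set and the mean.
Smallest possible value is 0, which happens only when every value is the same.
Sensitive to outliers and skew.
Population SD: square root of (the sum of the squared differences between each value and the mean, divided by the total number of values). Sample SD divides by n - 1 instead.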
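The verbal formula above, written out step by step in Python as a sketch. The function name `population_sd` and the data are illustrative; it divides by the total number of values (population form), whereas a sample standard deviation would divide by n - 1.

```python
import math
import statistics

def population_sd(values):
    """Square root of the mean squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data, mean = 5
sd = population_sd(data)             # squared diffs sum to 32; 32 / 8 = 4; sqrt = 2
```

The standard library's `statistics.pstdev` computes the same quantity, and `statistics.stdev` gives the n - 1 version.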
Measures of Central Tendency
Symmetric (e.g. Normal) aka No Skew: Mean = Median = Mode
Left Skewed aka Right Hump: Mean < Median < Mode
Right Skewed aka Left Hump: Mode < Median < Mean
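A quick way to see these orderings is to compare the mean and median of a data set. The sketch below uses a hypothetical helper, `skew_direction`, and made-up data; the mean-versus-median comparison is a rule of thumb that works for typical unimodal data but is not a formal skewness test.

```python
import statistics

def skew_direction(values):
    """Guess skew direction from the mean/median ordering (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right skewed"   # long tail of high values pulls the mean up
    if mean < median:
        return "left skewed"    # long tail of low values pulls the mean down
    return "no skew"

high_tail = [1, 2, 2, 3, 3, 3, 20]        # one large value
low_tail = [1, 18, 18, 18, 19, 19, 20]    # one small value
balanced = [1, 2, 3, 4, 5]
```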
Normal Distribution + Empirical Rule
Bell-shaped, unimodal, symmetrical distribution of a quantitative variable with mean=median=mode.
About 68% of values fall within 1 standard deviation of the mean
About 95% fall within 2 standard deviations
About 99.7% fall within 3 standard deviations
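The three percentages can be checked against the normal curve itself. This sketch uses Python's `statistics.NormalDist`; the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

within_1 = share_within(1)   # roughly 0.68
within_2 = share_within(2)   # roughly 0.95
within_3 = share_within(3)   # roughly 0.997
```

The result does not depend on the particular mean or standard deviation, only on the number of standard deviations `k`.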
Z-Score
A standardized score that measures how many standard deviations a data point is from the mean of a group.
Z = (value - mean)/sd
A z-score of 0 is exactly at the mean. A z-score of 1 is 1 sd above the mean; -1 is 1 sd below.
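The formula above as a one-line Python function. The test-score numbers are hypothetical and only there to show the sign and size of the result.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical exam with mean 70 and standard deviation 10.
above = z_score(85, 70, 10)    # 1.5 sd above the mean
at_mean = z_score(70, 70, 10)  # exactly at the mean
below = z_score(60, 70, 10)    # 1 sd below the mean
```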
Kurtosis
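Normal curve: kurtosis = 3 (excess kurtosis = 0)
Thin pointy curve with heavy tails: kurtosis > 3 (excess kurtosis is +)
Flat and spread out curve with light tails: kurtosis < 3 (excess kurtosis is -)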
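The two scales on this card differ only by a shift of 3: "excess kurtosis" is kurtosis minus 3, so the normal curve lands on 0. A minimal sketch using the population moment formula (fourth central moment divided by the squared variance); the function names and data are illustrative, and statistical software often applies sample-size corrections that this sketch omits.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / (variance squared)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis shifted so that a normal distribution scores 0."""
    return kurtosis(values) - 3

flat = [1, 2, 3, 4, 5]                       # evenly spread out
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]     # tall center with two extreme values
```

Here `kurtosis(flat)` comes out below 3 and `kurtosis(peaked)` above 3, matching the card.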
Descriptive vs Inferential Statistics
Numbers that summarize the data set itself, e.g. batting average
vs
Using confidence intervals and significance tests to make inferences about a population from a sample, e.g. how likely a player is to perform well in the future
Mean vs Median vs Trimmed Mean
Mean: Balance point of the distribution, sensitive to extreme values.
Median: Equal areas point, resistant to extreme values.
Trimmed Mean: Average calculated after removing a set percentage of the highest and lowest values.
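A sketch comparing the three measures on one data set with an extreme value. The helper `trimmed_mean` is hypothetical, and it assumes the percentage is removed from each end (so 0.125 drops the lowest 12.5% and the highest 12.5%); some definitions state the total percentage removed instead.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end.

    Assumption: proportion is per end and less than 0.5.
    """
    data = sorted(values)
    k = int(len(data) * proportion)      # how many values to drop per end
    trimmed = data[k:len(data) - k]
    return sum(trimmed) / len(trimmed)

data = [2, 4, 4, 5, 7, 9, 10, 40]        # hypothetical data with one extreme value
mean = statistics.mean(data)             # pulled up by the 40
median = statistics.median(data)         # middle of the sorted values
trimmed = trimmed_mean(data, 0.125)      # drops the 2 and the 40
```

With this data the mean is 10.125, the median is 6, and the trimmed mean is 6.5, showing how trimming moves the average back toward the bulk of the data.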
Histogram
Box and Whisker Plot
Dotplot/Stemplot
Bar Graphs
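Histogram: Good for visualizing the shape of a large amount of interval or ratio data
Box and Whisker Plot: Useful for showing the center, spread, and outliers of a distribution via the 5 number summary
Dotplot/Stemplot: Best for small sets of quantitative data
Bar Graphs: For categorical data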