550 Flashcards
LOM - Nominal
Characteristics: Categorical
Math: Equality (=, !=)
Central Tendency: Mode
Variability: None
LOM - Ordinal
Characteristics: Categorical, Rank Order
Math: Equality (=, !=), Comparison (>,<)
Central Tendency: Mode, Median
Variability: Range, Interquartile Range
LOM - Interval
Characteristics: Categorical, Rank Order, Equal Spacing
Math: Equality (=, !=), Comparison (>,<), Add/Subtract (+/-)
Central Tendency: Mode, Median, Arithmetic Mean
Variability: Range, Interquartile Range, Standard Deviation, Variance
LOM - Ratio
Characteristics: Categorical, Rank Order, Equal Spacing, True Zero
Math: Equality (=, !=), Comparison (>,<), Add/Subtract (+/-), Mult/Div (x /)
Central Tendency: Mode, Median, Arithmetic Mean, Geometric Mean
Variability: Range, Interquartile Range, Standard Deviation, Variance, Relative Standard Deviation
LOM - Nominal Numeric
Non-numeric categories coded as numbers are not really numeric and have no quantitative meaning, e.g. T = 0, F = 1
5 Number Summary
Minimum
1st Quartile - 25%
Median
3rd Quartile - 75%
Maximum
Displayed using Boxplots
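A minimal sketch of computing the five values in Python. The function name `five_number_summary` and the sample data are illustrative assumptions, and the "inclusive" quartile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    # The "inclusive" method is an assumption; quartile conventions vary.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```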
Range vs IQR
Range is highly influenced by outliers
Interquartile Range (IQR) is resistant to outliers
Based on the 1st and 3rd quartiles: IQR = Q3 - Q1
High Outlier > Q3 + 1.5 x IQR
Low Outlier < Q1 - 1.5 x IQR
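The outlier rule can be sketched in Python as follows. The helper name `iqr_outliers` and the sample data are hypothetical, and the quartiles use the "inclusive" convention, which is an assumption rather than the only valid choice.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying outside the 1.5 x IQR fences."""
    # Quartile convention ("inclusive") is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 50]
outliers = iqr_outliers(data)
print(outliers)               # [50]
print(max(data) - min(data))  # range is 49, driven almost entirely by the outlier
```

Here Q1 = 3 and Q3 = 7, so the IQR stays at 4 while the single extreme value inflates the range.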
Standard Deviation
Roughly the typical distance of the values in a data set from their mean.
Smallest is 0 (all values identical).
Sensitive to outliers and skew.
Square root of the average squared difference between each value and the mean: sum the squared differences, divide by the number of values (n - 1 for a sample), then take the square root.
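As a worked example, the population form of the calculation (dividing by the number of values) might look like this in Python. The function name `population_sd` and the data are illustrative; the standard library's `statistics.pstdev` computes the same quantity.

```python
import math
import statistics

def population_sd(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared differences sum to 32
sd = population_sd(data)
print(sd)  # 2.0
```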
Measures of Central Tendency
Symmetric aka No Skew (e.g. Normal): Mean = Median = Mode
Left Skewed aka Right Hump: Mean < Median < Mode
Right Skewed aka Left Hump: Mode < Median < Mean
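A small illustration of the right-skewed ordering, using made-up data with one large value pulling the mean upward. These orderings are rules of thumb and can fail for some data sets.

```python
import statistics

# Hypothetical right-skewed data: most values are small, one is very large.
right_skewed = [1, 2, 2, 3, 4, 5, 20]

mode = statistics.mode(right_skewed)      # most frequent value
median = statistics.median(right_skewed)  # middle value
mean = statistics.mean(right_skewed)      # pulled toward the long right tail

print(mode, median, round(mean, 2))  # 2 3 5.29
```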
Normal Distribution + Empirical Rule
Bell-shaped, unimodal, symmetrical distribution of a quantitative variable with mean=median=mode.
68% within 1 standard deviation
95% within 2 standard deviations
99.7% within 3 standard deviations
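The three percentages can be checked against the standard normal curve with Python's `statistics.NormalDist`. The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share, 4))
```

The results are approximately 0.6827, 0.9545, and 0.9973, which the empirical rule rounds to 68%, 95%, and 99.7%.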
Z-Score
A standardized score that measures how many standard deviations a data point is from the mean of a group.
Z = (value - mean)/sd
Z = 0 means the value equals the mean. Z = 1 means the value is 1 sd above the mean.
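A short sketch of the formula in Python. The function name `z_score` and the numbers (a score of 85 in a group with mean 70 and sd 10) are made up for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z = z_score(85, mean=70, sd=10)
print(z)                            # 1.5, i.e. 1.5 sd above the mean
print(z_score(70, mean=70, sd=10))  # 0.0, the value equals the mean
```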
Kurtosis
Normal curve: kurtosis = 3 (excess kurtosis = 0)
Thin pointy curve: kurtosis > 3 (excess is +)
Flat and spread out: kurtosis < 3 (excess is -)
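One common way to compute kurtosis is the fourth central moment divided by the squared variance. The sketch below assumes that population (moment-based) definition; statistical packages may apply sample corrections and report slightly different numbers. The function name and data are illustrative.

```python
def kurtosis(values):
    """Moment-based kurtosis: fourth central moment / (second central moment)^2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

flat = kurtosis([1, 2, 3, 4, 5])  # evenly spread values
excess = flat - 3                 # excess kurtosis compares against the normal curve
print(round(flat, 2), round(excess, 2))  # 1.7 -1.3
```

The evenly spread data gives a value below 3, matching the "flat and spread out" case.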
Descriptive vs Inferential Statistics
Descriptive: numbers that describe the data set, e.g. batting average
vs
Inferential: using confidence intervals and significance tests to make inferences about a population from a sample, e.g. how likely a player is to perform well in the future
Mean vs Median vs Trimmed Mean
Mean: balance point of the distribution, sensitive to extreme values.
Median: equal-areas point, resistant to extreme values.
Trimmed Mean: the average calculated after removing a certain percentage of the highest and lowest values.
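A minimal comparison of the three measures on data with one extreme value. The `trimmed_mean` helper is hypothetical, and it assumes the given proportion is cut from each end of the sorted data; some tools define the percentage differently.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from EACH end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values to drop per end
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]
print(statistics.mean(data))    # 22, pulled up by the extreme value
print(statistics.median(data))  # 3
trimmed = trimmed_mean(data, 0.2)
print(trimmed)                  # 3.0, after dropping 1 and 100
```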
Histogram
Box and Whisker Plot
Dotplot/Stemplot
Bar Graphs
Histogram: good for visualizing the shape of a large amount of interval or ratio data
Box and Whisker Plot: useful for showing the distribution of data
Dotplot/Stemplot: best for small sets of quantitative data
Bar Graphs: for categorical data
Probability vs Statistics
Probability relates to how often different events occur
- We know the model aka conditions but we don’t know the data
Statistics: we know the data but we don’t know the model aka conditions
- The core of inferential statistics is figuring out the model
Probability Distribution
Outcomes of a trial must be disjoint aka mutually exclusive aka can’t occur at the same time.
Probabilities must be between 0 and 1.
Probabilities must sum to 1.
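The last two rules can be checked mechanically. The sketch below assumes the outcomes are given as dictionary keys, which makes them distinct (disjoint) by construction; the function name and examples are illustrative.

```python
import math

def is_valid_distribution(probabilities):
    """Check that each probability is in [0, 1] and that they sum to 1.

    `probabilities` maps each outcome to its probability; distinct dict keys
    stand in for disjoint outcomes.
    """
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    # isclose tolerates floating-point rounding in the sum.
    return in_range and math.isclose(sum(values), 1.0)

fair_coin = {"H": 0.5, "T": 0.5}
broken = {"H": 0.7, "T": 0.7}
print(is_valid_distribution(fair_coin))  # True
print(is_valid_distribution(broken))     # False, sums to 1.4
```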
Distribution
A function that shows the possible values for a variable and how often those values occur
Discrete
- Poisson
- Binomial
- Uniform
- Geometric
Continuous
- F
- Uniform
- Normal
- Chi-Square
- T
Random Variables
A numerical value that depends on the outcome of a chance experiment.
Random variable T = number of tails occurring in two coin tosses.
T is a random variable since it takes numerical values (0, 1, 2) and it is based on a random process
- Coin Toss
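The distribution of T can be worked out by listing every outcome of two tosses, assuming a fair coin so that all four outcomes are equally likely. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT (fair coin assumed).
outcomes = list(product("HT", repeat=2))

# T maps each outcome to a number: how many tails it contains.
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

distribution = {t: Fraction(count, len(outcomes)) for t, count in tail_counts.items()}
for t in sorted(distribution):
    print(t, distribution[t])  # 0 1/4, then 1 1/2, then 2 1/4
```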
Discrete vs Continuous Random Variables
Discrete:
- Isolated points along the number line
- # of items purchased, # of customers on website
- Counting
- Histograms
Continuous:
- All points in some interval
- Temperature of a freezer, weight of a pineapple
- Measuring
- Density Curve
Uniform Distribution (Discrete/Continuous)
Distribution in which all outcomes are equally likely
Discrete
- Count
- Used for a simple random sample from a population or to model events that are equally likely, such as rolling a die
Continuous
- Measure