Module 3 Flashcards
Population vs sample mean
Symbols:
Σ: Sum
μ: Population mean
N: Population size
Sample variance uses n−1 in the denominator to ensure it is an unbiased estimator.
Measures of central location
Mean, median, mode
Median
Definition: The middle value of a dataset that divides it into two equal halves.
Key Point:
The median is useful when outliers are present.
If mean and median differ, outliers likely exist.
Mode
Definition: The value(s) occurring most frequently in a dataset.
Types:
Unimodal: One mode
Bimodal: Two modes
Multimodal: Three or more modes (less useful)
Examples:
Acetech salaries: $40,000 is the mode (most common).
Women’s sweatshirt sizes: “L” is the mode.
Differences between mean, median, and mode
- Use mean when data is symmetrical and free of outliers.
- Use median when data contains outliers or is skewed.
- Use mode for categorical variables or to identify most frequent values.
Weighted mean
A mean where different data points are assigned specific weights.
Useful when observations contribute unequally to the average.
Histograms
Purpose:
Visualizes data distribution, clustering, spread, and shape.
Key Characteristics:
Symmetric: Mirror image on both sides of the center.
Skewed:
Positive: Long right tail.
Negative: Long left tail.
Symmetric and Unimodal
Distribution:
Mean = Median = Mode.
Percentiles
Definition:
Divide data into 100 equal parts.
Specific percentiles:
25th Percentile: Q1
50th Percentile: Q2 (Median)
75th Percentile: Q3
Application:
Ideal for large datasets.
Used in five-number summaries (Minimum, Q1, Median, Q3, Maximum)
What is the five number summary?
Minimum, Q1, Median (Q2), Q3, Maximum.
Purpose:
Summarizes the spread and relative position of data.
Example: Growth and Value variables.
Measures of dispersion
Variance and standard deviation
Variance:
The average of squared differences between observations and the mean.
Units are squared.
Standard Deviation:
Square root of variance, returning to the original units.
Represents typical spread around the mean.
Importance of variance and standard deviation
Variance: Highlights how data points differ from the mean.
Standard Deviation: Provides a clear measure of spread in the same units as the data.
Use Cases: Evaluate consistency, risk, or variability in data (e.g., financial analysis).
The Empirical Rule
Bell-shaped distribution:
68% of data within 1 standard deviation of the mean.
95% within 2 standard deviations.
99.7% within 3 standard deviations.
Z-scores for Outlier Detection
Measures how many standard deviations a value is from the mean.
For a symmetric, bell-shaped distribution, outliers have Z-scores less than -3 or greater than +3.
What is covariance?
Measures the degree to which two variables change together.
What is correlation?
Standardized measure of covariance, ranging from -1 to 1.