From study guide posted Flashcards
What is data?
Raw facts or figures
What is statistics and what does it involve?
The study of data, involving:
Collecting data
Summarizing data
Drawing conclusions
Descriptive vs inferential statistics
Descriptive: Summarizes data.
Inferential: Makes predictions or inferences about a population from a sample.
What is population data?
Data on every member of the group being studied. Often not feasible to obtain due to size or time constraints, so a sample is used instead.
Cross-sectional vs time series data
Cross-sectional: Data collected at one point in time.
Time series: Data collected over multiple time periods.
Structured vs unstructured data
Structured: Organized (e.g., in tables).
Unstructured: No predefined format (e.g., text, images).
What is a variable?
A characteristic that can take different values.
Categorical vs numerical data
Categorical: Qualitative (e.g., gender, type).
Numerical: Quantitative (e.g., height, age).
Discrete vs continuous data
Discrete: Countable (e.g., number of cars).
Continuous: Measurable (e.g., weight, height).
What are the four scales of measurement?
Nominal: Categories (e.g., gender).
Ordinal: Ordered categories (e.g., rankings).
Interval: Ordered with equal intervals but no true zero (e.g., temperature in °C or °F).
Ratio: Ordered with equal intervals and a true zero (e.g., weight).
How do you deal with missing data?
Omission: Remove missing values.
Imputation: Fill in missing values with estimates.
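A minimal sketch of the two approaches above. The `ages` list is made up for illustration, and using the mean of the observed values as the estimate is only one possible imputation choice.

```python
from statistics import mean

# Hypothetical sample; None marks a missing value.
ages = [23, None, 31, 27, None, 19]

# Omission: drop the missing entries.
observed = [a for a in ages if a is not None]

# Imputation: replace each missing entry with an estimate
# (assumed here to be the mean of the observed values).
fill_value = mean(observed)
imputed = [a if a is not None else fill_value for a in ages]

print(observed)  # values left after omission
print(imputed)   # same length as ages, gaps filled in
```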
What is frequency distribution?
A table that groups data into categories or intervals and shows how many observations fall in each.
How do you visualize categorical data?
Bar chart
Pie chart
Contingency table
Stacked column chart
How do you visualize numeric data?
Histogram: For continuous data.
Intervals: Calculate width by (Max - Min) / Number of intervals.
Skewness: Symmetric, Positive, or Negative.
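The interval-width formula above can be sketched as follows. The data and the choice of 4 intervals are assumptions for illustration; placing the maximum value in the last interval is one common convention.

```python
# Hypothetical data and number of intervals.
data = [12, 15, 21, 22, 28, 30, 34, 41, 45, 52]
k = 4

low, high = min(data), max(data)
width = (high - low) / k  # (Max - Min) / Number of intervals

# Count how many observations fall in each interval.
counts = [0] * k
for x in data:
    idx = min(int((x - low) / width), k - 1)  # the max goes in the last interval
    counts[idx] += 1

for i, c in enumerate(counts):
    start = low + i * width
    print(f"{start:.0f} to {start + width:.0f}: {c}")
```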
What are three ways to measure central location?
Mean, median, and mode.
Mean (μ for population, x̄ for sample)
Median (especially useful when the data contain outliers, since it is not pulled toward extreme values)
Mode (Unimodal, Bimodal)
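A short sketch of the three measures using Python's standard `statistics` module. The `salaries` and `scores` lists are invented; the single large salary shows why the median is preferred when outliers are present.

```python
from statistics import mean, median, multimode

# Hypothetical salaries (in thousands) with one outlier.
salaries = [40, 45, 45, 50, 55, 60, 300]

avg = mean(salaries)         # pulled upward by the outlier
mid = median(salaries)       # middle value, unaffected by the outlier
modes = multimode(salaries)  # most frequent value(s)

# A bimodal example: two values tie for most frequent.
scores = [1, 1, 2, 3, 3]
score_modes = multimode(scores)

print(avg, mid, modes, score_modes)
```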
What is a box plot?
Graphical representation of the five-number summary.
What is the five-number summary?
Min, Q1, Median (Q2), Q3, Max.
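A sketch of the five-number summary with the standard library. The data are made up, and quartile conventions differ between textbooks and software; `method="inclusive"` is just one choice, so other tools may give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Hypothetical sample.
data = [2, 4, 4, 5, 7, 8, 9, 11, 15]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, q2, q3, max(data))
print(five_number)
```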
What are the measures of dispersion?
Range: Max - Min.
Interquartile Range (IQR): Q3 - Q1.
Variance (s² for sample, σ² for population) and Standard Deviation (s for sample, σ for population).
Coefficient of Variation (CV): s/x̄ or σ/μ.
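The measures above, sketched with the `statistics` module on an invented data set. Note that the sample versions (`variance`, `stdev`) divide by n - 1, while the population versions (`pvariance`, `pstdev`) divide by n.

```python
from statistics import mean, variance, stdev, pvariance, pstdev

# Hypothetical data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # Range: Max - Min

s2 = variance(data)       # sample variance (s squared)
s = stdev(data)           # sample standard deviation (s)
sigma2 = pvariance(data)  # population variance (sigma squared)
sigma = pstdev(data)      # population standard deviation (sigma)

cv = s / mean(data)       # sample coefficient of variation

print(data_range, s2, s, sigma2, sigma, cv)
```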
What is the empirical rule?
For a normal distribution, about 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
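The three percentages can be checked against the standard normal cumulative distribution. This sketch uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```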
What is a sample space?
Set of all possible outcomes.
Example: S = {A, B, C, D, F} (letter grades).
Exhaustive vs mutually exclusive events
Exhaustive: Covers all outcomes.
Mutually Exclusive: Cannot occur simultaneously.
Compare union, intersection, and complement
Union (A ∪ B): Outcomes in A or B (or both).
Intersection (A ∩ B): Outcomes in A and B.
Complement (Aᶜ): Outcomes not in A.
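Python sets map directly onto these operations. The sketch below reuses the letter-grade sample space; the events `A` and `B` are hypothetical.

```python
# Sample space of letter grades.
S = {"A", "B", "C", "D", "F"}

# Hypothetical events.
A = {"A", "B"}  # grade of B or better
B = {"B", "C"}  # grade of B or C

union = A | B          # outcomes in A or B (or both)
intersection = A & B   # outcomes in both A and B
complement_A = S - A   # outcomes not in A

# A and its complement are mutually exclusive and exhaustive.
mutually_exclusive = A.isdisjoint(complement_A)
exhaustive = (A | complement_A) == S

print(union, intersection, complement_A)
```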
What are the properties of probability?
0 ≤ P(A) ≤ 1.
Sum of probabilities of mutually exclusive and exhaustive events equals 1.
Types of counting
Factorials: Number of ways to arrange n distinct items in order (n!).
Combinations: Selection without order (n choose x).
Permutations: Selection with order.
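The three counting rules are available in Python's `math` module (`comb` and `perm` need Python 3.8 or later). The numbers 5 and 2 are arbitrary.

```python
import math

n, x = 5, 2

arrangements = math.factorial(n)  # n! ways to order n distinct items
combinations = math.comb(n, x)    # choose x of n, order ignored
permutations = math.perm(n, x)    # choose x of n, order matters

print(arrangements, combinations, permutations)
```

Permutations exceed combinations by a factor of x!, because each unordered selection can be ordered in x! ways.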
Binomial distribution
Counts successes in a fixed number of independent trials, each ending in success or failure with the same probability of success.
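A sketch of the binomial probability P(X = x) = C(n, x) p^x (1 - p)^(n - x). The formula is the standard one rather than something stated in this guide, and `binomial_pmf` is a helper name chosen here for illustration.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical example: 3 heads in 10 fair coin flips.
prob_three = binomial_pmf(3, 10, 0.5)

# The probabilities over all possible x sum to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))

print(prob_three, total)
```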
Poisson Distribution
Number of occurrences of an event in a fixed interval of time or space.
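A sketch of the Poisson probability P(X = x) = e^(-λ) λ^x / x!, where λ is the mean number of occurrences per interval. The formula is the standard one rather than something stated in this guide, and `poisson_pmf` and the rate of 2 are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

# Hypothetical example: an average of 2 arrivals per hour.
lam = 2.0
p_none = poisson_pmf(0, lam)  # no arrivals in an hour
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

print(p_none, p_at_most_two)
```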
Normal Distribution
Bell-shaped curve, use cumulative distribution for probabilities.
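A sketch of using the cumulative distribution to get normal probabilities, via `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are made-up values.

```python
from statistics import NormalDist

# Hypothetical test scores: mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

p_below_115 = scores.cdf(115)                    # P(X <= 115)
p_above_115 = 1 - scores.cdf(115)                # P(X > 115)
p_between = scores.cdf(115) - scores.cdf(85)     # P(85 <= X <= 115)

print(round(p_below_115, 4), round(p_above_115, 4), round(p_between, 4))
```

Interval probabilities come from subtracting two cumulative values, and upper-tail probabilities from subtracting one cumulative value from 1.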
Exponential Distribution
Time/space between events.
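A sketch of the exponential cumulative probability P(X ≤ x) = 1 - e^(-λx), where λ is the rate of events. The formula is the standard one rather than something stated in this guide, and `exponential_cdf` and the rate used are assumptions for illustration.

```python
from math import exp

def exponential_cdf(x: float, rate: float) -> float:
    """Probability that the wait until the next event is at most x."""
    return 1 - exp(-rate * x)

# Hypothetical example: events arrive at a rate of 0.5 per minute,
# so the mean time between events is 1 / 0.5 = 2 minutes.
rate = 0.5
p_within_two = exponential_cdf(2, rate)      # wait of 2 minutes or less
p_longer_than_two = 1 - p_within_two         # wait longer than 2 minutes

print(p_within_two, p_longer_than_two)
```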
Expected Value
E(X) = μ = Σ(xᵢ * P(X = xᵢ)).
Variance and Standard Deviation
Variance = Σ((xᵢ - μ)² * P(X = xᵢ)), SD = √Variance.
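The expected value, variance, and standard deviation formulas can be applied directly to a small probability table. The distribution below (number of heads in two fair coin flips) is an illustrative choice.

```python
from math import sqrt

# Hypothetical discrete distribution: value -> probability.
dist = {0: 0.25, 1: 0.5, 2: 0.25}

# E(X) = sum of x * P(X = x)
mu = sum(x * p for x, p in dist.items())

# Variance = sum of (x - mu)^2 * P(X = x)
var = sum((x - mu) ** 2 * p for x, p in dist.items())

sd = sqrt(var)

print(mu, var, sd)
```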
Confidence interval formula
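CI = x̄ ± Z (σ/√n), where Z is the critical value for the chosen confidence level and σ is the known population standard deviation.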
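A sketch of the interval calculation. The sample mean, σ, n, and the 95% confidence level are all assumed values; Z is looked up with `NormalDist().inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs.
x_bar = 50.0       # sample mean
sigma = 10.0       # known population standard deviation
n = 25             # sample size
confidence = 0.95  # assumed confidence level

# Two-sided critical value: leaves (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```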