Statistics + Probability (topic 4) Flashcards
Mean
Average: add up all values, divide by the number of terms.
Median
Middle value in an ordered data set. **Data needs to be in order first.
Mode
Most common number in the set.
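A quick check of mean, median and mode using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up values

mean = statistics.mean(data)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(data)  # sorts first; halfway between 3 and 5 = 4.0
mode = statistics.mode(data)      # 3 appears most often
```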
Percentile
The value that X percent of the data falls below.
Quartile
Q1 = first quartile = 25th percentile
Q2 = second quartile = 50th percentile
Q3 = third quartile = 75th percentile
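A minimal sketch of finding the quartiles, assuming the "median of each half" method (Q1 is the median of the lower half, Q3 the median of the upper half). Other conventions exist and can give slightly different answers. The function name `quartiles` and the data are made up for illustration.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method (one of several conventions)."""
    ordered = sorted(values)          # data must be in order first
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values before the median position
    upper = ordered[half + n % 2:]    # values after it (middle value skipped if n is odd)
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

q1, q2, q3 = quartiles([1, 3, 4, 6, 8, 9, 12])
```

Here the lower half is 1, 3, 4 and the upper half is 8, 9, 12, so Q1 = 3, Q2 = 6, Q3 = 9.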
Discrete data
Exact numbers (usually from counting)
Interquartile range
Measure of dispersion (spread) of the data: IQR = Q3 - Q1
Continuous data
Any value in a certain range (can be decimal)
Reliable data
Repeatable data
Missing data can affect reliability
Bias: results favour one outcome over another. **We try to minimize bias.
Sampling techniques
- Simple random
- Convenience
- Systematic
- Quota
- Stratified
Simple random sampling technique
Everyone has an equal chance of being chosen. Draw names out of a hat, use a random number generator, etc.
Ex. Poll students from school: assign each student a number, then choose with a random number generator.
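A sketch of the numbered-students idea with Python's `random` module. The school size (200) and sample size (20) are assumptions for illustration, and the seed is fixed only so the result is repeatable.

```python
import random

student_ids = list(range(1, 201))  # hypothetical school: students numbered 1 to 200
rng = random.Random(42)            # fixed seed only so the sketch is repeatable
sample = rng.sample(student_ids, k=20)  # every student equally likely, no repeats
```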
Convenience sampling technique
Choose the easiest people to sample: ask your friends, etc. Problem: may not be representative of the population.
Systematic sampling technique
Choose random starting point, use fixed interval.
Ex. Make a list of all students in a class, choose every 3rd student.
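The every-3rd-student example could look like this in Python. The class list of 30 names is made up.

```python
import random

students = [f"student_{i}" for i in range(1, 31)]  # made-up class list of 30
interval = 3
start = random.randrange(interval)  # random starting point: index 0, 1 or 2
sample = students[start::interval]  # then every 3rd student from there
```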
Quota sampling technique
Sample is sized so each group's share matches its share of the population you're polling.
Ex. 55% girls, 45% boys in school, so sample should have those same %.
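Working out the quotas for the 55% / 45% example, assuming a sample of 20 people (the sample size is made up).

```python
sample_size = 20  # assumed sample size
population_share = {"girls": 0.55, "boys": 0.45}

# Number of people to poll from each group so the sample mirrors the school
quotas = {group: round(share * sample_size) for group, share in population_share.items()}
```

This gives 11 girls and 9 boys.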
Stratified sampling technique
Split the population into strata (smaller groups), then sample from each stratum.
Ex. Choose half dp1, half dp2 students
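A sketch of the dp1 / dp2 example, assuming a random pick inside each stratum. The group sizes (40 and 60) and the sample size (10) are made up.

```python
import random

# Hypothetical strata: 40 dp1 students and 60 dp2 students
strata = {
    "dp1": [f"dp1_{i}" for i in range(40)],
    "dp2": [f"dp2_{i}" for i in range(60)],
}
per_stratum = 5          # half of a 10-person sample from each stratum
rng = random.Random(0)   # fixed seed only so the sketch is repeatable

sample = [name for group in strata.values() for name in rng.sample(group, per_stratum)]
```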
Outliers
We might want to remove values from a data set if they fall too far from the rest of the data.
Outlier = more than 1.5(IQR) from nearest quartile.
- less than Q1 - 1.5(IQR)
- more than Q3 + 1.5(IQR)
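Applying the 1.5(IQR) rule to a made-up data set. The quartiles Q1 = 3 and Q3 = 9 are assumed here (they come from taking the median of each half of this data).

```python
data = [1, 3, 4, 6, 8, 9, 25]  # made-up values, already in order
q1, q3 = 3, 9                  # assumed quartiles for this data set
iqr = q3 - q1                  # 6

lower_fence = q1 - 1.5 * iqr   # 3 - 9 = -6.0
upper_fence = q3 + 1.5 * iqr   # 9 + 9 = 18.0

outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

Only 25 is above the upper fence of 18, so it is the only outlier.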
Standard deviation
Measure of how far values typically are from the mean.
Variance
The square of standard deviation
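Checking that variance is the square of standard deviation, using the population versions (`pstdev`, `pvariance`) from Python's `statistics` module. Using the population versions is an assumption here; the sample versions (`stdev`, `variance`) divide by n - 1 instead. The data is made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5

sigma = statistics.pstdev(data)        # population standard deviation = 2.0
variance = statistics.pvariance(data)  # population variance = 4
```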
Frequency tables
Keeps track of not only how many times each value occurs (frequency), but also a running total (cumulative frequency)
**Useful for finding percentiles, quartiles and median
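A sketch of building the cumulative frequency column and using it to find the median. The table values and the helper name `value_at_position` are made up for illustration.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]        # e.g. number of siblings (made-up data)
frequencies = [3, 7, 6, 3, 2]   # how many times each value occurs
cumulative = list(accumulate(frequencies))  # running total: [3, 10, 16, 19, 21]

def value_at_position(position):
    """Return the first value whose cumulative frequency reaches `position`."""
    for value, running_total in zip(values, cumulative):
        if running_total >= position:
            return value

total = cumulative[-1]                          # 21 data points
median = value_at_position((total + 1) // 2)    # 11th value in order
```

The 11th value sits in the row where the cumulative frequency first reaches 11 or more (the row with 16), so the median is 2.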
Interpolation
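Predicting with the equation using an x value inside the range of the data. Generally reliable.
Extrapolation
Predicting with the equation using an x value outside the range of the data. Less reliable.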
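A small sketch of the usual distinction: a prediction counts as interpolation when the x value lies inside the range of the data the line was fitted to, and as extrapolation when it lies outside. The regression line y = 2x + 1, the x values and both function names are hypothetical.

```python
x_data = [1, 2, 4, 6, 9]  # made-up x values the line was fitted to

def predict(x):
    """Hypothetical regression line y = 2x + 1."""
    return 2 * x + 1

def kind_of_prediction(x):
    """Label a prediction by whether x is inside the range of the data."""
    if min(x_data) <= x <= max(x_data):
        return "interpolation"
    return "extrapolation"
```

For example, x = 5 is between 1 and 9, so predicting y = 11 there is interpolation; x = 15 is beyond the data, so that prediction is extrapolation.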
Conditional probability
The probability of an event happening given that another event already occurred.
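A worked example using the standard formula P(A given B) = P(A and B) / P(B). The class numbers are made up.

```python
from fractions import Fraction

# Made-up class of 30: 18 study French, and 6 of those also study Spanish
p_french = Fraction(18, 30)
p_french_and_spanish = Fraction(6, 30)

# P(Spanish given French) = P(French and Spanish) / P(French)
p_spanish_given_french = p_french_and_spanish / p_french
```

The result is 1/3: of the 18 French students, 6 also study Spanish.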
When to use binompdf
Tells you the probability of exactly “r” successes
binompdf (n, p, r)
where,
binompdf = exactly “r”
n = # of trials
p = probability of success
r = # of successes
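What the calculator's binompdf works out can be sketched in Python with the binomial formula. This is a stand-in written for illustration, not the calculator's own implementation; the argument order (n, p, r) follows the card above.

```python
from math import comb

def binompdf(n, p, r):
    """Probability of exactly r successes in n trials, each with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

exactly_three = binompdf(10, 0.5, 3)  # exactly 3 heads in 10 fair coin flips
```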
When to use binomcdf
Tells you the probability of up to and including “r” successes (0, 1, ..., r)
binomcdf (n, p, r)
where,
binomcdf = up to “r” (cumulative)
n = # of trials
p = probability of success
r = # of successes
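A Python stand-in for binomcdf, written for illustration rather than copied from any calculator: it adds up the "exactly k" probabilities for k = 0 up to r.

```python
from math import comb

def binomcdf(n, p, r):
    """Probability of r or fewer successes in n trials, each with success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

at_most_three = binomcdf(10, 0.5, 3)       # 0, 1, 2 or 3 heads in 10 fair flips
at_least_four = 1 - binomcdf(10, 0.5, 3)   # "at least 4" is the complement of "at most 3"
```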
Normal distribution curve
Area under curve
68% between μ ± σ
95% between μ ± 2σ
99.7% between μ ± 3σ
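The three percentages can be checked with `NormalDist` from Python's `statistics` module. A standard normal (mean 0, standard deviation 1) is assumed; the percentages are the same for any mean and standard deviation.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal curve

within_1 = z.cdf(1) - z.cdf(-1)  # area between mean - 1 sd and mean + 1 sd, about 0.68
within_2 = z.cdf(2) - z.cdf(-2)  # about 0.95
within_3 = z.cdf(3) - z.cdf(-3)  # about 0.997
```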
Inverse normal only works with the area to the left: give it the area to the left of a value and it returns that value.
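A sketch of the "area to the left" idea using `NormalDist.inv_cdf` from Python's `statistics` module as a stand-in for the calculator's inverse normal. The heights distribution (mean 170, standard deviation 10) is made up.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # made-up distribution of heights in cm

# Area to the left is 0.90: value that 90% of heights fall below
x_left = heights.inv_cdf(0.90)

# Given an area to the RIGHT of 0.20, convert to the left area first: 1 - 0.20 = 0.80
x_right = heights.inv_cdf(1 - 0.20)
```

If a question gives the area to the right, subtract it from 1 before using the inverse normal.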