Lec 4 notes Flashcards
What does randomization do
What does randomization do?
1. removes selection bias
2. establishes causality
sampling variability
Sampling variability
Population mean is 100
But from samples A, B, and C, the sample means are 101, 102, and 99
The sample mean (x̄) varies from sample to sample and does not come out exactly equal to the population mean of 100
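The idea above can be sketched with a small simulation. This is a minimal illustration, not part of the lecture: the population mean of 100 comes from the notes, but the standard deviation (15), the population size, and the sample size of 25 are assumed for the example.

```python
import random
import statistics

# Hypothetical population with mean 100; sd 15 is an assumed value.
random.seed(0)
population = [random.gauss(100, 15) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Draw three independent samples of size 25 (labels A, B, C as in the notes).
sample_means = {}
for label in ("A", "B", "C"):
    sample = random.sample(population, 25)
    sample_means[label] = statistics.mean(sample)

# Each sample mean is close to, but not exactly, the population mean:
# this sample-to-sample variation is sampling variability.
print(pop_mean, sample_means)
```

Running this shows three different sample means, none of them exactly equal to the population mean.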
Sampling distribution
Sampling distribution
if you sample over and over, the collection of sample means zeroes in on what you are looking for (ie the pop mean)
The mean of the sampling distribution of the sample mean = pop mean
µx̄ = mean of the sampling distribution of the random variable x̄ (the sample mean)
relationship b/w sample mean and population mean
unbiasedness
sample mean is a point estimate of the population mean
Unbiasedness: the mean of the sampling distribution is equal to the population parameter
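Unbiasedness can be checked empirically: draw many samples, record each x̄, and compare the mean of those x̄ values to µ. A minimal sketch, where µ = 100 is from the notes and the sd (15), sample size, and number of repetitions are assumed:

```python
import random
import statistics

random.seed(1)
MU, SIGMA, N, REPS = 100, 15, 25, 5000  # SIGMA, N, REPS are illustrative

# Draw many samples of size N and record each sample mean (x-bar).
xbars = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(REPS)]

# Unbiasedness: the mean of the sampling distribution is (approximately) µ.
mu_xbar = statistics.mean(xbars)
print(mu_xbar)
```

The average of the 5000 sample means lands very close to µ = 100, even though each individual x̄ misses it.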
standard error of the mean
- type of statistic
- is it data
- sample size and SE relationship
- SE is a measure of ???
standard error of the mean (SEM) or standard error: it is NOT a descriptive stat; it is a summary statistic of the sampling distribution
It is NOT data b/c it is only ONE number computed from the data
The larger the sample size, the more precise our estimate of µ will be
Also, larger sample size -> smaller SE
SE indicates the variability of the estimate
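The "larger n -> smaller SE" relationship can be demonstrated by estimating the SE as the standard deviation of simulated sample means. A sketch with assumed parameters (µ = 100 from the notes; sd = 15 and the sample sizes are illustrative):

```python
import random
import statistics

random.seed(2)
MU, SIGMA, REPS = 100, 15, 3000  # SIGMA and REPS are assumed values

def empirical_se(n):
    """Std dev of the sampling distribution of x-bar for samples of size n."""
    xbars = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(REPS)]
    return statistics.stdev(xbars)

# Theory: SE = sigma / sqrt(n), so a larger n gives a smaller SE.
se_small, se_large = empirical_se(10), empirical_se(100)
print(se_small, se_large)
```

The empirical values track the formula σ/√n: roughly 4.7 for n = 10 and 1.5 for n = 100, so the n = 100 estimate of µ is more precise.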
bias
How can we reduce bias?
variability
How can we reduce variability?
Bias
unbiased if the mean of its sampling distribution is equal to the true value of the corresponding parameter in the pop; biased = the opposite
Randomization techniques reduce bias
Variability: Concerns the spread of the sampling distribution
o It is reduced by increasing the sample size, n
Central Limit Theorem
How big does n need to be?
Exception
Central limit theorem
as the sample size increases (n -> infinity), the distribution of sample means approaches a normal distribution N(µ, σ²/n) [w/ mean µ and variance σ²/n]
How big does n need to be?
It depends on the pop distribution, but a guideline is n > 10
BUT, if the pop distribution is extremely skewed, n may need to be larger
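The CLT can be seen even when the population is skewed. A minimal sketch using an exponential population (a standard example of a right-skewed distribution; the rate, sample size, and repetition count are all assumed for illustration):

```python
import random
import statistics

random.seed(3)
LAMBDA, N, REPS = 1.0, 50, 4000  # exponential: mean = sd = 1/LAMBDA

# Means of samples drawn from a skewed (exponential) population.
xbars = [statistics.mean(random.expovariate(LAMBDA) for _ in range(N))
         for _ in range(REPS)]

# CLT predicts x-bar ~ N(mu, sigma^2 / n): here mean 1, sd 1/sqrt(50).
print(statistics.mean(xbars), statistics.stdev(xbars))
```

Even though individual exponential draws are strongly skewed, the sample means cluster around 1 with spread close to 1/√50 ≈ 0.14, as N(µ, σ²/n) predicts.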
How do we interpret a confidence interval?
- For 95% CI:
- In repeated random samples from the pop, 95% of the confidence intervals computed from the samples will contain the true value of the pop parameter (µ for example)
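The repeated-sampling interpretation can be checked directly: build many 95% CIs and count how many contain µ. A sketch assuming a known σ and the z critical value 1.96 (µ = 100 is from the notes; σ = 15, n = 30, and the repetition count are illustrative):

```python
import random
import statistics

random.seed(4)
MU, SIGMA, N, REPS, Z = 100, 15, 30, 2000, 1.96  # SIGMA, N, REPS assumed

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.mean(sample)
    half_width = Z * SIGMA / N ** 0.5   # margin of error with known sigma
    # Does this interval [xbar - hw, xbar + hw] contain the true mean?
    if xbar - half_width <= MU <= xbar + half_width:
        hits += 1

coverage = hits / REPS
print(coverage)  # should be close to 0.95
```

Any single interval either contains µ or it doesn't; the 95% refers to the long-run fraction of intervals that do.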
- The difficulty is that we cannot PROVE the truth of any hypothesis
- If our data are NOT consistent w/ H0 being true, we reject it in favor of H1 (although that doesn't mean H1 is proven true either)
- If the data are consistent w/ H0, we fail to reject H0, which is not the same thing as accepting H0 (the test statistic is closer to 0)
- Example
- Some Hypotheses
o H0: µ = µ0 vs H1: µ = µ1
o H0: µ ≤ µ0 vs H1: µ > µ0 (one-sided)
o H0: µ = µ0 vs H1: µ ≠ µ0 (two-sided)
- Definitions
o Type I (TI) error: occurs when H0 is rejected when in fact it is true
α = Probability of Type I error, aka the significance level of the test
typically: α = 0.05
o Type II (TII) error: occurs when H0 is not rejected when in fact it is false and should be rejected
ß = Probability of Type II error
Typically: ß = 0.1 or 0.2
o 1 – ß: Power of the test = the prob of rejecting H0 when it is false
o We like small α and large power
If we want to reduce both types of error at once, we need to increase the sample size A LOT
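The effect of n on power can be simulated: fix α, generate data where H0 is actually false, and count how often the test rejects. A sketch using a one-sided z test with known σ (all numbers here — the true mean 105, σ = 15, the sample sizes, and the z cutoff 1.645 for α = 0.05 — are assumed for illustration):

```python
import random
import statistics

random.seed(5)
# Test H0: mu = 100 vs H1: mu > 100, when the TRUE mean is 105.
MU0, TRUE_MU, SIGMA, REPS, Z_CRIT = 100, 105, 15, 2000, 1.645  # alpha = 0.05

def power(n):
    """Fraction of simulated tests that correctly reject H0 (= 1 - beta)."""
    rejections = 0
    for _ in range(REPS):
        xbar = statistics.mean(random.gauss(TRUE_MU, SIGMA) for _ in range(n))
        z = (xbar - MU0) / (SIGMA / n ** 0.5)
        if z > Z_CRIT:
            rejections += 1
    return rejections / REPS

# Same alpha, larger n -> smaller beta -> higher power.
p_small, p_large = power(10), power(50)
print(p_small, p_large)
```

With α held at 0.05, moving from n = 10 to n = 50 raises the power substantially (β shrinks), which is why reducing both error types at once requires a much larger sample.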