Khan Academy: Two-Sample Inference Flashcards
How can we find the confidence interval and p-value for two-sample inference on a difference of proportions? (The variable we're interested in is the difference between a statistic from two different samples.)
We have two different samples, each with its own proportion, for example the proportion of men voting for a candidate and the proportion of women voting for the same candidate (P1 and P2).
We are interested in knowing whether there's a significant difference between the proportion of men and the proportion of women voting for the candidate (P1-P2).
The new variable we are interested in, which we call P, is P1-P2.
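The variance of P is Var(P) = Var(P1) + Var(P2), assuming the two samples are independent. Here Var(Pi) = Pi(1-Pi)/ni is the variance of the sampling distribution of sample i's proportion.
We can find the confidence interval using P and Var(P): P ± z* × sqrt(Var(P)).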
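To find the p-value, we assume Ho is true, so there's no difference between the two population proportions and the sampling distribution of P1-P2 has mean 0. Because both samples share one proportion under Ho, we estimate it with the pooled proportion (total successes divided by total sample size, which equals mean(P1,P2) when the sample sizes are equal) and use it as each sample's proportion, so Var(P) = Var(sample 1) + Var(sample 2). (Note that both variances are now calculated using the pooled proportion in place of P1 and P2.)
Now we have the mean and variance of the sampling distribution of the difference in proportions, so we can calculate the p-value for the observed P1-P2.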
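The calculation above can be sketched in a few lines of Python. This is a minimal sketch using a normal approximation, not the course's own code: the function name `two_proportion_inference` and the vote counts are hypothetical, and the pooled proportion is taken as total successes over total sample size.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_inference(x1, n1, x2, n2, confidence=0.95):
    """Return (difference, confidence interval, two-sided p-value) for P1 - P2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Confidence interval: Var(P) = Var(P1) + Var(P2), each from its own sample.
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_star * se_ci, diff + z_star * se_ci)

    # P-value: under Ho both groups share one proportion, so pool the samples.
    pooled = (x1 + x2) / (n1 + n2)
    se_h0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_h0  # the sampling distribution has mean 0 under Ho
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value

# Hypothetical counts: 120 of 200 men and 90 of 200 women vote for the candidate.
diff, ci, p_value = two_proportion_inference(120, 200, 90, 200)
print(f"difference = {diff:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p-value = {p_value:.4f}")
```

The interval uses each sample's own proportion, while the p-value uses the pooled proportion, which mirrors the two different variance calculations described above.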
How can we design an experiment to test whether the results we obtained could be due to random chance (whether the probability of them happening by chance is high or low)?
We have two groups (treatment and control) and measurements for each group. In this course we focus on the difference between the group means.
To test how likely the observed mean difference is by chance alone:
Ho: the observed mean difference is due to chance
Ha: the observed mean difference is not due to chance
1) we pool the entries from both groups and randomly reassign them to new treatment and control groups, many times (1000 times, for example)
2) for each reassignment, we calculate the mean difference between the two groups
3) we create a frequency table (we round the mean differences so there is a limited number of distinct values), so each mean difference has a frequency
4) we calculate the probability of obtaining the mean difference we observed in the original experiment, or a more extreme one
5) If that probability is really small (below 5%, for example), a result like ours would rarely occur by random chance alone, so it's probably not due to chance.
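The five steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions: the function name `randomization_test`, the measurements, and the fixed seed are all hypothetical, and the code counts simulated differences directly instead of building a rounded frequency table.

```python
import random
from statistics import mean

def randomization_test(treatment, control, n_shuffles=1000, seed=0):
    """Estimate how often a random regrouping gives a mean difference
    at least as large as the observed one (one-sided)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(treatment) - mean(control)
    combined = list(treatment) + list(control)
    n_treat = len(treatment)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Step 1: randomly reassign all entries to two new groups.
        rng.shuffle(combined)
        # Step 2: mean difference for this reassignment.
        diff = mean(combined[:n_treat]) - mean(combined[n_treat:])
        # Steps 3-4: count differences at least as large as the observed one.
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_shuffles

# Hypothetical measurements for the two groups.
treatment = [12, 15, 14, 16, 18, 17, 15, 19]
control = [10, 11, 13, 9, 12, 14, 10, 11]
observed, p_value = randomization_test(treatment, control)
print(f"observed mean difference = {observed}")
print(f"estimated probability by chance = {p_value}")
```

Step 5 is then a comparison of the estimated probability with the chosen threshold, for example 0.05.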
How can we test a hypothesis about a difference of means?
1) we know that for variables X and Y, we'll have:
Mean(X-Y) = Mean(X) - Mean(Y)
And, when X and Y are independent, Var(X-Y) = Var(X) + Var(Y)
2) when we have two samples and want to do hypothesis testing on their means, we use the sampling distribution of the mean for each sample; we estimate the mean of each sampling distribution with the corresponding sample mean
3) the variance of each sampling distribution of the mean equals the sample variance / n, where n is the sample size
4) Ho: there’s no difference in the means
Ha: there’s a difference between the means
To test Ho, we assume the sampling distribution of the difference of means has mean 0, and its variance = Var(sampling distribution of the mean for sample 1) + Var(sampling distribution of the mean for sample 2)
Note: we are assuming that the conditions for inference are met and that the sample variances are good approximations of the population variances
5) If the probability of the observed difference of means, or of more extreme values, is below the significance level, we reject Ho: such a result would be very unlikely if Ho were true, yet we still obtained it
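The steps above can be sketched from summary statistics alone. This is a minimal sketch that assumes samples large enough for a normal approximation (with small samples a t distribution would be the usual choice); the function name `difference_of_means_test` and the summary numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def difference_of_means_test(mean1, var1, n1, mean2, var2, n2):
    """Return (difference, standard error, two-sided p-value) under Ho: no difference."""
    diff = mean1 - mean2
    # Variance of each sampling distribution of the mean is sample variance / n;
    # the variance of the difference is their sum.
    se = sqrt(var1 / n1 + var2 / n2)
    z = diff / se  # the sampling distribution has mean 0 under Ho
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, p_value

# Hypothetical summaries: sample mean, sample variance, sample size per group.
diff, se, p_value = difference_of_means_test(52.1, 45.0, 40, 48.3, 50.0, 45)
alpha = 0.05  # significance level, chosen before looking at the data
print(f"difference = {diff:.2f}, standard error = {se:.3f}")
print(f"p-value = {p_value:.4f}")
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```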
What's the difference between a paired t-test and a two-sample t-test?
A two-sample t-test is used when the data in the two samples are statistically independent, while a paired t-test is used when the data come in matched pairs.
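To see why the distinction matters, the two test statistics can be computed on the same numbers. This is a minimal sketch: the function names and the before/after scores are hypothetical, and the two-sample version uses the unpooled (Welch) standard error, which is one common choice and not necessarily the one the course uses.

```python
from math import sqrt
from statistics import mean, stdev, variance

def two_sample_t(a, b):
    """t statistic treating a and b as independent samples (unpooled variances)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def paired_t(a, b):
    """t statistic on the per-pair differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical scores for the same six people before and after a treatment.
before = [72, 85, 90, 65, 78, 88]
after = [75, 87, 94, 68, 80, 91]

print(f"two-sample t = {two_sample_t(after, before):.2f}")
print(f"paired t     = {paired_t(after, before):.2f}")
```

With matched pairs, the paired statistic looks only at the within-pair changes, so the large person-to-person spread no longer inflates the standard error; treating the same data as independent samples gives a much smaller statistic.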