March 17 Flashcards
How do you know something is statistically significant →
the p value: if it's less than 0.05, it's significant
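A quick sketch of the decision rule above, assuming the 0.05 cutoff from these notes. The name `is_significant` is made up for illustration.

```python
ALPHA = 0.05  # significance cutoff used in these notes

def is_significant(p_value, alpha=ALPHA):
    """Return True when p is below the cutoff, i.e. we reject the null."""
    return p_value < alpha

print(is_significant(0.001))  # p below the cutoff
print(is_significant(0.626))  # p above the cutoff
```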
Differences between groups
Nonparametric chi-square test of difference: does the proportion in one group differ from the proportion in another group (nominal)
Independent samples t test: does the mean of one group differ from the mean of another group
Analysis of variance (ANOVA): are there differences in the means of 3+ groups?
Differences within a group
1) One sample t test: does the mean differ from a benchmark (interval and ratio)
2) Paired samples t test: does the mean for one variable differ from the mean for another variable (interval and ratio)
One sample t test
Does the mean differ from a benchmark
Ex: do people still make calls on their cell phones?
The benchmark is zero calls
Ho: the average person uses 0 minutes of air time
Ha: the average person uses more than 0 minutes of air time
P is less than 0.001, so it's statistically significant: we can reject the null hypothesis and accept the alternative hypothesis
The first thing we should look at is the mean of people's air time minutes → 48 mins compared to 0 mins. Then check that it's significant: focus on the p value (look at t, df and Sig.)
The quiz will have a question: come up with a null and alternative hypothesis, then figure out the mean and significance. When you interpret, say people spend 48 mins vs 0 and the p value is this amount, so it's significant
If we weren't segmenting, then it would be a paired sample t test
The benchmark is the hypothesized population mean
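To see where the t and df in the output come from, here is a minimal sketch of the one sample t statistic, t = (mean - benchmark) / (sd / sqrt(n)) with df = n - 1. The air time numbers are invented so the mean comes out to 48, and `one_sample_t` is a hypothetical helper. The p value itself still comes from the software output.

```python
import math
import statistics

def one_sample_t(values, benchmark):
    """Return (mean, t, df) for a one sample t test against a benchmark."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)  # sample sd uses n - 1
    t = (mean - benchmark) / standard_error
    return mean, t, n - 1

# Invented air time minutes for 5 people (not real data)
minutes = [40, 55, 48, 52, 45]
mean, t, df = one_sample_t(minutes, benchmark=0)
print(f"mean = {mean}, t({df}) = {t:.2f}")
```

A large t (far from 0) goes with a small p value, which is why 48 mins vs a benchmark of 0 comes out significant.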
Paired sample t-test
Looking at if the mean for one variable differs from the mean for another variable
Remember exactly what you are comparing: two variables (or two measurements) across your overall sample, not 2 groups
Ex: do people send or receive more texts
Ho: the number of texts sent does not differ from the number of texts received
Ha: the number of texts sent differs from the number of texts received
First thing you should look at is the means (texts received vs sent): you want to see the difference between the means, then go to the table at the bottom and look at the p value (look for the word Sig.)
We want to figure out if the difference between means is significant
Given a p value of 0.626, you fail to reject the null hyp because it's not less than 0.05. If the null were true, you would see a difference at least this big about 63% of the time just by chance → don't want a high number for this (a high p means the result is consistent with chance)
(Don't need to know how many tails the test has)
Always two tailed
Each participant has both a "sent" and a "received" value, making the data paired
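Because each person has both values, the paired test is really a one sample t test on the per-person differences. A minimal sketch, with invented text counts and a hypothetical helper `paired_t`:

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t, df) for a paired samples t test."""
    diffs = [a - b for a, b in zip(first, second)]  # one difference per participant
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / standard_error, n - 1

# Invented counts: same 5 participants, two measurements each
sent = [10, 12, 9, 15, 11]
received = [11, 10, 10, 13, 12]
mean_diff, t, df = paired_t(sent, received)
print(f"mean difference = {mean_diff:.1f}, t({df}) = {t:.2f}")
```

Here t is close to 0, which is the kind of result that goes with a high p value and failing to reject the null.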
Independent samples t test
Does the mean of one group differ from the mean of another group
Ex: Do people who own a landline spend less time on the phone?
0 = don't have a landline, 1 = do have a landline
Ho: people with a landline do not use fewer mins than people without a landline
Ha: people with a landline use fewer mins than people without a landline
We see there's a difference in means but have to see if it's significant, so go to the p value under Sig. (2-tailed). We have two p values; look at 0.001
Two separate groups: people who have landlines and people who don't
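A minimal sketch of the independent samples t statistic for two separate groups. It uses the pooled-variance version (the "equal variances assumed" form), which is an assumption here since the notes don't say which row to read. The minutes are invented and `independent_t_pooled` is a hypothetical name.

```python
import math
import statistics

def independent_t_pooled(group_a, group_b):
    """Return (mean_a, mean_b, t, df), assuming equal variances in both groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_a, mean_b, (mean_a - mean_b) / standard_error, df

# Invented air time minutes (1 = landline, 0 = no landline)
landline = [30, 35, 40, 45]
no_landline = [50, 55, 60, 65]
mean_a, mean_b, t, df = independent_t_pooled(landline, no_landline)
print(f"means: {mean_a} vs {mean_b}, t({df}) = {t:.2f}")
```

The negative t matches the direction in Ha: the landline group's mean is lower.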
ANOVA
Are there differences in the means of 3+ groups
Like t tests, just with more than 2 groups
Ex: does monthly bill differ by carrier?
Asking a broad question
Always look at the means → and then look at whether there are differences overall (bill per month)
Ho: people using different carriers do not have different bills
Ha: people using different carriers have different bills
Format: F(degrees of freedom between groups, degrees of freedom within groups)
There is a sig difference because p < .001
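To make the F(df between, df within) format concrete, here is a minimal one-way ANOVA sketch. The three carriers and their bills are invented, and `one_way_anova` is a hypothetical helper.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented monthly bills for three hypothetical carriers
bills = [[50, 55, 60], [70, 75, 80], [60, 65, 70]]
f, df_between, df_within = one_way_anova(bills)
print(f"F({df_between}, {df_within}) = {f:.2f}")  # prints "F(2, 6) = 12.00"
```

df between is the number of groups minus 1, and df within is the total sample size minus the number of groups.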
Nonparametric chi-square test of difference
Does the proportion in one group differ from the proportion in another group
Looks at proportions (nominal variables)
Ho: market share is not different between carriers
Ha: market share is different between carriers
Popularity is a categorical variable, so you are looking at a nominal variable
You look at the Asymp. Sig. value → = .004, so you reject your null hyp: there is a diff between carriers
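A minimal sketch of the chi-square statistic for this kind of question, assuming the null means every carrier has an equal share (that equal-share expectation is an assumption, as are the counts and the name `chi_square_equal_shares`).

```python
def chi_square_equal_shares(observed_counts):
    """Return (chi-square, df) testing observed counts against equal shares."""
    total = sum(observed_counts)
    expected = total / len(observed_counts)  # same expected count for every category
    chi2 = sum((o - expected) ** 2 / expected for o in observed_counts)
    return chi2, len(observed_counts) - 1

# Invented number of customers at three hypothetical carriers
customers = [50, 30, 20]
chi2, df = chi_square_equal_shares(customers)
print(f"chi-square({df}) = {chi2:.2f}")
```

The further the observed counts are from the expected counts, the bigger the chi-square and the smaller the Asymp. Sig. value.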
Categorical variables
Testing for association
Relation tests
Looking at associations between variables
Are two things related/associated → that's your sign to do a relation test, but what type of test?
Ex: are these two variables related or associated? Look at linear regression, correlation or chi-square test of independence
Am I talking about causation or correlation?
If he uses language that shows causation (does this predict that) → always choose regression
If he says associated or correlated → it's either chi-square or Pearson
Is the variable metric (quantitative) or nominal? → nominal is chi-square and quantitative is Pearson
Ex: are gender and carrier related
Ho: there is no relationship between gender and carrier
Ha: there is a relationship between gender and carrier
Check whether both variables are nominal → chi-square
Cross tabulation because we're looking at gender and carrier
Reject the null → look at asymptotic significance. Look at only the first row. Thus, gender and carrier are significantly related
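The cross tabulation above can be turned into a chi-square statistic by comparing each cell to the count expected if the two variables were unrelated (row total x column total / grand total). A minimal sketch with an invented 2 x 3 table; `chi_square_independence` is a hypothetical helper.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for a cross tabulation given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count if unrelated
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Invented counts: rows = gender, columns = three hypothetical carriers
crosstab = [
    [30, 10, 10],
    [10, 20, 20],
]
chi2, df = chi_square_independence(crosstab)
print(f"chi-square({df}) = {chi2:.2f}")
```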
NOMINAL IS CHI-SQUARE
QUANTITATIVE IS CORRELATION (association)
PREDICT → REGRESSION
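The three rules above can be written as a tiny lookup. This is only a sketch of the quiz shortcut in these notes, and the function name and argument values are made up.

```python
def choose_relation_test(wording, variable_type):
    """Pick a relation test from the question wording and the variable type.

    wording: "cause", "predict", "associated", "related" or "correlated"
    variable_type: "nominal" or "quantitative"
    """
    if wording in ("cause", "predict"):
        return "regression"
    if variable_type == "nominal":
        return "chi-square test of independence"
    return "Pearson correlation"

print(choose_relation_test("related", "nominal"))          # gender and carrier
print(choose_relation_test("correlated", "quantitative"))  # texts sent and received
print(choose_relation_test("predict", "quantitative"))
```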
Correlations
Pearson correlation coefficient r
Correlation is not causation
Correlations can be positive or negative
Can have no correlation → 0 up to 0.1
0.1-0.3 = weakly correlated
0.3-0.7 = moderately correlated
0.7-1 = strongly correlated
Use the absolute value of r when reading these ranges
Pearson correlation → is there a relationship between 2 metric variables
Do people who send more texts receive more texts?
Look at 2 quantitative variables and the correlation
Ho: there is not a positive correlation between texts received and texts sent
Ha: there is a positive correlation between texts received and texts sent
The output gives you the sign (positive or negative), but judge the strength by the absolute value of the Pearson correlation. Need to look at the Pearson correlation (not just the regular significance); in this case it's strongly correlated
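A minimal sketch of Pearson's r plus the strength labels from these notes. The text counts are invented, the helper names are hypothetical, and how to treat a value sitting exactly on a boundary (0.1, 0.3, 0.7) is an assumption since the notes don't say.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient r for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label the strength of r using its absolute value."""
    size = abs(r)
    if size < 0.1:
        return "no correlation"
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

# Invented texts sent and received for 5 people
sent = [1, 2, 3, 4, 5]
received = [2, 1, 4, 3, 5]
r = pearson_r(sent, received)
print(f"r = {r:.2f} ({strength(r)})")
```

The sign of r gives the direction (positive here), and the absolute value gives the strength.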
Regression
Do one or more variables cause a change in another variable?
Anytime you see cause or predict = regression
Has to be a quantitative variable
He will say cause or predict, and that means it's always regression in the quiz (needs to be quantitative)
Look at the Regression row in the ANOVA table
Ignore constant and residual
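A minimal sketch of simple linear regression with one predictor, showing where the Regression row's F comes from. The data, the variable roles and the name `simple_regression` are all invented for illustration.

```python
def simple_regression(x, y):
    """Return (slope, intercept, F, df_regression, df_residual) for y predicted by x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)    # explained by x
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))  # left over
    df_regression, df_residual = 1, n - 2
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return slope, intercept, f, df_regression, df_residual

# Invented predictor and outcome values for 5 people
predictor = [1, 2, 3, 4, 5]
outcome = [2, 1, 4, 3, 5]
slope, intercept, f, df_reg, df_res = simple_regression(predictor, outcome)
print(f"outcome = {intercept:.1f} + {slope:.1f} * predictor, F({df_reg}, {df_res}) = {f:.2f}")
```

The significance next to this F is the p value to read for whether the predictor matters overall.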
Types of Errors:
Type 1 Error (False Positive): Incorrectly rejecting the null hypothesis.
Type 2 Error (False Negative): Incorrectly failing to reject the null hypothesis.
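The two error types as a small truth table, sketched under the definitions above. `classify_decision` is a hypothetical name.

```python
def classify_decision(null_is_true, rejected_null):
    """Name the outcome given the truth about the null and the decision made."""
    if null_is_true and rejected_null:
        return "Type 1 error (false positive)"
    if not null_is_true and not rejected_null:
        return "Type 2 error (false negative)"
    return "correct decision"

print(classify_decision(null_is_true=True, rejected_null=True))
print(classify_decision(null_is_true=False, rejected_null=False))
print(classify_decision(null_is_true=False, rejected_null=True))
```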
correlation
look at chi squared and Pearson correlation
chi squared: relationship between 2 nominal variables
Pearson: is there a relationship between 2 metric variables?
causation
look at simple linear regression
chi square of independence
Is there a relationship between 2 nominal variables
ex: Are gender and carrier related