exam 2 Flashcards
Problems with z scores
Z test procedure– use one sample mean from a known population to test hypotheses about an unknown population
To use a z score we need to know the pop mean and std deviation from which we draw our sample
We often have a solid idea of what the population’s mean should be
We have to know population std deviation, but we usually don’t
The solution to this is the t statistic
T statistic
test hypothesis about an unknown population when the value of the population std deviation is unknown
We estimate it with our data
Formula is very similar to the z score formula
Main difference– uses an estimated standard error in the denominator
Estimated standard error
estimate of the real standard error when the population std deviation is unknown
We don’t have info about population, but we do have info about the sample so we use that in place of the population
estimated standard error goal
provide an estimate of the standard distance between a sample mean and the population mean
Estimated slightly differently across three types of t tests
All have slightly diff estimation, but they all follow the same formula
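A minimal sketch of the one-sample version of this estimate, with made-up scores– the sample standard deviation stands in for the unknown population value:

```python
import math

# Hypothetical sample of scores (invented numbers for illustration)
sample = [52, 47, 55, 60, 49, 51, 58, 50]
n = len(sample)
mean = sum(sample) / n

# Sample variance divides by n - 1 (degrees of freedom), because the
# sample mean was used in the calculation and restricts one value
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
sd = math.sqrt(var)

# Estimated standard error: sample SD in place of the unknown population SD
sem = sd / math.sqrt(n)
print(round(sem, 3))
```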
One sample t-test
comparing the mean of one sample with a known population mean
Same logic as a z test
We still don’t have any control group– it’s one sample and one group, and we want to see whether, after applying the treatment, the group still belongs to the known population
Diff between z-test– estimating the population std deviation/variance
Null and alternative hypothesis remain the same
one sample t-test examples
Are the starting salaries of CSU grads different from the national average (63,795)
We know the mean– 63,795 and want to see if CSU grads match that
Did our participants score differently from the median scale point of 25
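A sketch of a one-sample t-test in Python with scipy– the salary numbers here are simulated, not real CSU data:

```python
import numpy as np
from scipy import stats

# Hypothetical starting salaries (in thousands) for 12 grads
rng = np.random.default_rng(0)
salaries = rng.normal(loc=60, scale=8, size=12)

# H0: mu = 63.795 (national average, in thousands)
t_stat, p_val = stats.ttest_1samp(salaries, popmean=63.795)
print(f"t = {t_stat:.3f}, p = {p_val:.3f}, df = {len(salaries) - 1}")
```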
Independent samples t-test
compare mean of one group with the mean of a different group
Most typical one used in psych
Allows researchers to evaluate the mean difference between two populations using data from two separate samples
Want to see if the two samples you are comparing belong to two different populations
Looking at the difference between means
independent samples t-test null, alt, and cohens d
Null hypothesis– two samples come from the same underlying population– there’s no real differences between them
Assuming that the difference is zero
Alternative hypothesis– two samples come from different populations
Cohen’s d
Sample mean difference divided by the pooled sample standard deviation
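A small illustration of this formula with hypothetical scores– the pooled variance weights each group’s variance by its degrees of freedom:

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d for two independent samples: mean difference
    over the pooled sample standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    # Pooled variance: each group's variance weighted by its df
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2)

# Invented ratings from two groups
group1 = [7, 6, 8, 5, 7, 6]
group2 = [4, 5, 3, 6, 4, 5]
print(round(cohens_d(group1, group2), 2))
```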
Assumptions of independent samples t-test
The two populations are normally distributed
Can help ensure this with a big sample size (at least 30)
Two samples are random samples from their populations
Homogeneity of variance
States the two populations you are estimating have the same variance (spread around mean)
Necessary to justify pooling
Solution– make sample sizes equal
There is never a case where you can safely violate the homogeneity of variance assumption
False
Can be safely violated when sample sizes are equal, or by running the Welch two samples test instead
independent samples t test example
do men and women have different emotional responses to romantic comedy movies?
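A sketch of this comparison with invented scores, showing both the pooled test and the Welch version that drops the homogeneity-of-variance assumption (with equal sample sizes the two t values coincide; only the df differ):

```python
import numpy as np
from scipy import stats

# Hypothetical emotional-response scores (higher = stronger response)
men   = np.array([12, 15, 11, 14, 13, 10, 12, 14])
women = np.array([16, 14, 17, 15, 18, 13, 16, 15])

# Standard independent samples t-test pools the variances
t_pooled, p_pooled = stats.ttest_ind(men, women)

# Welch's test does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(men, women, equal_var=False)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.3f}")
print(f"Welch:  t = {t_welch:.3f}, p = {p_welch:.3f}")
```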
Paired samples t test
compare mean of one group with the mean of a group that is matched or connected to the first in some way
Ex– couples because they are related to each other in some way
Could be over time, or just genuinely related to each other
paired samples t test repeated measures
participants measured in two conditions (or two time points)
Dataset consists of two scores on one variable per individual
All participants should be scored twice on the same variable to see if there’s some improvement
Like a Quasi experiment
paired samples t test matched pairs
different participants in each condition, but they are matched somehow
Dataset consists of two related groups who are scored on the same variable
Could be related or a case like people that have the same IQ
Ex– is there a difference in eating behavior between mothers and daughters
paired samples t test logic
Everyone in the population is tested at baseline
Everyone in the population is then treated
Everyone in the population is tested after treatment
We want to assess whether there is a systematic change in the same population between the first and second measures
paired samples null and alt
Null hypothesis– the population of difference score has a mean of zero
Alternative hypothesis– the population of difference scores does not have a mean of zero
Degrees of freedom
Problem with the t-test– we are estimating the real standard error using our sample’s standard deviation
We don’t know how close our estimate is to the true standard error
To compute the sample variance, we use the sample mean
This restricts the amount of variance we can have
Degrees of freedom is the number of scores in a sample that are statistically independent and free to vary
Statistically free to vary = values not restricted by sample mean
essentially just sample size -1
The higher your degrees of freedom, the more accurate your estimated standard error, and thus your t-statistic, will be
Higher sample sizes (degrees of freedom) will be more accurate in terms of how they represent the population
The first numbers can be literally anything, but once we know the mean, the last number is fixed
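A tiny illustration of that last point, with arbitrary numbers– once the mean is known, only n - 1 scores are free to vary:

```python
# If n = 5 and the mean is known to be 10, the first four scores can be
# anything, but the fifth is then fixed: the scores must sum to n * mean.
n, mean = 5, 10
free_scores = [8, 13, 7, 12]          # n - 1 = 4 freely chosen values
last_score = n * mean - sum(free_scores)
print(last_score)  # the only value consistent with the mean
```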
T distribution
sampling distribution in a t test is still a distribution of all possible sample means
t distribution comparison between z-test and t-test
A z-score distribution is the set of z-scores for all possible sample means of a particular sample size (n)
A t score distribution is the set of t-scores for all possible sample means of a particular degrees of freedom (n-1)
Family of distributions of all possible degrees of freedom
Pick the one you want to use based on samples degrees of freedom
Key– we can use the t-distribution in the same way we used the z distribution if we know the degrees of freedom
T-distribution shape
Shape differs across possible degrees of freedom
Different sample sizes = different degrees of freedom = different t-distributions
Generally looks pretty normal if sample size is 30
Greater degrees of freedom, the more normal it will be
Probability and the t distribution
We can determine probabilities of extreme scores in t distributions just as we did for z distributions
Use a t distribution table
Use R studio
T scores of greater absolute value are more extreme, and therefore more rare
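A sketch of both lookups with scipy in place of a table or R (the t value 2.1 and the df are arbitrary)– it also shows the critical t shrinking toward the z critical value as df grows:

```python
from scipy import stats

# Two-tailed probability of |t| >= 2.1 with df = 15
df = 15
p_two_tailed = 2 * stats.t.sf(2.1, df)
print(f"P(|t| >= 2.1) with df={df}: {p_two_tailed:.4f}")

# Critical t for alpha = .05 (two-tailed) approaches the
# z critical value (about 1.96) as df increases
for df in (5, 15, 30, 120):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("z:", round(stats.norm.ppf(0.975), 3))
```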
One sample t-test assumptions
Observations in our sample are independent
Score of one observation has no bearing on another observation
If they are dependent on one another, can use paired samples t-test
Comparing to a known value
Ex– would be inappropriate if a sample was a mother and daughter
Population distribution must be normal
Often violated– rare to see a truly normal population distribution
T-tests are robust to this violation when sample sizes are at least n=30
One-sample t-test hypothesis testing framework
Step one– state null and alternative hypothesis
Step two– set your alpha level and find your critical region/value
Step three- calculate test statistic
Step four– compare t to the critical t and make a decision
practical significance
is this effect big enough to matter?
Even a small effect might be practically important in the right context
Statistical significance
(rejecting the null hypothesis) is not the same as practical significance
A huge sample means higher power to reject the null even when the effect is very small
Effect size
measurement of the absolute magnitude of a treatment effect, independent of the size of the samples being used
Sample size doesn’t matter, just focusing on how big the effect was
For t tests we can use cohen’s d
Effect size measure we use to compare two means (t-test)
Measures whether or not the mean difference matters
Interpreting effect sizes
Cohen suggested guidelines for interpreting Cohen’s d
Negligible– 0-.19
Small– .20-.49
Medium–.50-.79
Large– .80 or higher
Not the best because it all depends on context
DFs effect on power
When you select a score as a cutoff point between the body and tail, it will be more extreme in a t-distribution than in a normal distribution– the lower the degrees of freedom, the more extreme the critical value, and the lower the power
Interpretation of standard error of mean differences
independent samples t test
Measure of average distance between a sample statistic and the corresponding population parameter in the sampling distribution
Uneven samples
(interpretation of std error of mean differences)
when samples are different sizes, the larger sample provides a better estimate of the variance than the smaller sample
Solutions– pooled variance
Use pooled variance to calculate the standard error
If the samples are equal, both the pooled and unpooled formulas will derive the same result
If they are not equal, the pooled will be better
Independent sampling distributions
distribution of differences between the means
A difference between means is different from a difference score
Ex– mean difference between two different samples
Control sample mean - experiment sample mean
Determines probability of mean difference scores
paired sampling distributions
distribution of means of difference scores
Ex– time 1 score - time 2 score
Assumptions for a paired samples t test– Observations within each time point must be independent
All of the husbands can’t have a systematic relationship to each other, same for wives
Time point example– observations can not be related to each other
assumption for a paired samples t test– Population distribution of difference scores must be normal
Assuming its coming from a normally distributed population
Ways to get around it/violate
Sample is 30 or higher
Advantages of paired samples over independent samples
Paired samples t-test are more powerful
More likely to reject null if effect exists
Need less people
Need at least 60 (30 per group) for independent, 30 for paired (when looking at it longitudinally– time 1 and time 2)
More flexible
Can look at change over time and matched pairs
Can never assess change over time with independent sample
More powerful– individual differences are controlled for
Using the same people/matched pairs = less “noise” in data
Adjust the standard error calculation to account for this
Independent– two separate groups with two separate standard errors
Based on raw score
Paired samples– standard error is based on difference scores from the same people
If you can, always choose paired over independent samples
Measuring effect size in paired samples t test
Effect size for the paired-samples t is measured in the same way that we measured effect size for the one-sample t
Can use cohen’s d
Hypothesis test for the paired-samples t test
Step one– State null and alternative hypothesis
Step two– Identify cutoff regions
.05
Step three- Compute the t statistic
In this case, its paired
Step four– Make decisions on the null, compute the p value, and interpret the result
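A sketch of these steps with hypothetical before/after scores– the paired test is equivalent to a one-sample t-test on the difference scores against zero:

```python
import numpy as np
from scipy import stats

# Invented anxiety scores for the same 10 people before and after treatment
before = np.array([24, 30, 27, 22, 29, 31, 25, 28, 26, 30])
after  = np.array([20, 27, 25, 21, 24, 28, 24, 25, 23, 27])

# Paired samples t-test
t_stat, p_val = stats.ttest_rel(before, after)

# Same result by hand: t-test of the difference scores against 0
diffs = before - after
t_manual = diffs.mean() / (diffs.std(ddof=1) / np.sqrt(len(diffs)))
print(f"t = {t_stat:.3f}, p = {p_val:.4f}")
```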
One-way Anova
Comparing 3 or more means
We want to determine whether the sample mean differences are so large that there must be true population mean differences
One-way vs. t test
One-way anova is more flexible
Can be used in situations where there are two or more means being compared
T-test are limited to situations where only exactly two means are being compared
Main differentiator
One-way is accommodating more than two groups
However, when there’s only two, normally just use independent t test
Anova terms
In anova, each independent variable is called a factor
Factor is generally a nominal or ordinal variable
Categories, ranked or unranked
Not dealing with continuous IV
Each factor has at least two levels (categories)
We divide observations into groups based on their level of the factor
The two one-way anova hypotheses
Null is still that there is no real difference between groups in the population
For the alternative, not all population means are equal to one another
At least one different mean in the population
Experimentwise type one error
If we have three groups why not just run three t tests?
Would need to run 3 t tests to examine differences between all groups
Want to keep experimentwise alpha level around .05
One way anova allows us to evaluate all the mean differences in a single hypothesis test using a single alpha level
Keeps the risk of type one error under control no matter how many different means are being compared
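The experimentwise inflation is easy to see numerically– with independent tests at alpha = .05 each, the chance of at least one Type I error is 1 - (1 - alpha)^c:

```python
# With alpha = .05 per test, the chance of at least one Type I error
# across c independent comparisons is 1 - (1 - alpha) ** c
alpha = 0.05
for comparisons in (1, 3, 6, 10):
    experimentwise = 1 - (1 - alpha) ** comparisons
    print(comparisons, round(experimentwise, 3))
```

Three tests already push the experimentwise rate to about .14, nearly triple the nominal .05.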
Logic of one-way anova
Goal– estimate true mean differences in the population
Problem– we don’t have two groups anymore
Cannot just use the mean differences– there are now at least three to consider
Solution– use the variance between the groups instead
Compare this systematic between-group variance to variance that is randomly occurring within the groups
Process to calculate one-way anova
First calculate the total variability for the entire dataset
Separate the total variability for the entire set of data into two basic components
Within treatment variability– error
Between treatment variability– treatment effect
We want to compare the amount of between-treatment variability to the amount of within treatment variability
Within group variance– one way anova
Size of differences in the scores you are seeing inside of each of the three groups
Even if the within group variances differ, they are pooled across all groups in the anova
Random sampling error
Because individuals receive same treatment or are in same group, you did not cause individual differences, they just happened to occur
This is a measurement of the random sampling error in our data
Between group variance– one way anova
how individual differences are accounted for by the fact they receive three diff treatments or are coming from three diff groups
Measures the size of the differences between the three groups means
Want them to be large
Showed that different treatments made a difference
Variance across all group means is the between group variability
Considered a measure of the treatment effect and random sampling error
Between-treatments variance gives us information about the size of the group differences
Sources of between-treatments variability
Logically the mean differences can be caused by two sources–
Treatment effects
If treatments have different effects, could cause the mean outcomes score for one treatment to differ from the mean outcome score for another treatment
Sampling error
Even if there is no treatment effect, you would still expect some differences between samples
F statistic
test statistics for one-way anova
Technically a ratio
We divide the between treatment variance by the within treatment variance
Bigger f ratio, more likely to reject the null hypothesis
Large value indicates treatment effect is large
Treatment effect + sampling error divided by sampling error
When null is true and there’s no treatment effect, the f-ratio is balanced
Treatment effect is 0, so it’s basically just sampling error over sampling error
Comes out to an f-ratio of about 1
F-ratio is always positive
Large treatment effect– treatment effect in numerator will not be equal to 0 (could be 1, 2, 3, etc but the higher the better)
F-ratio will be larger than 1
Want to see ratio larger than 1 so we can assume treatment had an effect
Is it high enough to be rare?
F-ratio has the same structure as the independent samples t statistic
Sampling distribution for anova
distribution of f-ratios
F-ratio– between group variance / within group variance
Within– error
Between– error + treatment
sampling distribution for anova consequences
Sampling distribution will pile up around 1 if null is true
“Family” of distributions, not just one
Shape depends on both degrees of freedom
The higher the total df, the more closely the possible f-ratios under the null will pile around 1
Will look skinnier
Anova assumptions
Observations are independent of one another
Homogeneity of variance– populations from which samples came from all have the same variance
Usually violated, but if you have equal sample sizes it won’t matter much
Populations distributions are normal
If you have large sample sizes (more than 30) it won’t matter
Measuring effect size for anova
Can’t use cohen’s d anymore
Can’t take a single mean difference among three groups, because we’re dealing with variance
We use eta squared (η², looks like an n squared)
What % of total variability is accounted for by treatment variability
.01- .05– small
.06-.13– medium
.14 or higher is large
Anova hypothesis testing steps
- State null and alternative
- Set alpha and locate the critical f statistic value (.05)
- Compute the sample f statistic
- Make a decision/ analyze
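A sketch of steps three and four with invented data for three groups– scipy’s f_oneway gives the omnibus F, and eta squared comes from the sums of squares:

```python
import numpy as np
from scipy import stats

# Hypothetical outcome scores under three treatments
g1 = np.array([4, 5, 6, 5, 4])
g2 = np.array([7, 8, 6, 7, 8])
g3 = np.array([9, 8, 10, 9, 10])
groups = [g1, g2, g3]

# Omnibus one-way ANOVA
f_stat, p_val = stats.f_oneway(g1, g2, g3)

# Eta squared = SS_between / SS_total (proportion of total
# variability accounted for by the treatment)
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
eta_sq = ss_between / ss_total
print(f"F = {f_stat:.2f}, p = {p_val:.4f}, eta^2 = {eta_sq:.2f}")
```

With these made-up numbers eta squared comes out around .85, which would count as large by the guidelines above.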
One way anova and post-hoc tests
Technically called an omnibus test
The null hypothesis tests the equality of several means at the same time
Keeps type 1 error under control
Pairwise comparisons
To fully understand our anova results, we need to follow up a significant anova by comparing all possible pairs of means against each other
Running all possible pairs and comparing them
Controlling for type one error by only doing it once
Post hoc test
Running them in response to finding statistically significant results
When we test all possible pairs of means for differences, we call them post hoc
Post hoc– “after the fact”
These are differences we didn’t predict/hypothesize
Tukey’s honestly significant difference
Uses the information from the omnibus test directly to set the standard
Asking how big the difference in means needs to be in order to be honestly significant
Drawbacks
Only works when you have even sample sizes
Fairly liberal– does not control type 1 error rate as well as other options
Still better than running 3 tests, but not usually the post-hoc test you would choose
Scheffé test
Calculates a separate MSbetween (between treatment variance) for every pair of groups
The f between two groups must exceed the critical f value from the omnibus test to be significant
The mean difference has to be larger than what the omnibus test required
Makes it more conservative (a strict cut-off value)– what you want for a post-hoc test
Called anova because we use variance instead of mean difference and because we break variance down into different parts
Scheffé test benefits
Works in situations with uneven groups sizes
More conservative– requires larger group differences for significance
Does the best in terms of controlling type 1 error
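A rough sketch of the Scheffé logic described above, with made-up data– an illustration of the criterion, not a validated implementation:

```python
import numpy as np
from scipy import stats

def scheffe_pairwise(groups, alpha=0.05):
    """Sketch of the Scheffe post-hoc logic: for each pair of groups,
    compute an F using the full ANOVA's MSwithin and k - 1 numerator df,
    then compare it to the critical F from the omnibus test."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # MSwithin pooled across all k groups (error term for every comparison)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_within = ss_within / (n_total - k)
    # Criterion: the omnibus critical F with (k - 1, N - k) df
    f_crit = stats.f.ppf(1 - alpha, k - 1, n_total - k)
    results = {}
    for i in range(k):
        for j in range(i + 1, k):
            a, b = groups[i], groups[j]
            # Between-groups SS for just this pair, divided by k - 1
            pair_grand = np.concatenate([a, b]).mean()
            ss_b = (len(a) * (a.mean() - pair_grand) ** 2
                    + len(b) * (b.mean() - pair_grand) ** 2)
            f_pair = (ss_b / (k - 1)) / ms_within
            results[(i, j)] = (f_pair, f_pair > f_crit)
    return results

# Invented scores for three groups
g1 = np.array([4, 5, 6, 5, 4])
g2 = np.array([7, 8, 6, 7, 8])
g3 = np.array([9, 8, 10, 9, 10])
for pair, (f_pair, sig) in scheffe_pairwise([g1, g2, g3]).items():
    print(pair, round(f_pair, 2), "significant" if sig else "ns")
```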
Factorial anova
Goal– further breaking down the variance
Two main types
Main effects– always at least 2
Always have at least two factors (independent variables)
Interaction effects– always one that you can probe with post-hoc comparisons
factorial anova key cont
All types of mean differences are statistically independent of one another
If you have a significant interaction effect, there’s no guarantee there will be a significant main effect
Vice versa
How two independent variables (factors) depend on each other to influence the outcome/dependent variable
Are they constant or change depending on one another?
Theoretical framing
expect effects of one factor to depend on the effects of the other factor
Key aspects of a question that is suitable for a factorial anova
Two or more factors that are categorical
One dependent variable that is numeric
how does the presence of snow (no snow vs. snow) and age group (children vs adults) affect levels of Christmas spirit during the holidays?
The structure of a two factor experiment is usually presented as a matrix
Each factor needs to be categorical
Nominal, sometimes ordinal variable
Makes four diff groups (cells)
Creates 3 effects to test (2 main effects and 1 interaction)
2 x 2 factorial design
Main effects
Mean differences between levels of each factor
Always have one main effect per factor
2 x 2 design– 2 main effects are of interest
Our example– skill level main effect and audience main effect
Interaction effects
Typically more interested in this
Does the effect of one variable depend on the effect of the other variable
You always only have one interaction effect
Our example– does the effect of skill level depend on whether an audience is present
Depend should always be in the sentence
Rules for interpreting two factor anovas
Use either
Matrix of cell means
Plot of interaction
Should be able to get main or interaction effects from either
When there is no interaction, interpret main effects
When there is an interaction, typically avoid interpreting the main effect
Misleading because their effects depend on the interaction
Testing main effect of factor a
Looking for differences between means in the margins
Average difference of single factor
Going across the columns
Testing factor b
Same process but want to see the avg effects of level one of factor b differing from the avg effect of level two of factor b
Going down the columns
Still marginal means
Testing interaction effects
Test differences among the cell means
Key is relationships– do relationships differ from one another
Can look at relationship going down or across columns– both equally as valid
Inside the matrix– no longer looking at marginal means
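A small numeric illustration with a hypothetical 2x2 matrix of cell means– marginal means show the main effects, and unequal simple effects signal an interaction:

```python
import numpy as np

# Hypothetical 2x2 matrix of cell means: rows = factor A (skill level),
# columns = factor B (audience absent vs. present)
cell_means = np.array([[10.0, 8.0],    # low skill
                       [12.0, 16.0]])  # high skill

# Main effects: compare the marginal (row/column) means
a_margins = cell_means.mean(axis=1)   # one mean per level of factor A
b_margins = cell_means.mean(axis=0)   # one mean per level of factor B
print("A margins:", a_margins)
print("B margins:", b_margins)

# Interaction: does the effect of B differ across levels of A?
b_effect_at_a1 = cell_means[0, 1] - cell_means[0, 0]
b_effect_at_a2 = cell_means[1, 1] - cell_means[1, 0]
print("interaction present:", b_effect_at_a1 != b_effect_at_a2)
```

Here the audience effect is -2 at low skill but +4 at high skill– the effect of B depends on the level of A.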
How to interpret interactions
Do the lines cross
If they do, it’s a disordinal interaction
If they don’t, it’s an ordinal interaction
Are the lines separated
how to interpret interactions
If they are, it’s consistent with a main effect of IV #2 (the variable that defines the lines)
does the line slope
how to interpret interactions
If the lines generally slope up or down, it’s consistent with a main effect of IV #1 (the variable on the horizontal axis)
3 null hypotheses
2x2 factorial anova
Main effect of factor a– mu a1 = mu a2
Main effect of factor b– mu b1 = mu b2
Interaction effect– there is no interaction effect between factors a and b
3 alternative hypotheses
2x2 factorial anova
Main effect of factor a– mu a1 does not = mu a2
Main effect of factor b– mu b1 does not = mu b2
Interaction effect– there is an interaction effect between factors a and b
F ratio in the two factor anova
Each of the three hypothesis tests in a two factor anova will have its own f ratio
Main difference– you have three f ratios now, using the between-treatments variance for factor a, factor b, and the interaction
Effect size in factorial anova
Partial eta squared– tells us the amount of variance in the dependent variable that can be accounted for by the effect of interest
Compute three separate partial eta squared
Factor a main effect
Factor b main effect
Interaction effect
Uses between treatment sum of squares for numerator as well as within treatments sum of squares for denominator
Sampling distribution for a one sample t test
distribution of means
Sampling distribution for Independent samples
distribution of differences between the means
Sampling distribution for Paired samples
distribution of means of difference scores between the pairs
Sampling distribution of One way and factorial anova
distribution of f ratios
What does an f ratio of 1.00 indicate
no systematic difference
Fail to reject the null hypothesis