Week 11 Flashcards
What are four inferential statistical tests?
- Correlation
- Simple Regression
- Chi-square
- t-tests
What does correlation measure?
Whether two variables are related, and the strength and direction of that relationship
What does simple regression predict?
Predicts one (continuous) variable from another (continuous) variable
What are two types of chi-square?
- goodness of fit (one categorical variable)
- test of independence (two categorical variables)
What are the three types of t-tests?
- single sample (one continuous variable)
- independent groups (one continuous variable compared across two levels of one categorical IV; between-subjects design)
- paired samples (one continuous variable compared across two levels of one categorical IV; within-subjects design)
What does a confidence interval show?
The range of plausible values for the population parameter
What is shown in a between-groups confidence interval?
If the CIs overlap by less than about 25%, the difference is statistically significant
What is shown in a within-groups confidence interval?
If CIs do not overlap then the difference is statistically significant
If CIs overlap, statistical significance cannot be inferred
What does a confidence interval show in relation to effect size?
The magnitude of the effect in standardised units
- emphasis is on the data, not on statistical significance
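A minimal Python sketch of one standardised effect size, Cohen's d for two independent groups (mean difference divided by the pooled SD). Using Cohen's d is an assumption, since the card does not name a specific measure, and the scores are hypothetical.

```python
import math
import statistics


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(pooled_var)


# Hypothetical scores for two groups
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
d = cohens_d(treatment, control)
print(f"d = {d:.2f}")  # mean difference of 2 expressed in pooled-SD units
```

The result says the groups differ by about 1.26 standard deviations, regardless of the original measurement scale.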
When are t-tests used?
When we are interested in whether the means of two populations differ
What is done if there are multiple means to compare?
- one option is to run multiple t-tests
What are two problems with conducting multiple t-tests?
- many tests have to be conducted
- each test has a .05 probability of a Type 1 error, so conducting many tests increases the overall Type 1 error rate
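A quick sketch of how the chance of at least one Type 1 error grows with the number of tests, assuming each test uses alpha = .05 and the tests are independent (pairwise t-tests on the same means are not fully independent, so treat the numbers as an approximation). The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type 1 error across n_tests independent tests."""
    # When the null is true, each test avoids a Type 1 error with probability (1 - alpha)
    return 1 - (1 - alpha) ** n_tests


for m in (1, 3, 6, 10):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

With six tests the chance of at least one false positive is already about .26.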
What is analysis of variance (ANOVA) used for?
Testing hypotheses involving multiple means in a single test
What is the simplest type of ANOVA?
One-way ANOVA
- one IV (factor) with two or more levels
What type of test is a one-way ANOVA?
It is an omnibus test.
What does the null hypothesis state in a one-way ANOVA?
That all means are equal.
In a one-way ANOVA, what does rejecting the null hypothesis mean?
Concluding that at least some means differ. A statistically significant result is ambiguous because it does not show which means differ.
What is total variance in terms of a one-way ANOVA?
Variance of all scores around the grand mean.
What does between-groups variance mean in a one-way ANOVA?
Variance among the group means, attributed to the independent variable (random error also contributes).
What does within-groups (error) variance mean in a one-way ANOVA?
Variance of scores around their own group mean that is unaccounted for by the IV.
What are the three types of variance in an ANOVA?
- total variance
- between groups
- within groups (error)
How is the total variance of the data partitioned in an ANOVA?
total = between groups + within groups
Who is the ANOVA statistic “F” named after?
Sir Ronald Fisher
In a one-way ANOVA, what assumption is made about independence?
In a between-groups ANOVA, each participant contributes only one score, so scores are independent.
In a one-way ANOVA, what assumption is made about normality?
Data within groups should be normally distributed.
In a one-way ANOVA, what assumption is made about homogeneity of variance?
- variance of scores within groups should be equal for all groups
- tested using Levene’s test
In a one-way ANOVA, what assumption is made about outliers?
There should be no outliers in any group’s scores
If the homogeneity of variance assumption is intact according to Levene’s test, what will the p value be?
p > .05
If the homogeneity of variance assumption is violated according to Levene’s test, what will the p value be?
p < .05
What can be used to assess the homogeneity of variance?
Levene’s test.
What is the homogeneity of variance decision based upon?
The p value.
When testing homogeneity of variance, if p > .05, what do we infer?
We do not reject the assumption that variances are equal, so we conclude that the homogeneity of variance assumption is intact.
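A minimal sketch of the classic mean-centred Levene statistic: a one-way ANOVA F computed on each score's absolute deviation from its group mean. It returns only the statistic; the p value comes from an F distribution with k - 1 and N - k degrees of freedom (for example via `scipy.stats.levene(..., center="mean")`). Which Levene variant jamovi uses is not assumed here, and the data are hypothetical.

```python
import statistics


def levene_statistic(groups):
    """Levene's W: a one-way ANOVA F computed on |score - group mean|."""
    # Absolute deviation of each score from its own group mean
    z = [[abs(x - statistics.mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_bar = sum(sum(g) for g in z) / n_total       # grand mean of the deviations
    z_means = [statistics.mean(g) for g in z]      # group means of the deviations
    ss_between = sum(len(g) * (m - z_bar) ** 2 for g, m in zip(z, z_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(z, z_means) for x in g)
    return ((n_total - k) / (k - 1)) * ss_between / ss_within


# Hypothetical scores for three groups
w = levene_statistic([[2, 4, 6], [1, 2, 3], [0, 3, 6]])
print(round(w, 3))
```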
What can also be used to ‘enhance’ Levene’s test?
Hartley’s F max.
What are the two steps to find Hartley’s F max?
- Find group with largest variance (SD^2) and the group with the smallest variance
- Divide largest variance by smallest variance
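The two steps above as a short Python sketch; `hartley_fmax` is a hypothetical helper and the scores are made up.

```python
import statistics


def hartley_fmax(groups):
    """Largest group variance divided by smallest group variance."""
    # SD^2 for each group, using the n - 1 (sample) denominator
    variances = [float(statistics.variance(g)) for g in groups]
    return max(variances) / min(variances)


# Hypothetical scores: the group variances are 4, 1 and 9
print(hartley_fmax([[2, 4, 6], [1, 2, 3], [0, 3, 6]]))
```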
What is calculated in ANOVA tests?
Sums of squares (SS)
What is a sum of squares (SS)?
It is the sum of squared deviations of scores about their mean.
How is a mean square obtained?
By dividing a sum of squares by its degrees of freedom.
What is sample variance?
A mean square.
What is the square root of the variance?
The standard deviation
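A worked sketch linking the last few cards: sum of squares, degrees of freedom, mean square (the sample variance) and its square root (the SD). The scores are hypothetical; the standard library's `statistics.variance` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = sum(scores) / len(scores)

ss = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations about the mean
df = len(scores) - 1                        # degrees of freedom
mean_square = ss / df                       # mean square = sample variance
sd = math.sqrt(mean_square)                 # standard deviation

print(ss, df, round(mean_square, 3), round(sd, 3))
# Cross-check against the standard library
print(statistics.variance(scores), statistics.stdev(scores))
```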
What does SStotal mean?
The sum of squared deviations of all scores from the grand mean
What does SSBetween mean?
The sum of squared deviations of the group means from the grand mean, each weighted by its group size. Reflects variance due to the IV (treatment).
What does SSWithin mean?
The sum of squared deviations of individual scores about their group mean. Reflects variance due to error (unaccounted-for variance).
How is the SStotal calculated?
SStotal=SSBetween+SSWithin
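A small sketch showing the partition with hypothetical scores for three groups of three; the three sums follow the SStotal, SSBetween and SSWithin definitions on the cards above.

```python
import math

groups = {"A": [3, 4, 5], "B": [6, 7, 8], "C": [9, 10, 11]}  # hypothetical scores

all_scores = [x for g in groups.values() for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(g) / len(g) for name, g in groups.items()}

# Each score from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Each group mean from the grand mean, weighted by group size
ss_between = sum(len(g) * (group_means[name] - grand_mean) ** 2 for name, g in groups.items())
# Each score from its own group mean
ss_within = sum((x - group_means[name]) ** 2 for name, g in groups.items() for x in g)

print(ss_total, ss_between, ss_within)                    # 60.0 54.0 6.0
print(math.isclose(ss_total, ss_between + ss_within))     # True
```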
In the F ratio, what accounts for the within-groups part of the total variance?
Differences among subjects
- random variance
In the F ratio, what accounts for the between-groups part of the total variance?
Effects of conditions
- variance due to condition
In the F ratio, if the variance associated with the IV is greater than the random variance, what will F exceed?
F will exceed 1.0.
What are two things to remember when accounting for variability?
- random factors will create variability among subjects AND variability among group means
- reject the null hypothesis when between-groups variance is so large relative to within-groups variance that it would rarely occur if only random processes were operating
What does k stand for in an ANOVA test?
the number of groups
What does n stand for in an ANOVA test?
the number of participants per group (assuming equal group size)
What does N stand for in an ANOVA test?
the total number of participants
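A minimal sketch of the one-way between-groups F ratio using k and N from the cards above: df_between = k - 1, df_within = N - k, each mean square is SS/df, and F = MS_between / MS_within. The function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # N, total participants
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between          # variance due to the IV (plus error)
    ms_within = ss_within / df_within             # random (error) variance
    return ms_between / ms_within, df_between, df_within


f, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

Here the group means differ far more than scores vary within groups, so F is well above 1.0.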
Why is a statistically significant F-test involving three or more means ambiguous?
It does not show which means differ. Follow-up tests are pairwise: a series of t-tests between pairs of means in the set
What is a problem with making multiple comparisons after an F-test?
Increases the chance of making at least one Type 1 error
- this is the familywise error rate
What is a type 1 error?
Rejecting the null hypothesis when it is actually true (a false positive).
What does a conservative multiple comparisons test do?
It maximally controls Type 1 errors (false positives)
What are the benefits and drawbacks of doing a conservative multiple comparison test?
Benefit: the risk of making at least one Type 1 error is no greater than 5%
Drawback: minimising Type 1 errors increases the risk of Type 2 errors (false negatives)
What does a liberal test do?
Reduces the risk of Type 1 errors, but not as much as conservative tests
What are the benefits and drawbacks of a liberal test?
Benefit: Greater statistical power to detect a true effect
Drawback: Risk of a false positive is higher than for conservative tests
What is the most conservative of all multiple comparison tests?
The Bonferroni test. It is simple and effective.
How does a Bonferroni test adjust p-values required for statistical significance?
By dividing the desired alpha level by the number of tests performed
How would the Bonferroni test handle multiple comparisons for an IV with four levels?
- Six pairwise comparisons to test all possible differences
- Bonferroni-adjusted so that the required p = .05/6 ≈ .008
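The Bonferroni steps for four IV levels as a sketch; the level names, helper name and p values are hypothetical.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_tests):
    """Per-comparison alpha after a Bonferroni adjustment."""
    return alpha / n_tests


levels = ["A", "B", "C", "D"]                    # hypothetical IV with four levels
pairs = list(combinations(levels, 2))            # all six pairwise comparisons
adjusted = bonferroni_alpha(0.05, len(pairs))    # .05 / 6

# Hypothetical unadjusted p values, one per pair
p_values = [0.001, 0.02, 0.007, 0.30, 0.05, 0.0085]
for (a, b), p in zip(pairs, p_values):
    print(a, "vs", b, p, "significant" if p < adjusted else "not significant")
```

Note that p = .0085 would pass an unadjusted .05 test but fails the adjusted threshold of about .0083.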
In a Bonferroni test, when are p-values declared significant?
When they are less than the Bonferroni-adjusted alpha level.
When is a Bonferroni test best used and why?
Best used when the number of comparisons is small, because adjustment is severe when many comparisons are required.
What are four characteristics of Tukey’s Honestly Significant Difference (HSD) test?
- Moderately conservative and very popular
- Required p value is calculated using the studentised range statistic
- Test is performed by determining the difference between means required for significance
- Great choice when number of required comparisons is relatively large
What is the most liberal test?
“Fisher’s protected least significant difference test”
Is there an adjustment to the p-value in a “Fisher’s protected least significant difference test?”
No.
How does “Fisher’s protected least significant difference test” control the Type 1 error rate?
By requiring statistically significant omnibus test (ANOVA) before interpreting results
Which two tests do not require ANOVA to be significant?
Bonferroni and Tukey.
When should “Fisher’s protected least significant difference test” be used?
When ANOVA is significant and there are only three means.
Where can we use “Fisher’s protected least significant difference test”?
In jamovi
- select the “No correction” option for post-hoc tests
What are two examples of when repeated-measures ANOVA can be used?
- when the IV is a within-subjects factor
- as an extension of the paired-samples t-test to three or more conditions
What are two assumptions about Independence when using a repeated-measures ANOVA?
- Scores for different participants must be independent
- BUT scores within a participant will NOT be independent
What is the assumption about normality when using a repeated-measures ANOVA?
Data within groups should be normally distributed.
What are two points about sphericity when using a repeated-measures ANOVA?
- variances of the difference scores between each pair of conditions should be equal
- tested using Mauchly’s test, not Levene’s test or Fmax < 4.0 (largest variance/smallest variance), which check homogeneity of variance in between-groups designs
What is the assumption about outliers when using a repeated measures ANOVA?
No outliers in any group’s scores.
Repeated measures ANOVA has an assumption of sphericity, NOT:
homogeneity of variance
What is sphericity?
Difference scores can be calculated between each pair of conditions. Sphericity requires equal variances for these sets of difference scores.
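A descriptive sketch of what sphericity looks at: the variance of the difference scores for every pair of conditions. Under sphericity these variances should be roughly equal. This is only a descriptive look, not a formal test, and the scores are hypothetical.

```python
import statistics
from itertools import combinations

# Hypothetical scores: one row per participant, one column per condition
scores = [
    [10, 12, 15],
    [8, 9, 13],
    [11, 14, 16],
    [9, 11, 12],
]

n_conditions = len(scores[0])
diff_variances = {}
for i, j in combinations(range(n_conditions), 2):
    diffs = [row[i] - row[j] for row in scores]      # difference scores for this pair
    diff_variances[(i, j)] = statistics.variance(diffs)

for (i, j), var in diff_variances.items():
    print(f"condition {i + 1} vs {j + 1}: variance of differences = {var:.3f}")
```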
What are some steps to use repeated-measures ANOVA in jamovi?
- set up factors and levels
- move variables into repeated measures cells
- select options (e.g., to test sphericity, post-hoc correction)
- results will show effect size, p value, df and whether the sphericity assumption has been violated
In critically reviewing a repeated-measures design, what needs to be assessed in terms of practice effects?
- are different word lists used for the three conditions?
- if different lists have been used, are they equivalent in difficulty?
In critically reviewing a repeated-measures design, what needs to be assessed in terms of sequence effects?
How was the potential for sequence effects controlled?