Statistics Flashcards
Definition
one type of inferential statistic, used to determine whether there is a significant difference between the means of two groups. As a parametric test, it assumes the dependent variable is approximately normally distributed
T test
What is the internal and external validity rated for a quasi-experimental study?
Internal: medium
External: medium
What is a two-way ANOVA?
An ANOVA with 2 factors (IVs)
What are contrasts?
Planned comparisons that break down the between-treatment variability (the variability explained by the experimental manipulation, i.e., due to participants being assigned to different groups) into specific comparisons between groups
What do you do if homogeneity is violated in an ANOVA?
- If sample sizes are large and equal, ANOVA can handle the violation
- If sample sizes are small or unequal, use the Brown-Forsythe or Welch F-ratio (and their associated p and df; Welch is more powerful) instead of the regular F-ratio
Define
T test
one type of inferential statistic, used to determine whether there is a significant difference between the means of two groups. As a parametric test, it assumes the dependent variable is approximately normally distributed
What are some threats to external validity?
- Generalising across participants or subjects
- Generalising across features of a study
- Generalising across features of the measures
Definition
the validity of applying the conclusions of a scientific study outside the context of that study. It is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times
External validity
Definition
F = variance between sample means / variance expected by chance
F statistic
t-Tests are “difference tests”. Used to compare mean differences between up to ____groups or conditions
t-Tests are “difference tests”. Used to compare mean differences between up to two groups or conditions
Define
Type I error
the rejection of a true null hypothesis (also known as a “false positive” finding or conclusion)
Definition
a collection of statistical models and their associated estimation procedures (such as the “variation” among and between groups) used to analyze the differences among group means in a sample
ANOVA
What do you use in a post hoc test if there are unequal variances?
Games-Howell
Define
Confounding variable
factors other than the independent variable that may cause a result
What does within treatment variance look like on this graph?
(graph not shown)
What is the experiment-wise error rate of an analysis that uses 3 comparisons at an alpha level of .05?
αEW = 1 - (1 - αTW)^c
αEW = 1 - (1 - .05)^3
αEW = 1 - .95^3
αEW ≈ .14
There is a 14% chance of committing at least one Type I error
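As a quick check of this arithmetic, here is a minimal Python sketch; the alpha level and number of comparisons are the illustrative values from this card:

```python
# Experiment-wise error rate: alpha_EW = 1 - (1 - alpha_TW)^c
alpha_tw = 0.05   # test-wise alpha (per comparison)
c = 3             # number of comparisons

alpha_ew = 1 - (1 - alpha_tw) ** c
print(round(alpha_ew, 2))  # 0.14 -> ~14% chance of at least one Type I error
```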
What test is used to assess normality?
Shapiro-Wilk
Definition
A design that looks a bit like an experimental design but lacks the key ingredient – random assignment
Quasi-experiment design
What is the internal and external validity rated for an experimental study?
Internal: high
External: low
Definition
A type of research used to assess changes over an extended period of time
Developmental research
How do you minimise participant attrition?
- Increase sample size and measure/ compare participants who do/don’t withdraw
What does a positive t-value tell you?
That the mean for condition/sample 1 is higher than the mean for condition/sample 2
What is the non-parametric alternative to a one-way repeated-measures ANOVA?
Friedman’s test
How do you minimise environmental variables?
- Standard experimental procedures, setting, and experimenter
What are threats to both internal and external validity?
- Experimenter bias
- Demand characteristics and participant reactivity
How do you calculate the F ratio using sum of squares and degrees of freedom?
Mean squared deviation (MS) = Sum of Squares (SS) / df
F = MSBETWEEN / MSWITHIN
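A minimal Python sketch of this calculation; the sums of squares and degrees of freedom below are hypothetical values chosen only to illustrate the arithmetic:

```python
# F-ratio from sums of squares (SS) and degrees of freedom (df)
ss_between, df_between = 120.0, 2    # hypothetical values
ss_within, df_within = 60.0, 12      # hypothetical values

ms_between = ss_between / df_between   # mean square between = SS / df
ms_within = ss_within / df_within      # mean square within  = SS / df

f_ratio = ms_between / ms_within
print(f_ratio)  # 12.0
```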
What is the biggest threat to internal validity?
Confounding variables
Definition
factors other than the independent variable that may cause a result
Confounding variable
Why choose repeated-measures?
- A repeated-measures ANOVA uses a single sample, with the same set of individuals measured in all of the different treatment conditions
- Thus, one of the characteristics of a repeated-measures (aka within-subjects) design is that it eliminates variance caused by individual differences
- Individual differences are those participant characteristics that vary from one person to another and may influence the measurement that you obtain for each person
- e.g., age, gender, etc.
What is the formula used to determine the experiment-wise error rate?
αEW = 1 - (1 - αTW)^c
Where c = number of comparisons
During a post hoc test, if assumptions are met and sample sizes are equal what do you use?
Tukey’s HSD
What are examples of non-experimental designs?
Observational, cross-sectional or longitudinal studies
In a one-way repeated-measure ANOVA, what is the F ratio made up of?
F = (treatment effect + other effect) / other effect
Numerator = between treatment variance
Denominator = within treatment variance
Individual differences are not included due to repeated measures (the same participants are measured in every condition)
What is the internal and external validity rated for an non-experimental study?
Internal: low
External: high
Define
Experiment-wise error rate (αEW)
The probability of making at least one Type I error amongst a series of comparisons
Define
ANOVA
a collection of statistical models and their associated estimation procedures (such as the “variation” among and between groups) used to analyze the differences among group means in a sample
What type of follow up test do you use if there is a specific hypothesis? What about when there is no hypothesis?
Specific hypothesis: Planned comparisons
No hypothesis: Post hoc tests
How do you minimise generalising across features of a study?
- Conduct naturalistic research
- Switch from a between-subjects to a within-subjects or matched-subjects design.
- Replicate study in different setting/with different experimenter
What do contrast 1 and contrast 2 test?
(figure not shown)
What effect size value is used for ANOVA? How do you calculate it?
Eta-squared (η2)
η2 = SSbetween / SStotal
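A minimal Python sketch of the eta-squared calculation; the sums of squares are hypothetical values used only for illustration:

```python
# Eta-squared effect size for ANOVA: SS_between / SS_total
ss_between = 120.0                   # hypothetical value
ss_within = 60.0                     # hypothetical value
ss_total = ss_between + ss_within    # SS_total = SS_between + SS_within

eta_squared = ss_between / ss_total
print(round(eta_squared, 3))  # 0.667
```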
How do you minimise generalising across features of the measures?
- Use multiple response measures (e.g., self-report, observation, physiological).
- Systematically vary time of measurement as an IV and measure effect on DV and other IV.
What does a negative t-value tell you?
That the mean for condition/sample 1 is lower than the mean for condition/sample 2
What do you do if sphericity is violated?
If assumption is violated, apply a correction factor (of epsilon) to the degrees of freedom – This will in turn adjust the p value for the ANOVA
If Greenhouse-Geisser epsilon is less than .75 then use Greenhouse-Geisser
If Greenhouse-Geisser epsilon is greater than .75 then use Huynh-Feldt epsilon correction
What value of the MANOVA should you report in most situations?
Pillai’s trace
For the F ratio to be reliable and valid, what assumptions must be met?
- Independence of observations
- The observations within each sample must be independent
- Interval/ratio level of data
- Normality
- Populations must be normally distributed as determined by Shapiro-Wilk
- Homogeneity of variance
How do you minimise time-related variables?
- Add control group for comparison purposes
- Switch from a within-subjects to a between- or matched-subjects design
- Control/limit time between testing
- Counterbalance order of presentation of conditions across participants
What are the key elements of an experiment?
- Manipulation of the independent variable (IV) to create two or more treatment conditions (levels).
- Measurement of a dependent variable (DV) to obtain a set of scores for each treatment condition (level)
- Comparison of the DV scores for each treatment condition (level)
- Control of all other (extraneous) variables to ensure that they do not confound the relationship between IV and DV.
- Random assignment of participants to each condition so that the groups can be considered truly equivalent.
Definition
The alpha level used for each comparison
Test-wise error rate (αTW)
Distributions like this would violate which assumption?
(distribution plots not shown)
Homogeneity
Definition
the ratio of the between group variance to the within group variance
F-ratio
Define
Internal validity
the extent to which a piece of evidence supports a claim about cause and effect, within the context of a particular study
If there are 4 time points, how many orthogonal contrasts are there?
k = 4
orthogonal contrasts = k - 1
Therefore, there are 3 orthogonal contrasts
What are two reasons for between treatment variance?
- Treatment effects: the differences are caused by the treatment(s)
- Chance: the differences are simply due to chance
What Shapiro-Wilk result suggests the normality assumption is met?
p > .05: the shape is not significantly different from normal, thus the normality assumption is met
How do you report one-way independent measures ANOVA test results?
- F(dfbetween, dfwithin) = value, p = value
- e.g., F(2, 12) = 23.49, p < .001
With an F-ratio of around 1 what does that suggest? Why?
With an F-ratio of around 1 we would conclude that there is no treatment effect, because the variance between treatments is about the same as the variance expected by chance, so the numerator and denominator of the ratio are roughly equal
Definition
A type of ANOVA used to determine whether three or more group means are different where the participants are the same in each group
One-way repeated-measures ANOVA
True or False:
Repeated-measures designs are powerful
True
Repeated-measures designs are powerful because they remove individual differences
η2 = .059 is what size effect?
Medium
In terms of the F-ratio for a repeated measures design, the variance between treatments (the numerator) does/does not contain any individual differences
In terms of the F-ratio for a repeated measures design, the variance between treatments (the numerator) does not contain any individual differences
During post hoc tests, if sample sizes are slightly different then use _________ procedure because it has greatest power, but if sample sizes are very different use _________
During post hoc tests, if sample sizes are slightly different then use Gabriel’s procedure because it has greatest power, but if sample sizes are very different use Hochberg’s GT2
Definition
the rejection of a true null hypothesis (also known as a “false positive” finding or conclusion)
Type I error
Definition
A test used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups
One-way independent measures ANOVA
Define
External validity
the validity of applying the conclusions of a scientific study outside the context of that study. It is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times
Why is an ANOVA preferred over t-tests for more than 2 groups?
Each time you run a hypothesis test, you run the risk of committing a Type I error; running multiple t-tests inflates the experiment-wise error rate, whereas a single ANOVA controls it
Define
Developmental research
A type of research used to assess changes over an extended period of time
True or False:
ANOVA tests only non-directional hypotheses
True
What do you use to calculate the effect size for a post hoc test?
Cohen’s d
d = mean difference / SD
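A minimal Python sketch of the Cohen’s d calculation; the means and standard deviation are hypothetical values used only for illustration:

```python
# Cohen's d for a pairwise comparison: mean difference / SD
mean_1, mean_2 = 24.0, 20.0   # hypothetical group means
sd = 5.0                      # hypothetical (pooled) standard deviation

d = (mean_1 - mean_2) / sd
print(d)  # 0.8
```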
True or False:
You should not perform planned comparisons as well as post hoc tests for a one-way repeated-measures ANOVA
True
True or False:
These are the results of an ANOVA
(output table not shown)
False;
This table relates to MANOVA rather than ANOVA
Definition
any variables that you are not intentionally studying in your experiment or test
Extraneous variables
What do you do if normality is violated in an ANOVA?
- If sample sizes are large and equal…
- ANOVA can handle normality violation
- If sample sizes are small or not equal…
- Transform your data
- Run a Kruskal-Wallis test as the nonparametric alternative to a one-way independent-measures ANOVA
What extra assumption is required for repeated-measures? Why?
- Same participants in all conditions
- Therefore, scores across conditions will correlate
- Violates assumption of independence!
- Because of this, an additional assumption is required for repeated-measures ANOVA – namely, sphericity
- Put crudely, the assumption of sphericity means that the correlation between treatment levels should be the same
- Actually, it assumes that the variances of the differences between treatment levels are equal
Definition
a type of experimental design thought to be the most accurate form of experimental research; it supports or refutes a hypothesis using statistical analysis
True experimental research
How would the sample populations relate to each other if the null hypothesis was rejected in an ANOVA?
At least one population (treatment) mean would differ from the others; the sample populations would not all be equal to each other
What is the 4 step process of hypothesis testing using an ANOVA?
- State the hypotheses (H0 and H1)
- Decide when to reject H0
- Calculate the test statistic. In this case, the F ratio
- Make a decision about H0 (reject/don’t reject)
How do you deal with individual differences in a repeated-measures ANOVA?
- The individual differences are automatically removed from the numerator because the design uses the same subjects in all treatments, but we must also remove them from the denominator
- Remove individual differences from the denominator by measuring the variance within treatments and then subtracting the individual differences
- The result is a measure of unsystematic error variance that does not include any individual differences
What are the threats to internal validity?
- Environmental variables
- Individual differences
- Time-related variables
- Participant attrition
- Communication between groups
ANOVA simply tests the null hypothesis that all group means are equal, and therefore a significant result merely tells you that at least one group’s mean is different from another.
How do you make more specific comparisons?
- Post-hoc tests
- No specific hypotheses at outset; Compare each group to each other but use a smaller α to limit type I error rate
- Planned comparisons
- Specific hypotheses at outset; make specific comparisons by breaking down the between treatment variance (total variance accounted for by model) into its component parts
What does a large F-ratio indicate?
The differences between treatments are greater than chance
Define
Extraneous variables
any variables that you are not intentionally studying in your experiment or test
For the following research question would you use an ANOVA or a t-Test?
Are sufferers of depression who receive any form of treatment (i.e., medication, exercise, or a combination of medication and exercise) less depressed than people who do not receive any treatment?
ANOVA (more than 2 groups)
What can between treatment variability be broken down to?
This variability can be further broken down to test specific hypotheses about which groups might differ from one another
We break down the variance according to hypotheses made a priori (before the experiment)
Providing that the hypotheses are independent of one another, the experiment-wise Type I error rate will be controlled
Define
Quasi-experiment design
A design that looks a bit like an experimental design but lacks the key ingredient – random assignment
Which sections are ANOVA and which are planned contrasts?
(output not shown)
What is the H0 and H1 hypotheses for ANOVA?
H0: There really are no differences between the populations (or treatments). The observed differences between samples are due to chance (sampling error)
H1: The differences between the sample means represent real differences between the populations (or treatments). That is, at least one of the treatments really does have a different mean, and the sample data accurately reflect these differences
How do you minimise experimenter bias?
- Conduct a double-blind study (i.e., neither participant nor experimenter know which condition the participant is in)
In ANOVA, an independent variable (IV) is called a _____
Each (treatment) condition of a factor is called a _______
In ANOVA, an independent variable (IV) is called a factor
Each (treatment) condition of a factor is called a level
What does between treatment variance look like on this graph?
(graph not shown)
How do you minimise individual differences?
- Create equivalent groups using random assignment, holding constant, or matching
- Switch from a between-subjects to a within-subjects or matched-subjects design
What type of experimental design is this? Why?
For example, researchers take data from two different schools that are expected to be similar. An intervention is tested in one school and not the other. The pretest-posttest change is then compared between schools
Quasi-experimental
This is quasi-experimental because participants (students) were not randomly assigned. There may indeed be some small differences between the groups
What is a partial eta squared?
An eta squared with the effects of individual differences removed from the denominator
What test is used to test sphericity? When is sphericity met?
Mauchly’s test
The sphericity assumption is met if the variances of the differences between conditions are roughly equal. Therefore the assumption is met when p > .05
In a one-way independent-measures ANOVA, what is the F-ratio made up of?
F = (treatment effect + individual differences + other error) / (individual differences + other error)
Numerator = variability between treatments
Denominator = variability within treatments
Define
True experimental research
a type of experimental design thought to be the most accurate form of experimental research; it supports or refutes a hypothesis using statistical analysis
Define
F-ratio
the ratio of the between group variance to the within group variance
Definition
The probability of making at least one Type I error amongst a series of comparisons
Experiment-wise error rate (αEW)
What Levene result suggests the homogeneity of variance assumption is met?
If p > .05, the variances are not significantly different from one another, thus the homogeneity of variance assumption is met
Why is a quasi-experiment not a true experiment?
- The independent variable was not experimentally manipulated (i.e., pre-existing levels are selected and compared); or
- The participants were not randomly assigned to conditions (e.g., groups were selected for analysis after the fact).
How do you minimise demand characteristics and participant reactivity?
- Switch from a within-subjects to a between- or matched-subjects design
- Conduct a blind study
- Use measures which do not explicitly refer to construct being measured
True or False:
Partial eta squared is interpreted the same as eta squared
True
Definition
the extent to which a piece of evidence supports a claim about cause and effect, within the context of a particular study
Internal validity
How do you choose an alpha level that would control the experiment-wise error rate?
Bonferroni
αTW = αEW (desired) / number of tests
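A minimal Python sketch of the Bonferroni adjustment; the desired experiment-wise alpha and number of tests are illustrative values:

```python
# Bonferroni: per-test alpha needed to keep the experiment-wise error rate at .05
alpha_ew_desired = 0.05
n_tests = 3                   # hypothetical number of comparisons

alpha_tw = alpha_ew_desired / n_tests
print(round(alpha_tw, 4))     # 0.0167 -> use this alpha for each comparison
```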
When do you use MANOVA over an ANOVA?
MANOVA results can be used in place of the regular ANOVA results if the sphericity or normality assumptions are violated. Whilst these tests are more robust than ANOVA to the assumption violations, they are also less powerful
What rules apply to choosing contrasts?
- Use the control group as the reference point
- Only comparing 2 chunks of variation
- Independence (orthogonal)
How do you minimise communication between groups?
- Conduct blind study (i.e., participants do not know which condition they are in)
- Switch to a within-subjects design
- Limit possibility of communication between groups (e.g., different locations)
How do you minimise generalising across participants or subjects?
- Use a probability sampling method such as proportionate stratified random sampling, or a non-probability method which tries to achieve the same result
- Increase sample size
Define
Test-wise error rate (αTW)
The alpha level used for each comparison
If the homogeneity of variance assumption is violated and you want to report the Brown-Forsythe or Welch F-ratio because they don’t assume homogeneity of variance, how would you do this?
You need to state that the homogeneity of variance assumption was violated and that this is why you used the Brown-Forsythe or Welch F-ratio instead. You then simply report the results as usual, except that you use two decimal places for the second df value (since such F-ratio calculations are based on adjustments being made to the df)
What are the three criteria that must be met for a true experiment?
There are three criteria that must be met in this type of experiment
- Control group and experimental group
- Researcher-manipulated variable
- Random assignment
What test is used to assess homogeneity of variances?
Levene statistic
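A minimal sketch of running both assumption checks (Shapiro-Wilk for normality, Levene for homogeneity of variance) with SciPy; the three groups of scores are made-up illustrative data:

```python
from scipy import stats

# Hypothetical scores for three independent groups
group_a = [4, 5, 6, 7, 8]
group_b = [6, 7, 8, 9, 10]
group_c = [8, 9, 10, 11, 12]

# Normality: Shapiro-Wilk per group (p > .05 suggests the assumption is met)
for name, scores in [("A", group_a), ("B", group_b), ("C", group_c)]:
    w, p = stats.shapiro(scores)
    print(name, round(w, 3), round(p, 3))

# Homogeneity of variance: Levene's test across groups (p > .05 -> assumption met)
stat, p = stats.levene(group_a, group_b, group_c)
print(round(stat, 3), round(p, 3))
```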
What are reasons for within treatment variance?
Within each treatment, participants are treated the same, so chance would cause differences
η2 = .138 is what size effect?
Large
The t-test generates a ___-value, which is used to then determine a p-value
The t-test generates a t-value, which is used to then determine a p-value
What is the safest option during a post hoc test?
Bonferroni
Define
F statistic
F = variance between sample means / variance expected by chance
True or False:
A Greenhouse-Geisser correction changes the F ratio
False
A Greenhouse-Geisser correction changes the degrees of freedom
Define
One-way independent measures ANOVA
A test used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups
Define
One-way repeated-measures ANOVA
A type of ANOVA used to determine whether three or more group means are different where the participants are the same in each group
An F ratio close to 1 indicates what?
Strongly suggests little or no treatment effect
Chi-square goodness-of-fit test
A test used to compare an observed distribution to an expected distribution when there are two or more categories of discrete data. In other words, it compares multiple observed proportions to expected probabilities.
Chi-square test-for-independence
a procedure for testing if two categorical variables are related in some population
Kruskal-Wallis test
a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable
Friedman test
a non-parametric statistical test similar to the parametric repeated-measures ANOVA; it is used to detect differences in treatments across multiple test attempts
What does statistical test selection depend on?
- How many dependent and independent variables there are
- What scales of measurement are used for each variable
- How many groups there are, and, whether these are independent- or repeated-measures
- Whether the assumptions have been met for a parametric statistical test
What type of data can be used for a non-parametric test?
Nominal/ordinal
What type of data can be used for a parametric test?
Interval/ratio
What is the non-parametric equivalent of a repeated-measures ANOVA?
Friedman test
What is the non-parametric equivalent for an independent-measures ANOVA?
Chi-square goodness-of-fit (nominal)
Kruskal-Wallis test (ordinal)
What is the non-parametric equivalent of a Pearson correlation?
Chi-square test-for-independence (nominal)
Spearman correlation (ordinal)
To examine the relationship between texting and driving skill, a researcher uses orange cones to set up a driving circuit. A group of probationary drivers is then tested on the circuit, once while receiving and sending text messages and once without texting. For each driver, the researcher records the number of cones hit while driving each circuit. (Based on Gravetter & Wallnau, 2013, p. 674)
Which of the following is a suitable inferential statistics test for these data?
a) Independent-samples t-test
b) Paired-samples t-test
c) Repeated-measures ANOVA
d) Linear regression
Answer: b) Paired-samples t-test (the same drivers are measured in both conditions, texting and not texting, so the scores are related)
“Hallam, Price, and Katsarou (2002) investigated the influence of background noise on classroom performance for children aged 10 to 12. In a similar study, students in one classroom worked on an arithmetic task with calming music in the background. Students in a second classroom heard aggressive, exciting music, and students in a third room had no music at all. The researchers measured the number of problems answered correctly for each student to determine whether the music conditions had any effect on performance.” (Gravetter & Wallnau, 2013, p. 674)
Which of the following would be an appropriate statistical test for these data?
a) Chi-square
b) Spearman correlation
c) Independent-samples t-test
d) Independent-measures ANOVA
Answer: d) Independent-measures ANOVA (three separate groups of students and an interval/ratio dependent variable)
“Belsky, Weintraub, Owen, and Kelly (2001) reported the effects of preschool childcare on the development of young children. One result suggests that children who spend more time away from their mothers are more likely to show behavioral problems in kindergarten. Suppose that a kindergarten teacher is asked to rank order the degree of disruptive behavior for the n = 20 children in the class.
Researchers then separate the students into two groups: children with a history of preschool and children with little or no experience in preschool. The researchers plan to compare the ranks for the two groups.” (Gravetter & Wallnau, 2013, p. 675)
Which of the following is the appropriate statistical test for these data?
a) Mann-Whitney U-test
b) Wilcoxon signed ranks test
c) Chi-square test-for-independence
d) Independent-samples t-test
Answer: a) Mann-Whitney U-test (two independent groups compared on ranked, i.e. ordinal, data)
“A researcher would like to determine whether infants, age 2 to 3 months, show any evidence of color preference. The babies are positioned in front of a screen on which a set of four colored patches is presented. The four colors are red, green, blue, and yellow. The researcher measures the amount of time each infant looks at each of the four colors during a 30 second test period. The color with the greatest time is identified as the preferred color for the child.” (Gravetter & Wallnau, 2013, p. 674)
Which of the following would be an appropriate statistical test for these data?
a) Single-sample t-test
b) Independent-measures ANOVA
c) Chi-square goodness-of-fit test
d) Chi-square test-for-independence
Answer: c) Chi-square goodness-of-fit test (each infant is classified into one of four colour categories, giving frequency data for a single nominal variable)
Chi-square tests are intended for research questions concerning the ________ of the population in different categories
Chi-square tests are intended for research questions concerning the proportion of the population in different categories
What type of data can be used for a chi-square test?
Nominal data
What is the difference between actual and predicted values called?
Residual
What is the difference between an observed frequency and an expected frequency called?
Residual
What are the two chi-square tests? How many variables do they examine?
Chi-square Goodness-of-Fit Test (1 nominal variable)
Chi-square Test-for-Independence (2 nominal variables)
The chi-square goodness-of-fit test uses ________ from a sample to test hypotheses about the shape or proportions of a population
The chi-square goodness-of-fit test uses frequency data from a sample to test hypotheses about the shape or proportions of a population
The numbers of individuals in each category of a chi-square goodness-of-fit test are called what?
Observed frequencies
What is the null hypothesis for a chi-square goodness-of-fit test?
The null hypothesis specifies the proportion of the population that should be in each category
The null hypothesis for the chi-square test for goodness of fit typically falls into one of two types:
- a no-preference hypothesis which states that the population is distributed evenly across the categories, or
- a no-difference hypothesis which states that the population distribution is not different from an established distribution
The proportions from the null hypothesis of a chi-square goodness-of-fit test are used to construct an ideal sample distribution, called _______________, that describes how the sample would appear if it were in perfect agreement with the null hypothesis
The proportions from the null hypothesis of a chi-square goodness-of-fit test are used to construct an ideal sample distribution, called expected frequencies (fe), that describes how the sample would appear if it were in perfect agreement with the null hypothesis
What is the formula for the expected frequency for each category in a chi-square goodness of fit?
fe = pn
Where:
- p = the proportion stated in H0
- n = sample size
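A minimal SciPy sketch of this formula and of the goodness-of-fit test itself; the observed frequencies and null-hypothesis proportions are hypothetical illustrative values:

```python
from scipy import stats

# Hypothetical data: 60 people each choose one of three options
observed = [30, 18, 12]            # observed frequencies (fo)
n = sum(observed)
p_h0 = [1/3, 1/3, 1/3]             # proportions stated in H0 (no-preference hypothesis)

expected = [p * n for p in p_h0]   # fe = p * n  -> [20.0, 20.0, 20.0]

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(p, 3))  # chi2 = 8.4
```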
True or False:
Expected frequencies can be decimal numbers
True
expected frequencies are hypothetical values
True or False:
χ2 can never be negative
True
χ2 can never be negative as the residuals (fo – fe ) are squared
Larger discrepancies between fo and fe produce _____ χ2 values
Larger discrepancies between fo and fe produce larger χ2 values
What effect size value do you use for chi-square goodness-of-fit?
Cohen’s w
w = sqrt(χ2 / N)
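A minimal Python sketch of the Cohen’s w calculation; the chi-square value and sample size are hypothetical (carried over from the goodness-of-fit example above):

```python
import math

# Cohen's w effect size for a chi-square goodness-of-fit test
chi2 = 8.4      # hypothetical chi-square statistic
n = 60          # hypothetical sample size

w = math.sqrt(chi2 / n)
print(round(w, 2))  # 0.37
```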
The chi-square test-for-independence is used to test whether or not there is a _______ between two categorical (nominal) variables
The chi-square test-for-independence is used to test whether or not there is a relationship between two categorical (nominal) variables
What is the null hypothesis for chi-square test-for-independence?
The null hypothesis for the chi-square test-for-independence can be phrased two ways:
- there is no relationship between the two variables (they are independent); or
- the distribution for one variable is the same (has the same proportions) for all the categories of the second variable
How do you calculate the degree of freedom for a chi-square test-for-independence?
df = (R - 1)(C - 1)
Where:
R = number of rows
C = number of columns
What are the steps for calculating the chi-square test-for-independence statistic?
- The null hypothesis is used to construct an idealised sample distribution of expected frequencies (fe) that describes how the sample would look if the data were in perfect agreement with the null hypothesis
- A chi-square statistic is then computed to measure the amount of discrepancy between the ideal sample (expected frequencies from H0) and the actual sample data (the observed frequencies, fo)
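A minimal SciPy sketch of these steps; the 2 x 2 table of observed frequencies is hypothetical illustrative data:

```python
from scipy import stats

# Hypothetical contingency table of observed frequencies (fo)
#               outcome A   outcome B
observed = [[20, 30],    # group 1
            [40, 10]]    # group 2

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(round(chi2, 2), round(p, 3), dof)  # dof = (R - 1)(C - 1) = 1
print(expected)  # expected frequencies (fe) built from the null hypothesis
```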
When estimating effect size for a chi-square test-for-independence what coefficient should you use?
For 2 x 2 table use phi coefficient
For tables larger than 2 x 2 use Cramer’s V
What are the assumptions for chi-square tests?
- Independence of observations
- Expected frequencies should be at least 5
True or False:
Observed frequencies can be less than 5 in a chi-square test
True
What test do you use to compare two independent sets of ordinal scores or interval/ratio scores if independent-samples t-test assumptions are violated?
Mann-Whitney U-test
What test do you use to compare two sets of related or repeated-measures scores measured on an ordinal scale, or interval/ratio scores if related-samples t-test assumptions are violated?
Wilcoxon signed-ranks test
For Mann-Whitney and Wilcoxon tests, the _______ the test statistic, the larger the difference between groups or conditions
For Mann-Whitney and Wilcoxon tests, the smaller the test statistic, the larger the difference between groups or conditions
What is the null hypothesis for the Mann-Whitney U-test?
the ranks for one group are not systematically higher or lower than the ranks for another group
What is the null hypothesis for the Wilcoxon signed-ranks test?
difference scores are not systematically positive or negative
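A minimal SciPy sketch of both tests; the scores are made-up illustrative data:

```python
from scipy import stats

# Mann-Whitney U: hypothetical scores for two independent groups
group_1 = [3, 5, 6, 8, 9]
group_2 = [7, 9, 10, 12, 14]
u, p = stats.mannwhitneyu(group_1, group_2, alternative="two-sided")
print(u, round(p, 3))

# Wilcoxon signed-ranks: hypothetical repeated-measures scores
before = [10, 12, 9, 15, 11, 13]
after = [12, 14, 10, 18, 12, 17]
w, p = stats.wilcoxon(before, after)
print(w, round(p, 3))
```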
Which test is used to evaluate differences between three or more treatment conditions (or populations) using ordinal data from an independent-measures design?
Kruskal-Wallis test
What is the difference between a Kruskal-Wallis test and a one-way independent-measures ANOVA?
ANOVA requires interval or ratio scale scores that can be used to calculate means and variances
The Kruskal-Wallis test, on the other hand, simply requires that you are able to rank order the individuals for the variable being measured
A ___________ can be used as the nonparametric alternative to a one-way independent-measures ANOVA if the assumptions of the ANOVA are violated
A Kruskal-Wallis test can be used as the nonparametric alternative to a one-way independent-measures ANOVA if the assumptions of the ANOVA are violated
The Kruskal-Wallis test is similar to the Mann-Whitney test. However, the ___________ is limited to comparing only two treatments, whereas the __________ is used to compare three or more treatments
The Kruskal-Wallis test is similar to the Mann-Whitney test. However, the Mann-Whitney test is limited to comparing only two treatments, whereas the Kruskal-Wallis test is used to compare three or more treatments
What is the null hypothesis for the Kruskal-Wallis test?
There is no tendency for the ranks in any treatment population to be systematically higher or lower than the ranks in any other treatment population.
What is the alternative hypothesis for the Kruskal-Wallis test?
The ranks in at least one treatment population are systematically higher or lower than the ranks in another treatment population.
How do you calculate the Kruskal-Wallis H statistic?
- Combine the individuals from all the separate samples and rank order the entire group
- i.e., rank all scores without regard to treatment condition
- Regroup the individuals into the original samples and compute the sum of ranks (T) for each sample
- i.e., add up the ranks for each treatment condition
- The following formula is used to compute the Kruskal-Wallis statistic – which is distributed as a chi-square statistic with degrees of freedom equal to the number of samples minus one
H = [12 / (N(N + 1))] × Σ(T^2 / n) - 3(N + 1)
Where:
- N = total number of scores
- T = sum of ranks for each sample
- n = number of scores in each sample
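In practice the statistic is usually obtained from software; a minimal SciPy sketch with made-up illustrative data:

```python
from scipy import stats

# Hypothetical scores for three independent groups
group_a = [3, 5, 6, 7]
group_b = [6, 8, 9, 11]
group_c = [10, 12, 13, 15]

h, p = stats.kruskal(group_a, group_b, group_c)
print(round(h, 2), round(p, 3))  # H is evaluated against chi-square with k - 1 df
```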
If the null hypothesis of a Kruskal-Wallis test is true what do we expect?
If the null hypothesis is true, we would expect the sums of ranks (T’s) to be more or less equal (aside from differences due to the sizes (n’s) of the samples). Thus, the Kruskal-Wallis statistic measures the degree to which the T’s differ from one another.
True or False:
Kruskal-Wallis assumes normality and homogeneity of variance
False
Kruskal-Wallis does not assume normality and homogeneity of variance
When ranking scores for a Kruskal-Wallis test, what do you do to tied scores?
Give tied scores the average of the affected rank positions
Like with the Mann-Whitney U-test, the _________ provide information about which groups had larger values than others.
Like with the Mann-Whitney U-test, the Mean Ranks provide information about which groups had larger values than others.
How do you calculate the number of pairwise comparisons?
number of comparisons = k(k - 1) / 2
where k = number of treatment conditions
What type of post hoc test do you conduct for a Kruskal-Wallis test?
Pairwise Mann-Whitney U-tests (with a Bonferroni-adjusted alpha to control the experiment-wise Type I error rate)
The ___________ is used to evaluate differences between three or more treatment conditions using ordinal data from a repeated-measures design
The Friedman test is used to evaluate differences between three or more treatment conditions using ordinal data from a repeated-measures design
What is the difference between a one-way repeated-measures ANOVA and a Friedman test?
ANOVA requires interval or ratio scale scores that can be used to calculate means and variances
The Friedman test, on the other hand, simply requires that you are able to rank order the individuals across treatments
A __________ can be used as the nonparametric alternative to a one-way repeated-measures ANOVA if the assumptions of the ANOVA are violated
A Friedman test can be used as the nonparametric alternative to a one-way repeated-measures ANOVA if the assumptions of the ANOVA are violated
For both a Kruskal-Wallis and a Friedman test what must interval/ratio scale data be converted to?
Ordinal data
The Friedman test is similar to the Kruskal-Wallis test. However, the ___________ is used for independent-measures designs, whereas the _____________ is used for repeated-measures designs
The Friedman test is similar to the Kruskal-Wallis test. However, the Kruskal-Wallis test is used for independent-measures designs, whereas the Friedman test is used for repeated-measures designs
What is the null hypothesis for a Friedman test?
The ranks in one treatment condition should not be systematically higher or lower than the ranks in any other treatment condition.
What is the alternative hypothesis for a friedman test?
The ranks in at least one treatment condition should be systematically higher or lower than the ranks in another treatment condition.
How do you calculate the Friedman statistic, χF2?
- Each individual (or the individual’s scores) must be ranked across the treatment conditions
- i.e., for each participant, rank the scores in the treatment conditions from smallest to largest
- Compute the sum of ranks (R) for each treatment condition
- i.e., add up the ranks for each treatment condition
- The following formula is used to compute the Friedman statistic – which is distributed as a chi-square statistic with degrees of freedom equal to the number of treatments minus one
χF2 = [12 / (nk(k + 1))] × ΣR^2 - 3n(k + 1)
Where:
- n = number of participants
- k = number of treatment conditions
- R = sum of ranks for each treatment condition
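As with the Kruskal-Wallis test, the statistic is usually obtained from software; a minimal SciPy sketch with made-up illustrative data:

```python
from scipy import stats

# Hypothetical repeated-measures data: each list is one treatment condition,
# measured on the same 5 participants
cond_1 = [5, 6, 4, 7, 5]
cond_2 = [7, 8, 6, 9, 7]
cond_3 = [9, 9, 8, 11, 10]

chi2_f, p = stats.friedmanchisquare(cond_1, cond_2, cond_3)
print(round(chi2_f, 2), round(p, 3))  # evaluated against chi-square with k - 1 df
```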
If the null hypothesis of a Friedman test is true what do we expect?
If the null hypothesis is true, we would expect the sums of ranks (R’s) to be more or less equal. Thus, the Friedman statistic measures the degree to which the R’s differ from one another.
What post hoc tests do you conduct following a significant Friedman test?
Wilcoxon signed-ranks test