Quantitative Revision Flashcards
In what circumstances would you perform a simple linear regression test?
To determine if there are linear relationships/associations between ratio/interval variables i.e. X and Y
Enable prediction of the values of Y (DV) from the values of X (IV)
What assumptions must be met in order for you to use the simple linear regression test with your data?
Ratio/interval data
Linear relationship between X and Y
Data are randomly sampled
No outliers amongst data
Residuals must be approximately normally distributed
What would be appropriate null and alternative hypotheses for the simple linear regression test?
Non-directional (two-tailed)
H0: There is no linear relationship between X and Y.
H1: There is a linear relationship between X and Y.
Directional (one-tailed)
H0: There is no positive linear relationship between X and Y.
H1: There is a positive linear relationship between X and Y.
Describe what the results mean for a simple linear regression test / Interpret the results / Write up the conclusion and results.
Standardized coefficient
r: strength of the relationship between X and Y (ranges from -1 to +1, with values nearer -1 or +1 being stronger)
Beta: predicted effect on Y if X increases by 1 SD –> e.g. Beta = .85: when X increases by 1 SD, Y is predicted to increase by .85 SDs. Useful where there are multiple IVs (in multiple regression)
r^2: represents the proportion of the variability in Y that can be explained by X
Unstandardized coefficient
b: For every increase of 1 unit in X, Y is predicted to change by b units (an increase if b is positive, a decrease if b is negative)
a: the predicted value of Y when X = 0; only interpret this if it makes sense/there is meaning/it is useful in knowing the value of Y when X = 0
Significance (sig.), i.e. p-value: tells us the significance of the
association between X and Y
effect of X on Y
The statistical significance associated with the IV (e.g. height) matters
IGNORE the statistical significance associated with the constant
Be sure to answer in terms of the question and its scenario
In what circumstances would you perform a Pearson’s (r) correlation test?
To determine the (strength and direction of an) association between 2 variables i.e. X and Y, where neither is categorical, but instead continuous outcome:
ratio/interval (parametric) e.g. weight (kg)
ordinal scale (non-parametric equivalent) e.g. world ranking No. 1, No. 5 etc.
Parametric data
What assumptions must be met in order for you to use the Pearson’s (r) correlation test with your data?
X and Y must be ratio/interval
Linear association between X and Y (scatterplot)
The association must show homogeneity of variance (scatterplot), where the data points are evenly distributed along the regression line
Data for X and Y should follow a normal distribution (histogram, box plot, normal probability Q-Q plot, skewness and kurtosis z-scores, mean = median)
No outliers (scatter plot, box plot)
Ideally, should only be used with a sample of n >= 100
[For smaller sample sizes, there is a risk that one or two extreme data points ‘drive’ the association]
What would be appropriate null and alternative hypotheses for the Pearson’s (r) correlation test?
Non-directional (two-tailed)
H0: There is no association between X and Y.
H1: There is an association between X and Y.
Directional (one-tailed)
H0: There is no positive association between X and Y.
H1: There is a positive association between X and Y.
Describe what the results mean for a Pearson’s (r) correlation test / Interpret the results / Write up the conclusion and results.
The results show a significant/non-significant (significance) weak/strong (strength) negative/positive (direction) correlation between X and Y
r: represents the strength of the relationship/association between X and Y
sig. (i.e. p-value): tells us the significance of the association between X and Y
r^2: represents the proportion of the variability in Y that can be explained by X
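To make r and r^2 concrete, here is a minimal sketch of how Pearson's r could be computed by hand. The data (hours of training and a performance score) and the function name `pearson_r` are made up for illustration; in practice SPSS or a statistics library would also give the p-value.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length lists of interval/ratio values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of training (X) and performance score (Y)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = pearson_r(hours, score)   # strength and direction of the association
r_squared = r ** 2            # proportion of variability in Y explained by X
print(round(r, 3), round(r_squared, 2))
```

With these made-up numbers r is about 0.775 (a strong positive correlation) and r^2 is 0.6, i.e. 60% of the variability in the score is explained by hours.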
In what circumstances would you perform a Spearman’s (rho) test?
Spearman’s rho calculates the ranked scores for each variable and considers the association between the ranks
To determine the (strength and direction of an) association between the ranks of X and Y, where X and Y are both non-categorical (i.e. not nominal; ordinal data can be ranked)
Non-parametric data i.e. parametric assumptions have been violated/breached
What assumptions must be met in order for you to use the Spearman’s (rho) test with your data?
X and Y must be at least ordinal (ordinal, or ratio/interval data that breach the parametric assumptions)
Association between the ranks of X and Y does not need to be linear but it must be monotonic (i.e. does not change direction) (scatterplot)
The association must show homogeneity of variance (scatterplot), where the data points are evenly distributed along the regression line
Only appropriate where n (sample size) is at least 20
What would be appropriate null and alternative hypotheses for the Spearman’s (rho) test?
Non-directional (two-tailed)
H0: There is no association between the ranks of X and Y.
H1: There is an association between the ranks of X and Y.
Directional (one-tailed)
H0: There is no positive association between the ranks of X and Y.
H1: There is a positive association between the ranks of X and Y.
Describe what the results mean for a Spearman’s (rho) test.
The results show a significant strong positive correlation between the ranks of X and Y
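As a rough illustration of "ranking the scores and then correlating the ranks", the sketch below ranks each variable (tied values share the average rank) and applies the Pearson formula to the ranks. The helper names and the data are hypothetical, and no p-value is computed.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho = Pearson's r calculated on the ranks of X and Y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data for five participants
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(round(rho, 2))
```

Here the ranks mostly agree (only two neighbouring pairs are swapped), so rho comes out at 0.8.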
In what circumstances would you perform a Kendall’s (tau) test?
To determine the (strength and direction of an) association between the ranks of X and Y, where X and Y are both non-categorical (i.e. not nominal; ordinal data can be ranked)
Non-parametric data (data is not normally distributed) i.e. parametric assumptions have been violated/breached
Useful with small data set n < 20
Can deal with a large number of tied ranks in the data
What assumptions must be met in order for you to use the Kendall’s (tau) test with your data?
Both variables must be at least ordinal (ordinal, or ratio/interval data that breach the parametric assumptions)
Association between the ranks of X and Y does not need to be linear but it must be monotonic (i.e. does not change direction) (scatterplot)
The association must show homogeneity of variance (scatterplot), where the data points are evenly distributed along the regression line
Only useful where n < 20
What would be appropriate null and alternative hypotheses for the Kendall’s (tau) test?
Non-directional (two-tailed)
H0: There is no association between the ranks of X and Y.
H1: There is an association between the ranks of X and Y.
Directional (one-tailed)
H0: There is no positive association between the ranks of X and Y.
H1: There is a positive association between the ranks of X and Y.
Describe what the results mean for a Kendall’s (tau) test.
The results show a non-significant weak negative correlation between the ranks of X and Y
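One way to picture Kendall's tau is as a count of pairs of participants whose order agrees (concordant) or disagrees (discordant) on X and Y. The sketch below computes the simplest version, tau-a, which ignores ties; this is an assumption made for brevity, since statistics packages typically report a tie-adjusted version (tau-b). The data are made up.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:      # both variables order this pair the same way
            concordant += 1
        elif product < 0:    # the two variables order this pair oppositely
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# Hypothetical data for five participants (no tied values)
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

With 8 concordant and 2 discordant pairs out of 10, tau is 0.6.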
In what circumstances would you perform a multidimensional Chi-Square test?
Relationship/association between variables (Test of association)
Variables are both categorical i.e. nominal
Independent research design (no subject/participant appears in more than one group)
[Compare the observed and expected counts i.e. Test for differences where samples are independent]
What assumptions must be met in order for you to use the multidimensional Chi-Square test with your data?
Randomly sampled
Variables must be categorical i.e. nominal
Independent measures
Counts (actual numbers), not percentages
No calculated expected value < 1
No more than 20% of expected values < 5
Solution = collect more data, collapse categories, or use an exact test (SPSS)
What would be appropriate null and alternative hypotheses for the multidimensional Chi-Square test?
Research question: Does the proportion of athletes who are normal weight or overweight differ by sport?
(H0): In the population, the three sports do not differ in the proportions who are normal weight and overweight.
(H1): In the population, the three sports do differ in the proportions who are normal weight and overweight.
Describe what the results mean for a multidimensional Chi-Square test / Interpret the results / Write up the conclusion and results.
Method
A Chi-square test was performed to test the H0 that the 3 sports do not differ in the proportions who are normal and overweight
Results
There was a difference between the 3 sports (Field, Netball and Rowing) in the proportions of athletes who are normal weight and overweight, Chi-Square statistic = … (df = …, n = …), p = …
Basically: Method: state that the test was performed to test the H0. Results: conclusion/result, Chi-Square statistic, df, n, p-value
In what circumstances would you perform a McNemar’s (Chi-Square) test?
Relationship/association between variables (Test of association)
Variables are both nominal
Repeated measures design with two dichotomous variables
[Test for differences where samples are paired]
What assumptions must be met in order for you to use the McNemar’s test with your data?
Randomly sampled
Dependent/repeated measures
DV and IV must be
dichotomous
of only 2 categories each
Variables must be categorical i.e. nominal
Counts (actual numbers), not percentages
No calculated expected value < 1
No more than 20% of expected values < 5
Solution = collect more data, collapse categories, or use an exact test (SPSS)
What would be appropriate null and alternative hypotheses for the McNemar’s test?
Research question: To investigate the number of correct identifications of the writer’s sex by their handwriting style
49 Psychology students were asked to write using their normal handwriting and then asked to write imitating the handwriting of the opposite sex
Students recruited a participant to judge the handwriting of both samples and identify the sex (repeated measures)
IV: handwriting style
DV: participant’s judgement of handwriter’s sex
H0: There will be no difference in the number of correct identifications of the writer’s sex from the 2 handwriting samples.
H1: There will be a difference in the number of correct identifications of the writer’s sex from the two handwriting samples.
Describe what the results mean for a McNemar’s test.
Method
A McNemar’s Chi-Square test was performed to test the H0 that there will be no difference in the number of correct identifications of the writer’s sex from the two handwriting samples
Results
There is a significant difference in the number of correct judgements between the two conditions of handwriting style (n = …, exact p = …)
Of the 49 participants, ‘..’ correctly identified the handwriter’s sex for normal writing. Of the ‘…’ who were incorrect for the normal handwriting, ‘…’ of them correctly identified the handwriter’s sex for the opposite-sex (imitation) handwriting
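McNemar's test only uses the discordant pairs, i.e. the judges who were correct in one condition but not the other. The sketch below is a hedged illustration: the counts `b` and `c` are invented (they are not the results of the handwriting study), the chi-square form is shown without a continuity correction, and the exact p-value is taken from a two-sided binomial test with probability 0.5, which is one common way an "exact p" is obtained.

```python
from math import comb

def mcnemar(b, c):
    """b, c: counts of the two kinds of discordant pairs (b + c must be > 0)."""
    n = b + c
    chi_square = (b - c) ** 2 / n
    # Exact two-sided p: probability of a split at least this uneven
    # if each discordant pair were equally likely to go either way
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    exact_p = min(1.0, 2 * tail)
    return chi_square, exact_p

# Hypothetical counts: 10 judges correct for normal handwriting only,
# 2 judges correct for imitation handwriting only
chi_square, exact_p = mcnemar(10, 2)
print(round(chi_square, 2), round(exact_p, 3))
```

With these invented counts the statistic is about 5.33 and the exact p is about 0.039, so H0 would be rejected at alpha = 0.05.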
In what circumstances would you perform an independent samples design t-test?
Parametric data
Independent (i.e. different) data/groups/samples
To compare means - compare sample mean to another sample mean
i.e. to compare differences between groups (mean)
e.g. Intervention and control group –> study participant is in one group only
[independent data: data that comes from different (independent) groups of people]
What assumptions must be met in order for you to use the independent t-test with your data?
Dependent Variable is ratio/interval
Measurements in condition 1 are independent of measurements in condition 2
For n <= 30 –> distribution of DV data for each group (X and Y) should not be badly skewed i.e. should follow a normal distribution
(Can use CLT to help explain, if we still remember)
Homogeneity of variance:
The variance of the DV data for the two groups should not be very different
A problematic difference in variances is indicated by a significant Levene’s Test
If significant, interpret the p-value associated with ‘equal variances not assumed’
If non-significant, interpret the p-value associated with ‘equal variances assumed’
What would be appropriate null and alternative hypotheses for the independent t-test?
two-tailed
H0: There is no difference between the population means of X and Y.
H1: There is a difference between the population means of X and Y.
one-tailed
H0: The population mean of X is not > the population mean of Y.
H1: The population mean of X > the population mean of Y.
Describe what the results mean for an independent t-test.
p <= 0.05 (or 0.01) –>
There is a significant difference between the population means of X and Y
or
The population mean of X is significantly > population mean of Y
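The two rows of output ("equal variances assumed" and "equal variances not assumed") correspond to two ways of calculating the t statistic. The following is a minimal sketch with made-up scores; the function name and data are hypothetical, and the p-value (which needs the t distribution) is left to SPSS or a statistics library.

```python
import math
from statistics import mean, variance

def independent_t(group_x, group_y, equal_variances=True):
    """Return (t, df) for two independent groups."""
    n_x, n_y = len(group_x), len(group_y)
    var_x, var_y = variance(group_x), variance(group_y)  # sample variances
    diff = mean(group_x) - mean(group_y)
    if equal_variances:
        # Pooled variance: used when Levene's test is non-significant
        pooled = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
        se = math.sqrt(pooled * (1 / n_x + 1 / n_y))
        df = n_x + n_y - 2
    else:
        # Welch's version: used when Levene's test is significant
        se = math.sqrt(var_x / n_x + var_y / n_y)
        df = (var_x / n_x + var_y / n_y) ** 2 / (
            (var_x / n_x) ** 2 / (n_x - 1) + (var_y / n_y) ** 2 / (n_y - 1)
        )
    return diff / se, df

# Hypothetical scores: intervention group (X) and control group (Y)
intervention = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]
t, df = independent_t(intervention, control)
print(t, df)
```

Because the two made-up groups have identical variances, both versions give the same result here (t = 4 with 8 degrees of freedom); they differ when the variances differ.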
In what circumstances would you perform a paired design t-test?
Parametric data
Dependent/paired (i.e. same) data/groups/samples
To compare means - compare sample mean to another sample mean
i.e. to compare differences within groups (mean)
e.g. pre-test post-test study
Data collected from an/the same individual at different points in time/under different conditions
Compare differences in outcome between time 1 & 2 or condition 1 & 2 (mean)
[dependent/paired data: data that comes from one group of individuals]
What assumptions must be met in order for you to use the paired t-test with your data?
Dependent Variable is ratio/interval
Observations not independent
Each measurement in Condition/Time 1 has a match in Condition/Time 2
For n <= 30 –> distribution of differences between X and Y (i.e. X - Y) should not be badly skewed i.e. should follow a normal distribution
(Can use CLT to help explain, if we still remember)
Homogeneity of variance is not required here (the test is run on one set of difference scores)
What would be appropriate null and alternative hypotheses for the paired t-test?
two-tailed
H0: No difference in the means before and after.
H1: A difference in the means before and after.
or
H0: Mean difference = 0.
H1: Mean difference is not = 0.
one-tailed
H0: Mean after <= mean before.
H1: Mean after > mean before.
or
H0: Mean difference is not positive.
H1: Mean difference is positive.
Describe what the results mean for a paired t-test.
p <= 0.05 (or 0.01) –>
Significant difference between the means before and after
or
Mean after is significantly > mean before
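The paired t-test works on the differences within each pair, which is why the hypotheses can be written in terms of the mean difference. A minimal sketch with invented pre-test and post-test scores (the names and data are hypothetical; no p-value is computed):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for paired measurements, using after - before differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical pre-test and post-test scores for the same five people
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 16]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

The differences are 2, 2, 1, 2 and 3, giving a mean difference of 2 and t of about 6.32 with 4 degrees of freedom.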
In what circumstances would you perform a Mann Whitney U test?
Non-parametric data:
Ordinal scale DV
Ratio/interval DV that does not meet parametric assumptions
(Sample sizes are small and normality is questionable
Data contain outliers that, because of their magnitude, distort the mean values and affect the outcome of the comparison)
Independent (i.e. different) data/groups/samples
To compare mean ranks/medians - compare one sample median to another sample median
i.e. to compare differences between groups (median)
e.g. Intervention and control group –> study participant is in one group only
[To test the H0 that
2 samples come from the same population (i.e. have the same median)
against the alternative that observations in one sample tend to be > observations in the other]
What assumptions must be met in order for you to use the MWU test with your data?
Independent data/samples
Data distributions of X and Y are the same shape
Not too many ties in ranks of data
[Data values are assigned ranks relative to both samples combined]
What would be appropriate null and alternative hypotheses for the MWU test?
Two-tailed
H0: There is no difference between the population medians of X and Y.
H1: There is a difference between the population medians of X and Y.
One-tailed
H0: The population median of X is not > the population median of Y.
H1: The population median of X > the population median of Y.
Describe what the results mean for a MWU test.
p <= 0.05 (or 0.01) –>
There is a significant difference between the population medians of X and Y
or
The population median of X is significantly > population median of Y
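To show what "ranks relative to both samples combined" means in practice, here is a small sketch that ranks the pooled data and converts each group's rank sum into a U statistic. The helper names and the data are hypothetical, and the p-value lookup is not included.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def mann_whitney_u(x, y):
    """Return (U for x, U for y); the smaller one is usually reported."""
    ranks = average_ranks(list(x) + list(y))   # rank both samples together
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical scores for two independent groups
group_x = [7, 8, 9]
group_y = [1, 2, 3, 10]
u_x, u_y = mann_whitney_u(group_x, group_y)
print(u_x, u_y, min(u_x, u_y))
```

U for X (9) is the number of X-Y pairs in which the X value is larger, and the two U values always add up to n_x times n_y (here 12).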
In what circumstances would you perform a Wilcoxon signed rank test?
[A Wilcoxon signed rank test:
Measures the difference within each pair of measurements
Compares paired data
Is used when you cannot justify a normality assumption for the differences
Very simple –> ranks the sizes of the differences, sums the ranks of those that are positive (+) and those that are negative (-), and makes a decision based on these sums]
Non-parametric data
Dependent/paired (i.e. same) data/groups/samples
To compare medians - compare one sample median to another sample median
i.e. to compare differences within groups (median)
e.g. pre-test post-test study
Data collected from an/the same individual at different points in time/under different conditions
Compare differences in the ranks of the outcome between time 1 & 2 or condition 1 & 2 (median)
What assumptions must be met in order for you to use Wilcoxon test with your data?
Paired/dependent data/samples
Non-categorical data
What would be appropriate null and alternative hypotheses for the Wilcoxon test?
two-tailed
H0: No difference in the medians before and after.
H1: A difference in the medians before and after.
or
H0: Median difference = 0.
H1: Median difference is not = 0.
one-tailed
H0: Median after <= median before.
H1: Median after > median before.
or
H0: Median difference is not positive.
H1: Median difference is positive.
Describe what the results mean for a Wilcoxon test.
p <= 0.05 (or 0.01) –>
Significant difference between the medians before and after
or
Median after is significantly > median before
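A rough sketch of the signed-rank calculation, assuming the usual procedure of dropping zero differences, ranking the absolute differences and summing the ranks separately for positive and negative differences. The helper names and scores are invented, and the p-value is not computed.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def wilcoxon_signed_rank(before, after):
    """Return (sum of ranks of positive differences, sum for negative ones)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zeros
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical pre-test and post-test scores for the same five people
before = [10, 12, 9, 11, 13]
after = [12, 15, 8, 15, 18]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(w_plus, w_minus)
```

Four of the five differences are positive and they also carry the largest ranks, so the positive rank sum (14) dwarfs the negative one (1).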
What is a type I error?
False positive
Incorrectly rejecting the H0 when it is actually true
Saying that there is a difference when in reality/actually there is no difference
e.g. Telling a man that he is pregnant
What is a type II error?
False negative
Incorrectly failing to reject i.e. accepting the H0 when it is actually false
Saying that there is no difference when in reality/actually there is a difference
e.g. Telling a pregnant woman that she is not pregnant (when it is so obvious that she is!)
What is the common structure of all statistical tests?/What are the 7 steps of hypothesis testing?
Set H0 and H1
Establish alpha i.e. level of significance
Determine p-value
Accept or reject H0
OR
Define study question and choose an inferential test
Set hypotheses
Select/establish level of significance i.e. alpha = 0.05
EDA and assess test assumptions to see if they are met/satisfied
Go ahead and run the test
Obtain p-value
Decide whether to reject or accept H0 + conclusion, interpretation and write-up of results
What is the benefit of using a paired t-test over an independent t-test?
Independent t-test gives rise to more random error because the control group might, by chance, be very different from the treatment group
Variation is limited in paired t-test as each person is their own control
What are residuals?
= Actual (observed) - predicted value of Y
Difference between the actual value of Y (points) and the predicted value of Y (line)
An observable estimate of the unobservable statistical error
What is the simple linear regression equation?
Y = a + bX
i.e. DV = constant + coefficient x (IV)
a: constant or intercept
b: coefficient or slope of the line associated with this independent variable
As X increases by 1 unit, Y increases by b units
What does r^2 = 0.8 mean?
80% of variability in Y is explained by X
*Note: In an exam, interpret the Adjusted R Square (if it is given) as it is more accurate
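The cards on residuals, the regression equation and r^2 can be tied together with one small worked sketch. It fits Y = a + bX by least squares to made-up data, then computes the residuals (taken here as actual minus predicted, the usual convention) and r^2 as the share of variability in Y that the line accounts for. The function name and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: hours of training (X) and performance score (Y)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

a, b = fit_line(hours, score)
predicted = [a + b * x for x in hours]
residuals = [actual - pred for actual, pred in zip(score, predicted)]

# r^2 = 1 - (unexplained variability / total variability in Y)
mean_score = sum(score) / len(score)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((v - mean_score) ** 2 for v in score)
r_squared = 1 - ss_residual / ss_total
print(round(a, 2), round(b, 2), round(r_squared, 2))
```

For these numbers a is 2.2 and b is 0.6 (each extra hour predicts a 0.6-point higher score), the residuals sum to zero, and r^2 is 0.6.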
What is the assumption that all inferential tests make about the sample?
The sample is randomly sampled from the population
What is heteroscedasticity?
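Unequal variance: the spread of the data points around the regression line is not constant (the opposite of homogeneity of variance)
Data points fan out instead of being spread evenly along the regression line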
How do we obtain the p-value for one-tailed test (directional) from the p-value of/for two-tailed test (non-directional)?
p-value for one-tailed test = Half the p-value for two-tailed test (provided the observed effect is in the predicted direction)
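A small sketch of the halving rule, assuming a test statistic with a symmetric distribution (such as t). The function name is made up; the extra branch covers the case where the observed effect goes against the predicted direction, in which case the one-tailed p-value is large rather than half the two-tailed value.

```python
def one_tailed_p(two_tailed_p, effect_in_predicted_direction):
    """Convert a two-tailed p-value to a one-tailed one (symmetric statistic)."""
    if effect_in_predicted_direction:
        return two_tailed_p / 2
    # Effect is in the opposite direction to H1: H0 cannot be rejected
    return 1 - two_tailed_p / 2

print(one_tailed_p(0.08, True))
print(one_tailed_p(0.08, False))
```

So a two-tailed p of .08 becomes .04 for a one-tailed test only when the difference lies in the direction stated in H1.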
What is the difference between one-tailed and two-tailed tests with regard to rejecting the H0?
Two-tailed tests are non-directional. We would reject H0 if we found a positive or negative association or difference etc.
One-tailed tests are directional. We only reject H0 if the association or difference etc. is in the direction that we specified/expected
What does the multidimensional Chi-Square test compare?
Compares observed frequencies in our sample with the frequencies we would expect if there were no relationship at all between the two variables in the population that the sample was drawn from
What is the formula for Chi-square?/How do we obtain a Chi-square statistic?
Chi-Square = SUM((O - E)^2 / E)
O: observed count
E: expected count
For each cell, apply the formula (O-E)^2/E
Then sum up all the cells to get the Chi-Square statistic
What is (the concept of) degrees of freedom?
How do we calculate it?
The more categories there are in the IV and DV, the more chance there is of the analysis being affected by sampling error
(No. of categories in the row variable minus 1) x (No. of categories in the column variable minus 1)
i.e. (rows-1)(columns-1)
EXCLUDE marginal cells!
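The formula and the degrees-of-freedom rule can be checked on a small table. The sketch below assumes the standard expected count for each cell (row total x column total / n) and uses invented counts for three sports by two weight categories; marginal totals are used to get the expected counts but are not themselves cells of the table.

```python
def chi_square(observed):
    """observed: list of rows of counts. Returns (statistic, df, expected)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count if there were no relationship between the two variables
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Hypothetical counts: rows = 3 sports, columns = normal weight, overweight
observed = [
    [30, 10],
    [20, 20],
    [10, 30],
]
statistic, df, expected = chi_square(observed)
print(statistic, df)
```

Every expected count here is 20 (so the expected-count assumptions are met), the statistic is 20 and df = (3 - 1)(2 - 1) = 2.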
From the study done by Chris Gratton and Ian Jones on Research methods for Sports Studies (2008), what are the 4 purposes of data analysis?
Describe
Compare
Examine similarities
Examine differences
What are the aims of Descriptive statistics?
Check for errors and outliers
Describe and summarise the data
Spread of the data
Ensure appropriate analysis
Data parametric or non-parametric?
Ways of summarising interval/ratio data
Measure of Central Tendency
mean
median
mode
Measure of Dispersion
range
SD
variance
Normal curve, skewness, kurtosis
What do parametric tests assume about the characteristics of the sample in terms of its distribution?
Data is drawn from a normally distributed population (i.e. data is not skewed)
Have the same variance or spread on the variables being measured
What assumptions do non-parametric tests make about the characteristics of the sample in terms of its distribution?
Do not assume any particular distribution (e.g. normality), although they still make some assumptions (e.g. random sampling)
What is p-value?
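Probability of obtaining a result at least as extreme as the one observed if H0 is true (not the probability that H0 itself is true)
Loosely: how likely it is that a difference as large as the one found would occur by chance alone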
When do we use non-parametric tests?
When assumptions of parametric tests are not met (i.e. breached)
level of measurement (e.g. interval or ratio data)
normal distribution
homogeneity of variances across groups
Not always possible to correct for problems with the distribution of a data set (i.e. data transformation) –> have to use non-parametric tests:
Make fewer assumptions about the type of data on which they can be used
Many of these tests use “ranked” data
What is alpha/level of significance?
The chance of making a Type I error that we are willing to tolerate
Alpha level of .05 (5%): decide to reject H0 and accept HA when the p-value is no more than .05 –>
up to 5% chance that you are wrong in concluding that there is a difference (making a Type I error) when there actually isn’t (false positive)