(Quantitative): Comparing groups: continuous variables Flashcards
What is statistical data analysis?
• Organise and analyse the data
• Common procedures used in analysis:
- Descriptive Statistics
- Inferential Statistics (more of a focus this term)
• Need to get data into shape
What does data analysis need to do/have?
- Needs to have a purpose
- Describe
- Compare
- Examine similarities
- Examine differences
What is descriptive statistics (recap from last semester)?
- Check for errors and outliers
- Describe & summarise
- Spread of the data
- Ensure appropriate analysis
- Data parametric or non‐parametric
What ways can data be summarised (ratio or interval)?
• Measure of Central Tendency
- Mean, Median, Mode
- If not normal, use the median
• Measure of Dispersion
- Variance, Range, Standard Deviation
• Normal Curve, Skewness, Kurtosis
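A minimal sketch of these summaries using Python's standard `statistics` module. The sample values are made up for illustration; note how the single extreme value pulls the mean above the median.

```python
import statistics

# Hypothetical sample (illustrative values only); 45 is an outlier
times = [12, 15, 15, 18, 20, 22, 45]

mean = statistics.mean(times)          # sensitive to the outlier
median = statistics.median(times)      # preferred when data are not normal
mode = statistics.mode(times)          # most frequent value
data_range = max(times) - min(times)   # largest minus smallest value
variance = statistics.variance(times)  # sample variance (n - 1 denominator)
sd = statistics.stdev(times)           # sample standard deviation

print(mean, median, mode, data_range, round(sd, 2))
```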
What are inferential statistics?
All statistical tests share a common structure:
• Set up a null and alternative hypothesis
• Establish a level of statistical significance (also known as alpha (α), usually set at 5% or 1%, depending on the study)
• Determine the statistical significance of the findings (the p-value)
• Reject or fail to reject the null hypothesis: is there a difference between the two groups?
- SPSS output provides a p‐value (probability value)
- If the p‐value is greater than the alpha you cannot reject the null hypothesis
Note: a result may be statistically significant but not actually meaningful, e.g. a 1 second difference in a marathon
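The decision rule can be sketched as a small Python function. The function name and default alpha are illustrative assumptions, not SPSS output.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the decision for a hypothesis test at significance level alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.03))        # reject H0
print(decide(0.20))        # fail to reject H0
print(decide(0.03, 0.01))  # fail to reject H0 (stricter alpha)
```

The same p-value can lead to different decisions depending on the alpha chosen before the test.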
What are the steps when undertaking a hypothesis test?
- Define study question
- Set null and alternative hypothesis
- Calculate a test statistic
- Calculate a p value
- Make a decision and interpret
What is Student's t-test?
- The t‐test is used to compare means between groups
- t‐test is easy to use but can be easily misused
- One of the most common statistical procedures used by researchers
What are the two types of t-test?
Same principles behind each but there is more random error in the …
• INDEPENDENT SAMPLES DESIGN because the control group might, by chance, be very different from the treatment group
• With the PAIRED DESIGN, each person is their own control so variation is limited
What is the process in the decision chart for bivariate data?
- Paired data –> Paired samples t-test (parametric) or Wilcoxon signed-rank test (non-parametric)
- Independent data –> Independent samples t-test (parametric) or Mann-Whitney U test (non-parametric)
What is independent data?
• Data comes from different (independent) groups of people
• E.g. classic experiment (Group 1 receives intervention A, Group 2 receives intervention B)
• Study participant is in one group only
• Compare differences between groups (mean or median)
What is paired data?
• Data comes from one group of individuals
• Data collected from an individual at different points in time or under different conditions
• Compare differences in outcome between time 1 and time 2 or condition 1 and 2 (mean or median)
• Other terms: repeated measures, before and after study
e.g. cycling speed with different helmets
Explain the independent samples design
• Dependent Variable is ratio/interval and Independent Variable has two categories
• Measurements in condition 1 are independent of measurements in condition 2
• If the H0 is true we expect the difference between the mean of condition 1 group and condition 2 group to be zero
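A minimal sketch of the test statistic for this design, assuming equal variances (the pooled-variance form). The scores and the function name `independent_t` are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return mean_diff / std_error, n1 + n2 - 2

# Hypothetical scores (illustrative values only)
intervention_a = [5, 7, 8, 6, 9]
intervention_b = [4, 5, 6, 3, 7]
t, df = independent_t(intervention_a, intervention_b)
print(round(t, 3), df)
```

Here the mean difference is 2 and the standard error is 1, so t is about 2.0 with 8 degrees of freedom. That is below the two-tailed 5% critical value of 2.306 for 8 degrees of freedom, so H0 would not be rejected for these made-up data.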
Explain the issue of error in an independent t-test
• The act of using a sample will introduce error
- What is the probability that the difference we found occurred by chance?
- If less than 5%, reject H0 and accept HA (also written H1, the alternative)
Explain the sampling distribution of the independent samples t-test?
• T-distribution shape similar to normal curve
• The middle is the population parameter when H0 is true (i.e. the mean difference is 0)
• Around it are all the possible sample statistics
• Is ‘our’ difference so big that it would only rarely happen by chance?
- Rarest 2.5% in each tail (5% in total for a two-tailed test)
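Written as a rule, assuming a two-tailed test at significance level α with the test's degrees of freedom df (the symbol for the critical value is notation introduced here, not taken from the notes):

```latex
\text{Reject } H_0 \text{ if } |t| > t_{1-\alpha/2,\,df},
\qquad \text{so for } \alpha = 0.05 \text{ the rejection region is } 2.5\% \text{ in each tail.}
```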
What are the assumptions for the independent samples t-test?
- Dependent Variable is ratio/interval
- If either group is small (30 or less), distribution of Dependent Variable for each group should not be badly skewed
- The variance of the Dependent Variable for the two groups should not be very different
How is a problematic difference indicated in an independent t-test?
• A problematic difference in variances is indicated by a significant Levene’s Test:
- If significant, interpret the p value associated with ‘equal variances not assumed’
- If non‐significant, interpret p value associated with ‘equal variances assumed’
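When equal variances are not assumed, the t statistic is commonly computed with Welch's correction, which uses each group's own variance and adjusts the degrees of freedom. The sketch below assumes that is the calculation behind the 'equal variances not assumed' row; the data and the function name `welch_t` are hypothetical.

```python
import math
import statistics

def welch_t(group1, group2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1  # squared standard error, group 1
    v2 = statistics.variance(group2) / n2  # squared standard error, group 2
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    # Adjusted degrees of freedom; lower than n1 + n2 - 2 when variances differ
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups with visibly different spreads
narrow = [10, 11, 12, 11, 11]
wide = [2, 20, 8, 15, 5]
t, df = welch_t(narrow, wide)
print(round(t, 3), round(df, 2))
```

With these values the adjusted degrees of freedom fall to just over 4, well below the 8 that the pooled test would use.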
What about paired data that is normal? (check)
- Observations not independent
- Paired equivalent: paired samples t-test (same assumptions)
- H0: No difference in the means before and after
- H1: A difference in the means before and after
Explain the paired samples t-test design
- Dependent Variable is ratio/interval and Independent Variable has two categories
- Each measurement in Cond 1 (performance with caffeine) has a match in Cond 2 (performance with water)
- One measurement is deducted from the other so that each case has a difference score
- If the null hypothesis (H0 ) is true and there is no difference in performance with e.g. caffeine and with water we would expect the group mean difference score to be 0
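A minimal sketch of the paired calculation: compute one difference score per person, then test whether the mean difference is 0. The performance scores and the function name `paired_t` are hypothetical.

```python
import math
import statistics

def paired_t(cond1, cond2):
    """Paired samples t statistic and degrees of freedom."""
    # One difference score per case (same person measured in both conditions)
    diffs = [a - b for a, b in zip(cond1, cond2)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical performance scores for the same five people
caffeine = [12, 15, 11, 14, 13]
water = [10, 14, 10, 11, 12]
t, df = paired_t(caffeine, water)
print(round(t, 3), df)
```

The difference scores are 2, 1, 1, 3, 1 (mean 1.6, standard error 0.4), so t is about 4.0 with 4 degrees of freedom. That exceeds the two-tailed 5% critical value of 2.776 for 4 degrees of freedom, so H0 would be rejected for these made-up data.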
Explain the issue of error in a paired-samples t-test
• The act of using a sample will introduce error
• What is the probability that the difference we found occurred by chance?
• If less than 5%, reject H0 and accept HA (also written H1, the alternative)
When would non-parametric equivalents to a t-test be used?
- If we have an ordinal scale Dependent Variable, or a ratio/interval Dependent Variable that does not meet parametric assumptions we use non‐parametric equivalents
- These compare medians (ranks) (not affected by extremes) rather than means
- They are usually less powerful
When should non-parametric tests be used?
- Used when the assumptions of parametric tests are not met (i.e. breached), e.g. assumptions about the level of measurement (interval or ratio data), normal distribution, and homogeneity of variances across groups
- It is not always possible to correct the distribution of a data set
- In these cases we have to use non‐parametric tests
- They make fewer assumptions about the type of data on which they can be used
- Many of these tests use “ranked” data
But what if my data are independent but nonparametric?
• Mann‐Whitney U test e.g. income ranks between teams
What is the Mann-Whitney U test?
- It is used to test the null hypothesis that two samples come from the same population (i.e. have the same median)
- or, alternatively, whether observations in one sample tend to be larger than observations in the other
What are the assumptions of the Mann-Whitney U test?
• The Wilcoxon rank-sum test (also known as the Mann-Whitney U) is similar to the two independent samples t-test
• Data must meet the requirement that the two samples are independent
• The Mann-Whitney procedure uses ranks instead of the raw data values
• Data values are assigned ranks relative to both samples combined
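The ranking step can be sketched in Python as follows. This computes only the U statistic, not a p-value; the data and the helper names `average_ranks` and `mann_whitney_u` are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    first_pos = {}
    counts = {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)        # position of the first tied value
        counts[v] = counts.get(v, 0) + 1    # size of the tied block
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

def mann_whitney_u(sample1, sample2):
    """Mann-Whitney U statistic (the smaller of U1 and U2)."""
    n1, n2 = len(sample1), len(sample2)
    # Ranks are assigned relative to both samples combined
    ranks = average_ranks(list(sample1) + list(sample2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical incomes (in thousands) for two teams
team_a = [12, 15, 11, 18]
team_b = [20, 22, 15, 25]
print(mann_whitney_u(team_a, team_b))
```

The two values of 15 share ranks 3 and 4, so each receives 3.5. A small U means the ranks of the two samples barely overlap.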
When should a Mann-Whitney U test be used?
- The sample sizes are small and normality is questionable.
- The data contain outliers or extreme values that, because of their magnitude, distort the mean values and affect the outcome of the comparison.
- The data are ordinal
- Assumes distributions of two groups being compared are the same shape
- Assumes not too many ties in ranks of data
What is used for paired non-parametric?
• The Wilcoxon Signed‐Rank test or sign test
• Can use interval, ratio or ordinal data
• Null hypothesis the same as for Mann-Whitney U test but for paired data
Explain the Wilcoxon signed-rank test
- The sign test can be applied to the differences between paired measurements, as a nonparametric alternative to the one-sample t-test
- The Wilcoxon signed-rank test can be used to compare paired data, as a nonparametric alternative to the paired t-test
- These tests are used when you cannot justify a normality assumption for the differences
- The sign test is very simple in that it counts the number of differences that are positive (+) and those that are negative (‐) and makes a decision based on these counts
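The counting logic of the sign test can be sketched with an exact two-sided binomial p-value. The before/after scores and the function name `sign_test` are hypothetical, and dropping zero differences is a common convention assumed here.

```python
from math import comb

def sign_test(before, after):
    """Counts of positive and negative differences, and a two-sided p-value."""
    # Drop pairs with no change (zero differences carry no sign)
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    negatives = n - positives
    k = min(positives, negatives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, negatives, min(1.0, 2 * tail)

# Hypothetical before/after scores for eight people
before = [10, 12, 9, 14, 11, 13, 10, 12]
after = [12, 15, 10, 13, 14, 15, 13, 12]
print(sign_test(before, after))
```

Six differences are positive, one is negative and one is zero, giving a two-sided p-value of 0.125, so H0 would not be rejected at the 5% level for these made-up data.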
Give examples of paired and independent data?
Paired: Cycling rate difference between people within a company
Independent: comparing cycling rates across companies
Explain the process of hypothesis testing
All statistical tests share a common structure:
• Set up a null hypothesis
• Establish a level of statistical significance (also known as alpha (α), usually set at 5% or 1%)
• Determine statistical significance of the findings
• Reject or fail to reject the null hypothesis
• SPSS output provides a p-value (probability value)
• If the p-value is greater than the alpha you cannot reject the null hypothesis
Explain the p-values
• The p value quantifies the chance of observing such a value of the test statistic (or one more extreme) if the null hypothesis was actually true
• If you set an alpha level of .05 (5%) then you decide to reject H0 and accept HA when p is no more than .05
• This leaves up to a 5% chance that you are wrong in concluding that there is a difference when H0 is actually true (making a Type 1 error)
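One way to see what a p-value means is an exact permutation test, sketched below. This is a different technique from the t-test (it is not what SPSS reports for a t-test), but it applies the definition directly: if H0 is true the group labels are arbitrary, so we count how many relabellings give a difference at least as extreme as the observed one. The data and function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group1, group2):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group1) + list(group2)
    n1, n2 = len(group1), len(group2)
    observed = abs(mean(group1) - mean(group2))
    total = sum(pooled)
    extreme = 0
    count = 0
    # Every way of relabelling n1 of the pooled values as "group 1"
    for idx in combinations(range(len(pooled)), n1):
        sum1 = sum(pooled[i] for i in idx)
        diff = abs(sum1 / n1 - (total - sum1) / n2)
        count += 1
        if diff >= observed - 1e-12:  # as extreme as observed, or more
            extreme += 1
    return extreme / count

# Hypothetical data: the most extreme split possible for these six values
print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```

Only 2 of the 20 possible relabellings are as extreme as the observed split, so p = 0.1.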
What are type 1 and 2 errors?
- Type I Error is the rejection of a null hypothesis when it is true (false positive), e.g. telling a man he is pregnant. Its probability is the significance level of the test, usually 5% or 1%, decided before the test is conducted.
- Type II Error is the failure to reject a null hypothesis that is false (false negative), e.g. telling a pregnant woman she is not pregnant.
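The four possible outcomes can be laid out as a small Python function. The function name is illustrative; in practice the truth of H0 is unknown, which is why these are error risks rather than things we can observe directly.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label the outcome of a test given the (normally unknown) truth of H0."""
    if h0_is_true and rejected_h0:
        return "Type I error (false positive)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (false negative)"
    return "correct decision"

print(classify_outcome(h0_is_true=True, rejected_h0=True))
print(classify_outcome(h0_is_true=False, rejected_h0=False))
print(classify_outcome(h0_is_true=False, rejected_h0=True))
```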
What are some limitations of independent t-tests?
• Doesn’t take into account the impact of population size on the chances of a Type 1 error
What does Levene’s test check?
Homogeneity of variances
What does a p-value of 0.000 tend to mean in SPSS?
p < 0.001 (the value is rounded to three decimal places, not exactly zero)
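When reporting results, a small helper like the one below avoids writing p = 0.000. The function name and the three-decimal format are illustrative assumptions.

```python
def format_p(p: float) -> str:
    """Format a p-value for reporting, never showing it as exactly zero."""
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"

print(format_p(0.0004))  # p < 0.001
print(format_p(0.0312))  # p = 0.031
```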