Lecture 4: Analysis of Continuous and Categorical Variables Flashcards
Descriptive vs Inferential Statistics
descriptive: describing the central tendency and dispersion of data
inferential: use sample data to draw conclusions about the population that the sample is meant to represent (sampling will naturally involve error)
o Estimate parameters and test hypotheses to make inferences about the population
o Compare means and evaluate relationships
o Test statistics, p-values, confidence intervals
Which test do I apply if I have 2 related samples and parametric data?
paired t-test
Which test do I apply if I have 2 related samples and non-parametric data?
Wilcoxon test
Which test do I apply if I have 2 independent samples and parametric data?
Independent t-test
Which test do I apply if I have 2 independent samples and non-parametric data?
Mann-Whitney U test
Which test do I apply if I have 3 or more groups and parametric data?
ANOVA
Which test do I apply if I have 3 or more groups and non-parametric data?
Kruskal-Wallis test
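The test-selection table above can be sketched in code. This is an illustrative mapping (not from the lecture) onto SciPy's standard test functions, using made-up sample data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre, post = rng.normal(10, 2, 30), rng.normal(11, 2, 30)  # related samples
a = rng.normal(10, 2, 30)                                  # independent samples
b = rng.normal(11, 2, 30)
c = rng.normal(12, 2, 30)

# 2 related samples
t_paired = stats.ttest_rel(pre, post)   # parametric: paired t-test
w = stats.wilcoxon(pre, post)           # non-parametric: Wilcoxon signed-rank

# 2 independent samples
t_ind = stats.ttest_ind(a, b)           # parametric: independent t-test
u = stats.mannwhitneyu(a, b)            # non-parametric: Mann-Whitney U

# 3 or more groups
f = stats.f_oneway(a, b, c)             # parametric: one-way ANOVA
h = stats.kruskal(a, b, c)              # non-parametric: Kruskal-Wallis

print(t_paired.pvalue, t_ind.pvalue, f.pvalue)
```

Each result object carries a `statistic` and a `pvalue` attribute.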
What is the Student's t-test?
Used to compare means between two groups
o Related groups: paired t-test (e.g. pre- and post- study measures on the same participants)
o Independent groups: unpaired/independent t-test
What is the null hypothesis in the Student's t-test?
the means of the groups are not statistically different
What are the degrees of freedom in the Student's t-test?
The amount of information provided by the data that can be used to estimate population parameters and the variability of those estimates
o df = n - # of estimated parameters
o As df increase, t-distribution more closely resembles a normal distribution
• E.g. One-sample t-test to estimate the population mean
o Estimates the standard deviation about the mean
o Uses a t-distribution with df = n – 1
o Df = n – 1 for the paired t-test as well
• E.g. Two sample independent t-test to compare two means
o Uses a t-distribution with df = n1 + n2 – 2
What are the assumptions of the t-test?
• Samples are independent
• Variable is normally distributed
• Variance homogeneity: variance within each group is equal
o Levene’s test for equality of variances in SPSS (automatically conducted)
o Informs you whether to use results for pooled or unpooled variance
• T-tests fairly robust even if assumptions are not perfectly met
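The Levene-then-t-test workflow described above can be sketched with SciPy, mirroring what SPSS does automatically. The group names and numbers here are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
males = rng.normal(24.0, 4.1, 60)     # illustrative data
females = rng.normal(22.9, 4.0, 58)

# Levene's test for equality of variances
lev_stat, lev_p = stats.levene(males, females)

# if variances look equal, use the pooled (equal_var=True) t-test;
# otherwise use the unpooled (Welch) version
equal_var = bool(lev_p > 0.05)
t_stat, p_val = stats.ttest_ind(males, females, equal_var=equal_var)

print(f"Levene p = {lev_p:.3f}; t = {t_stat:.2f}, p = {p_val:.3f}")
```

The `equal_var` flag is how SciPy switches between the pooled and unpooled (Welch) results that SPSS prints side by side.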
What is the t-statistic?
The difference between the means divided by the pooled or unpooled standard error of the difference in means
What is a confidence interval?
A measure of the degree of uncertainty: the region around the sample statistic where the corresponding population parameter is likely to lie
True/False
The larger the sample, the smaller the CI
True
Greater likelihood that the sample statistic approximates the population parameter
True/False
If CI contains 0 (null value) then the means are not statistically different (non-significant finding)
True
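The two facts above (larger samples give narrower CIs; a CI containing 0 means a non-significant difference) can be checked with a small sketch. This computes a 95% CI for a difference in means using the pooled standard error and df = n1 + n2 – 2; the data are made up:

```python
import numpy as np
from scipy import stats

def mean_diff_ci(x, y, conf=0.95):
    """95% CI for mean(x) - mean(y) using the pooled standard error."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # pooled variance, then the standard error of the difference
    sp2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / df
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_crit = stats.t.ppf((1 + conf) / 2, df)
    diff = np.mean(x) - np.mean(y)
    return diff - t_crit * se, diff + t_crit * se

rng = np.random.default_rng(2)
x, y = rng.normal(24, 4, 60), rng.normal(23, 4, 58)
lo, hi = mean_diff_ci(x, y)
# if the interval (lo, hi) contains 0, the difference is not
# statistically significant at the 5% level
```

Rerunning with larger `n` shrinks the interval, which is the first True/False statement above.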
How to report independent t-test results?
Report the means and standard deviations for both groups, t-value, degrees of freedom, and p-value.
o E.g. Males (mean ± StD): 24.0 ± 4.1, Females (mean ± StD): 22.9 ± 4.0; t = 1.3, df = 116, p = 0.20
P-values from t-tests are often reported in the subject characteristics table when data are compared between two groups (e.g. males vs. females, intervention vs. control group)
Analysis of Variance ANOVA
Test to determine if means differ between 3 or more groups
o Unlike t-tests, uses variance to assess differences
• Tests the null hypothesis that the means of the groups are equal
o F-test will result in rejection of null hypothesis when variability between group means is sufficiently larger than variability
within the groups.
F-statistic
Ratio of between-group variability to within-group variability; if the ratio is large, it indicates that not all means are equal –> a significant p-value will result
In an F-test what does it mean to reject the null hypothesis?
variability between group means is sufficiently larger than variability within the groups
Degrees of freedom in ANOVA
• Df1: df associated with the numerator of the F-statistic
o Df1 = k – 1
o k: # of group means
• Df2: df associated with the denominator of the F-statistic
o Df2 = n – k
The numerator df represents the between-group variability and is calculated as k – 1 (k is the number of groups). The denominator df represents the within-group variability and is calculated as n – k (sample size minus the number of groups).
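The two degrees of freedom can be checked against SciPy's one-way ANOVA: the p-value it reports is the upper tail of an F(df1, df2) distribution with df1 = k – 1 and df2 = n – k. Group means and sizes here are made up:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# k = 3 groups of 20 observations each, so n = 60
groups = [rng.normal(m, 5, 20) for m in (50, 52, 55)]
k = len(groups)
n = sum(len(g) for g in groups)
df1, df2 = k - 1, n - k        # numerator df = 2, denominator df = 57

f_stat, p_val = stats.f_oneway(*groups)

# the ANOVA p-value is the survival function of F(df1, df2) at f_stat
p_manual = stats.f.sf(f_stat, df1, df2)
```

If the two p-values did not match, the degrees of freedom would have been computed wrongly.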
What does variability look like in SPSS?
If there is low variability within a group, its dots are close together; if there is high variability between the groups, the group means are more dispersed. When the differences between the groups are larger than the differences within the groups, you are more likely to find a significant difference.
True/False
A low F value (little variability between your groups relative to within them) will lead to retaining the null hypothesis.
True
What are the assumptions of ANOVA?
- Variable is normally distributed
- The errors are normally distributed
- The cases are independent from each other
- Variance homogeneity
How to report ANOVA results?
- Report means and standard deviations for all groups, as well as the F value, degrees of freedom, and p-value
- Test for this example:
o Analysis of variance indicated that the different physical activity groups report different levels of caloric intake:
§ Sedentary: 1640.1 ± 516.7
§ Light: 2030.8 ± 570.9
§ Moderate: 1999.4 ± 670.7
§ High: 2005.1 ± 633.5
§ F(3,144) = 3.3, p = 0.02
Post hoc test ANOVA
• ANOVA tells you if there is a difference between means, but not specifically which means differ
o Conduct multiple comparisons post-hoc test to determine which means differ
What are the post hoc tests that ANOVA offers?
o Tukey’s Test
o Least Significant Difference (LSD)
o Dunnett’s test for between group comparisons
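As a sketch of a post-hoc comparison, recent SciPy versions provide `scipy.stats.tukey_hsd` for Tukey's test (and, in newer releases, `scipy.stats.dunnett`). The group names and values below are made up, loosely echoing the caloric intake example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sedentary = rng.normal(1640, 517, 30)   # illustrative data
light = rng.normal(2031, 571, 30)
moderate = rng.normal(1999, 671, 30)

# Tukey's HSD: all pairwise comparisons of group means
res = stats.tukey_hsd(sedentary, light, moderate)

# res.pvalue is a k x k matrix; entry [i, j] is the p-value for
# the comparison of group i vs group j
print(res.pvalue)
```

This is run only after a significant ANOVA, to see which specific pairs of means differ.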
One-way ANOVA
considers only one independent variable (factor) for independent groups
Repeated measures one-way ANOVA
one-way ANOVA for related groups
Multivariate ANOVA (MANOVA)
ANOVA with several dependent variables
Factorial ANOVA
Compares means across two or more independent variables (factors)
What are the tests that can be done for Analysis of Categorical Variables
• Chi-square test of independence
o Tests whether there is a significant association between two categorical variables
• Fisher’s Exact Test
o Use when 20% or more of the cells have <5 counts of data
• Test for Trend
o More powerful for ordinal data
Chi-square test of independence
Tests whether there is a significant association between two categorical variables
• Utilizes contingency tables (also referred to as cross-tabulation, crosstab, or two-way table)
o 2 x 2 table when each variable has 2 groups
o 2 x k table when a variable has k groups
• Assesses goodness of fit between observed values and theoretically expected values
• Df = k – 1 for a 2 x k table (k = number of groups/columns); in general, df = (rows – 1) x (columns – 1)
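A minimal sketch of the chi-square test of independence on a 2 x 2 contingency table, with made-up counts; `scipy.stats.chi2_contingency` also returns the table's df and the theoretically expected counts:

```python
import numpy as np
from scipy import stats

# made-up counts: rows = group (e.g. males/females),
# columns = outcome (e.g. yes/no)
table = np.array([[30, 70],
                  [45, 55]])

chi2, p, dof, expected = stats.chi2_contingency(table)
# dof == (rows - 1) * (columns - 1) == 1 for a 2 x 2 table;
# `expected` holds the theoretically expected cell counts used
# in the goodness-of-fit comparison with the observed values
```

Comparing `expected` against the observed `table` is exactly the goodness-of-fit assessment described above.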
Fisher’s Exact Test
Use when 20% or more of the cells have <5 counts of data
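When cell counts are this small, the chi-square approximation breaks down and Fisher's exact test is used instead; a sketch with made-up counts:

```python
from scipy import stats

# made-up 2 x 2 table with several cells holding counts < 5,
# i.e. the situation where Fisher's exact test is preferred
table = [[2, 8],
         [7, 3]]

odds_ratio, p = stats.fisher_exact(table)
```

Unlike the chi-square test, this computes an exact p-value rather than relying on a large-sample approximation.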
Test for Trend
More powerful for ordinal data
• Referred to as Linear-by-Linear association in SPSS
o Tests for trends in contingency tables larger than 2x2
• Takes into account ordered nature of data
o Relates to odds rather than variances
o Assumes that a change in ranks makes no difference to the odds of the outcome (i.e. Odds Ratio = 1)
o Therefore, df = 1
What are the assumptions of Chi-squared test?
o Variables are ordinal or nominal
o Groups/categories are independent
§ Use McNemar's test for related groups (if the observations are from the same people)