Module 3 Flashcards
How do you summarise categorical data when using 1 variable?
- frequencies, proportions or percentages
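A quick sketch of the three summaries, using a made-up sample (the `blood_groups` data is hypothetical and not from the module):

```python
from collections import Counter

# Hypothetical sample of one categorical variable (assumed data).
blood_groups = ["A", "O", "O", "B", "A", "O", "AB", "O"]

frequencies = Counter(blood_groups)          # counts per category
n = len(blood_groups)
proportions = {g: count / n for g, count in frequencies.items()}
percentages = {g: 100 * p for g, p in proportions.items()}

for g in sorted(frequencies):
    print(f"{g}: n={frequencies[g]}, "
          f"proportion={proportions[g]:.3f}, percentage={percentages[g]:.1f}%")
```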
What test is used to compare the distribution of a categorical variable to a hypothesised distribution?
- chi-squared goodness-of-fit test (one-sample chi-squared test)
What is an example of a one-sample chi squared test hypothesis?
H0: the variable is evenly distributed across its categories
H1: the variable is not evenly distributed across its categories
Why is the one sample chi-squared test (chi squared goodness-of-fit test) used?
- to quantify the discrepancy between the expected and observed frequencies
e.g. between the sample and the hypothesised values
What is the shape of the chi-squared distribution?
- non-symmetric (right-skewed; values are never negative)
- changes with the df (degrees of freedom)
What are the degrees of freedom (df) for a one-sample chi-squared test?
- df = number of groups - 1
- indicates how many of the data points are ‘flexible’ (free to vary)
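A minimal sketch of the goodness-of-fit calculation under an "evenly distributed" H0. The observed counts are invented for illustration; the statistic is the usual sum of (observed - expected)^2/expected with df = number of groups - 1:

```python
# Hypothetical observed counts for k = 4 categories (assumed data).
observed = [18, 22, 30, 10]

# Under H0 (evenly distributed) every category expects the same count.
n = sum(observed)
k = len(observed)
expected = [n / k] * k

# Chi-squared statistic: discrepancy between observed and expected counts.
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

print(f"chi-squared = {chi_squared:.2f}, df = {df}")
```

The statistic would then be compared with the chi-squared distribution on `df` degrees of freedom to get a p-value.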
What test is used to look at association between 2 categorical variables?
- chi-squared test of independence
How is the test statistic calculated for the chi-squared test of independence?
- same formula as the one-sample test, except the expected count has to be calculated for each cell in the table
e.g. expected count = (column total x row total)/overall total
How do you calculate the df for a contingency table/cross-table?
df = (number of rows - 1) x (number of columns - 1)
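A sketch of the full calculation for a cross-table, assuming a made-up 2x3 table (the counts and the row/column meanings are hypothetical):

```python
# Hypothetical 2x3 contingency table (assumed data):
# rows = exposure (yes / no), columns = three outcome levels.
table = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
overall = sum(row_totals)

# Expected count per cell = (row total x column total) / overall total.
expected = [[r * c / overall for c in col_totals] for r in row_totals]

# Sum (observed - expected)^2 / expected over every cell.
chi_squared = sum(
    (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(table))
    for j in range(len(table[0]))
)

df = (len(table) - 1) * (len(table[0]) - 1)
print(f"chi-squared = {chi_squared:.3f}, df = {df}")
```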
What does a small chi-squared value mean?
- the observed value is approximately equal to the expected value in each cell
- they differ only because of sampling variability
What causes a large chi-squared value?
- sampling variability (the p-value says how likely this is if H0 is true)
- the null hypothesis is not true
What are the chi-squared test of independence assumptions?
- the observational units are independent
- the expected cell counts should be ≥5
What are the limitations of the chi-squared test of independence?
- not informative about how variables are related
- can only really be used for bivariate analysis
What are other options for assessing associations in categorical variables?
- relative risk
- odds ratio
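Both measures can be read off a 2x2 table. A small sketch, assuming invented counts and the common layout of exposure in rows and disease in columns:

```python
# Hypothetical 2x2 table (assumed data):
#               disease   no disease
# exposed          a=30       b=70
# unexposed        c=10       d=90
a, b = 30, 70
c, d = 10, 90

# Relative risk: risk in the exposed divided by risk in the unexposed.
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
relative_risk = risk_exposed / risk_unexposed

# Odds ratio: odds of disease in the exposed divided by odds in the unexposed.
odds_ratio = (a * d) / (b * c)

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")
```

Unlike the chi-squared statistic, both values show the direction and size of the association.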
Can the chi-squared test of independence be used for before-and-after data?
- no, because both measurements are on the same individual, so the observations are not independent
What is McNemar’s test?
- used for 2x2 tables to test repeated measurements of the same variable on the same individuals
SIMPLE CONCEPT:
- if no change, participants stay on the diagonal
- if change, participants move off the diagonal
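A sketch of the diagonal idea with made-up paired counts. It assumes the standard McNemar statistic without continuity correction, (b - c)^2/(b + c), which is not spelled out on these cards:

```python
# Hypothetical paired 2x2 table (assumed data):
#                  after: yes   after: no
# before: yes          40           5
# before: no           15          40
stayed_yes, yes_to_no = 40, 5
no_to_yes, stayed_no = 15, 40

# Participants on the diagonal did not change and do not enter the statistic.
unchanged = stayed_yes + stayed_no

# Only the off-diagonal (discordant) pairs carry information about change.
b, c = yes_to_no, no_to_yes
mcnemar_chi_squared = (b - c) ** 2 / (b + c)  # compared with chi-squared, df = 1

print(f"unchanged = {unchanged}, McNemar chi-squared = {mcnemar_chi_squared:.2f}")
```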
What test is used for continuous data?
one sample t-test
What is a t-test?
- parametric test used for testing differences in means
- tests the hypothesis that the mean of a sample is equal to a fixed value
What does the one sample t-test assume?
- data are normally distributed
What is the test statistic equation for a one-sample t-test?
t = (sample mean - expected value)/ (sample sd/ square root of the sample size)
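The equation above, worked through on a made-up sample (the `sample` values and `expected_value` are hypothetical):

```python
import math
import statistics

# Hypothetical continuous measurements and a fixed value to test against.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
expected_value = 5.0

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample sd (n - 1 in the denominator)

# t = (sample mean - expected value) / (sample sd / sqrt(n))
t = (sample_mean - expected_value) / (sample_sd / math.sqrt(n))
df = n - 1

print(f"t = {t:.3f}, df = {df}")
```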
What influences the fatness of the t-distribution's tails?
- degrees of freedom
What makes a t-distribution closer to a normal distribution?
- bigger sample size/more df
What is the distribution for a t-test if n>30?
- the sampling distribution of means is approximately normally distributed
When is a two sample/independent sample t-test used?
- to compare the means of two groups
- dependent variable is continuous
- independent variable is categorical (two groups)
What are the assumptions for a two-sample t-test?
- data in each group are normally distributed, or n>30
- results come from two independent samples
- variances in the two groups are the same
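When the equal-variance assumption holds, the two variances are pooled. A sketch with invented data; the pooled-variance formula is the standard one and is an addition to what these cards state:

```python
import math
import statistics

# Hypothetical measurements from two independent groups (assumed data).
group_a = [12.0, 14.0, 11.0, 15.0, 13.0]
group_b = [10.0, 9.0, 12.0, 11.0, 8.0]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance: df-weighted average of the two sample variances.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t = (mean_a - mean_b) / standard_error
df = n_a + n_b - 2

print(f"t = {t:.3f}, df = {df}")
```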
What test is used if it is unknown whether the sample is normally distributed?
- non-parametric test (Mann-Whitney test)
What does a Mann-Whitney test/Wilcoxon rank-sum test compare?
- medians of two samples
When is a paired t-test used?
- before and after
- left and right arm
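A paired t-test is a one-sample t-test on the within-person differences, tested against 0. A sketch with made-up before/after values:

```python
import math
import statistics

# Hypothetical before/after measurements on the same five people.
before = [150, 160, 145, 170, 155]
after = [140, 158, 140, 160, 152]

# Work on the differences, so between-person variation drops out.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)

t = mean_diff / (sd_diff / math.sqrt(n))  # H0: mean difference = 0
df = n - 1

print(f"mean difference = {mean_diff}, t = {t:.3f}, df = {df}")
```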
What is an ANOVA?
- one way analysis of variance
- used when >2 groups
What are the ANOVA hypotheses?
H0: the group means are all the same
H1: at least one mean differs
What are the two types of variation within data for ANOVA?
- between groups
- within group
How can you tell if the variation is between groups?
- distributions are at different levels of the x-axis
How do you calculate total variation?
SStotal (total sum of squares) = sum of (each observation - overall mean)^2
What is the conversion of SS to MS (mean square) for?
- to account for the different df in each calculation
What is the total variance equation?
MStotal = SStotal/(N-1)
What is the between group variance equation?
MSgroups = SSgroups/(k-1)
What is the within group variance equation?
MSerror = SSerror/(N-k)
What does the ratio of MSgroups/MSerror show?
- how much bigger the group effect is compared to random noise
What is the variance ratio?
- ratio of two variances
- denoted F (F is the test statistic for ANOVA)
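The SS, MS and F cards can be tied together in one small sketch. The three groups are invented; N is the total number of observations and k the number of groups, and SStotal is taken as each observation's squared deviation from the overall mean:

```python
# Hypothetical data for k = 3 groups (assumed values).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [10.0, 11.0, 12.0],
}

all_values = [x for values in groups.values() for x in values]
N, k = len(all_values), len(groups)
overall_mean = sum(all_values) / N
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Total, between-group and within-group sums of squares.
ss_total = sum((x - overall_mean) ** 2 for x in all_values)
ss_groups = sum(len(v) * (group_means[g] - overall_mean) ** 2
                for g, v in groups.items())
ss_error = sum((x - group_means[g]) ** 2
               for g, v in groups.items() for x in v)

# Mean squares adjust each SS for its degrees of freedom.
ms_groups = ss_groups / (k - 1)
ms_error = ss_error / (N - k)
f_statistic = ms_groups / ms_error

print(f"SStotal={ss_total}, SSgroups={ss_groups}, SSerror={ss_error}, "
      f"F={f_statistic}")
```

In this example SSgroups + SSerror adds up to SStotal, which is a useful check on hand calculations.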
When is a post-hoc test used?
- if the H0 is rejected in an ANOVA to determine which means are different
What is the most common post hoc test?
- Tukey
What are the steps for interpreting ANOVA results?
- check ANOVA assumptions
- conduct ANOVA
- if p-value>0.05 then do not reject H0
- if p-value<0.05 then reject H0 and do post-hoc testing
What test is used if the normality assumption (ANOVA) is violated?
- a non-parametric test such as the Kruskal-Wallis test
What test is used to check the equal variances assumption?
- Levene’s test, which tests the H0 that the variances of the groups are the same
- if the test is significant, the variances are not equal
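What test statistic equation is used for the one-sample chi-squared test?
- general form of a test statistic: (observed - expected)/precision
- chi-squared = sum over groups of (observed - expected)^2/expected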
What type of test is a t-test?
- parametric test
What is an assumption of a parametric test?
- assumes the data follows a known distribution
What does a one-sample t-test test?
- the hypothesis that the mean of a sample is equal to a fixed value
What test is used to decide if variances are equal for a t-test?
Levene’s test
What is an advantage of a paired test?
- removes the variation between patients, leaving only the effect of the drug
What are the assumptions for ANOVA?
- data are normally distributed or n>30
- equal variances
- independence among observations
What overall type I error rate is maintained across all post-hoc tests?
- 0.05
Which of the following are assumptions of the Chi-square test of independence?
- There are no assumptions for this test
- Expected cell counts ≥5
- Data are normally distributed
- Observations are independent
- Expected cell counts ≥5
- Observations are independent
If I conduct a Chi-square Test of Independence on a 3x4 contingency table and discover I have several cells with expected counts < 5 what should I do?
- Remove the troublesome categories and repeat the analysis.
- Continue with the analysis and report the results.
- Try using another statistical test.
- Review the expected counts for each cell to identify the problem categories and then try to combine them with another category if it is sensible to do so.
Review the expected counts for each cell to identify the problem categories and then try to combine them with another category if it is sensible to do so.
True or false? I would use a Chi-square Test of Independence if I want to look for an association between two ordinal variables in my data set.
true
True or false? I would use a Chi-square Test of Independence if I want to look for an association between two continuous variables in my data set.
false
How many degrees of freedom are there in a Chi-square analysis if I am comparing two variables, one with three categories and the other with four categories?
6
The degrees of freedom (df) are calculated using the formula (number of categories in variable 1 - 1) x (number of categories in variable 2 - 1). When expressed in a contingency table we can simplify this to the (number of rows - 1) x (number of columns - 1). In this question there were 3 categories in one variable and 4 in the other, so df = (3-1) x (4-1) = 6.
Which is the most appropriate summary of results for the following R Commander output?
Paired t-test
data: Baseline Triglyceride and Final Triglyceride levels
t = 1.200, df = 15, p-value = 0.249
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.915 39.040
sample estimates:
mean of the differences(Baseline - Final) 14.062
- There was a significant difference between the mean baseline triglyceride level and the mean final triglyceride level (p>0.05).
- The mean baseline triglyceride values were significantly higher than the mean final triglyceride levels (mean difference = 14.1 mmol/L, p > 0.05).
- The final triglyceride values were on average 14.1 mmol/L higher than the baseline triglyceride levels (95% CI: -10.9 to 39.0 mmol/L, t15=1.20, p=0.249). This difference was not considered statistically significant.
- The baseline triglyceride values were on average 14.1 mmol/L higher than the final triglyceride levels (95% CI: -10.9 to 39.0 mmol/L, t15=1.20, p=0.249). This difference was not considered statistically significant.
- The baseline triglyceride values were on average 14.1 mmol/L higher than the final triglyceride levels (95% CI: -10.9 to 39.0 mmol/L, t15=1.20, p=0.249). This difference was not considered statistically significant.
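A rough consistency check on the output above, using only the printed numbers. The variable names are illustrative, and 2.131 is the usual two-sided 95% critical value for t with 15 df:

```python
# Values copied from the paired t-test output above.
mean_difference = 14.062        # Baseline - Final
t_statistic = 1.200
df = 15
ci_lower, ci_upper = -10.915, 39.040

n_pairs = df + 1                                  # paired test: df = n - 1
standard_error = mean_difference / t_statistic    # since t = mean / SE
margin = (ci_upper - ci_lower) / 2                # half-width of the CI
implied_t_critical = margin / standard_error      # should be close to 2.131

# A 95% CI that includes 0 matches a p-value above 0.05.
ci_contains_zero = ci_lower <= 0 <= ci_upper

print(f"n = {n_pairs}, SE = {standard_error:.2f}, "
      f"critical t = {implied_t_critical:.3f}, CI contains 0: {ci_contains_zero}")
```

The positive mean of (Baseline - Final) is what makes baseline the higher value, and the interval spanning 0 is why the difference is not significant.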
You use an ANOVA to compare the means of three or more groups instead of doing multiple pairwise t-tests because …
- The ANOVA controls for the type I error, reducing the chances of incorrectly concluding there is a difference between some groups.
- my lecturer told me to.
- the ANOVA is testing something different to the t-test.
- it is easier to do one ANOVA in SPSS compared to running multiple t-tests.
- The ANOVA controls for the type I error, reducing the chances of incorrectly concluding there is a difference between some groups.
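A sketch of why multiple t-tests inflate the type I error. It assumes 4 groups and treats the pairwise tests as independent, which is a simplification:

```python
from math import comb

alpha = 0.05      # type I error rate for a single test
k_groups = 4      # hypothetical number of groups

# Number of pairwise t-tests needed to compare every pair of groups.
n_comparisons = comb(k_groups, 2)

# Chance of at least one false positive across all tests,
# under the simplifying assumption that the tests are independent.
familywise_error = 1 - (1 - alpha) ** n_comparisons

print(f"{n_comparisons} comparisons, "
      f"family-wise error = {familywise_error:.3f}")
```

With six comparisons the chance of at least one false positive is roughly 26%, compared with 5% for a single ANOVA.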
If the assumption for equal variances is violated in an ANOVA what should you do?
- Report the usual ANOVA F statistic
- Don’t worry about it because the ANOVA is robust to violations in the assumptions
- Use the Kruskal-Wallis test instead of the ANOVA
- Use the Welch F or Brown-Forsythe F test statistics
Use the Welch F or Brown-Forsythe F test statistics
If you obtained the following ANOVA output, would you conduct post-hoc tests?
                  Df Sum Sq Mean Sq F value Pr(>F)
Factor(Treatment)  3     90   30.00   0.364   0.55
Residuals         68   5600   82.35
Yes
No
No
The ANOVA output indicates there is no significant difference between the means of the groups being compared. Therefore we don’t need to do post-hoc tests.
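The Mean Sq and F value columns in the output above can be rebuilt from Df and Sum Sq. A sketch using the printed values (the p-value is taken from the table rather than recomputed):

```python
# Values copied from the ANOVA output above.
df_groups, ss_groups = 3, 90
df_error, ss_error = 68, 5600
p_value = 0.55

# Mean square = sum of squares / degrees of freedom.
ms_groups = ss_groups / df_groups
ms_error = ss_error / df_error
f_value = ms_groups / ms_error

# df_groups = k - 1 and df_error = N - k, so k and N can be recovered.
k = df_groups + 1
N = df_error + k

# Post-hoc tests are only needed when H0 is rejected.
run_post_hoc = p_value < 0.05

print(f"MSgroups={ms_groups:.2f}, MSerror={ms_error:.2f}, F={f_value:.3f}, "
      f"k={k}, N={N}, post-hoc: {run_post_hoc}")
```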
True or false? I can use the results of the normality check done as part of my univariate descriptive statistics to check the normality assumption for a t-test.
False
How do you summarise categorical data when using 2 variables?
- cross tabulation (contingency table)
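A minimal sketch of building a cross tabulation by counting pairs of categories. The records and category names are hypothetical:

```python
from collections import Counter

# Hypothetical records: (smoking status, disease status) per person.
records = [
    ("smoker", "disease"), ("smoker", "healthy"), ("smoker", "disease"),
    ("non-smoker", "healthy"), ("non-smoker", "healthy"),
    ("non-smoker", "disease"),
]

cell_counts = Counter(records)
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Contingency table as a nested dict: table[row][column] = count.
table = {r: {c: cell_counts[(r, c)] for c in cols} for r in rows}

print("columns:", cols)
for r in rows:
    print(r, [table[r][c] for c in cols])
```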
How do you calculate the expected frequencies?
- (column total x row total) / overall total