Introduction to biostatistics Flashcards
When is Spearman’s rank correlation suitable?
It’s suitable if
- there is a non-linear but monotonic relationship (the variables consistently increase or decrease together)
- there are outliers
- variables are on an ordinal scale (e.g. economic status: low, medium, and high)
- the sample size is small
- x and/or y are not normally distributed
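Spearman's rho is Pearson's correlation computed on the ranks of the values rather than on the raw values. A minimal sketch in plain Python (standard library only) follows. The function names `ranks`, `pearson` and `spearman` are hypothetical and chosen for illustration. In practice a statistics package would be used, e.g. SciPy's `scipy.stats.spearmanr`.

```python
def ranks(values):
    """Return 1-based ranks; tied values get the mean of the positions they share."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks instead of the raw values."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but non-linear
print(spearman(x, y))  # 1.0
print(pearson(x, y) < 1)  # True: the relationship is not a straight line
```

Ranking is also why Spearman tolerates outliers. An extreme value can only become the highest or lowest rank, however far away it is.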
Assumptions for a chi-squared test
“Chi-squared test is a way to compare whether the variation in the data is due to chance or whether it is due to one of the variables we are actually testing”
- two variables are ordinal or nominal (i.e. categorical data)
- two or more independent groups
- for a 2x2 table, all expected cell counts must be at least 5
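The sketch below shows how the expected counts and the chi-squared statistic are computed for a 2x2 table. It is written in plain Python (standard library only). The function name `chi_squared_2x2` and the example counts are hypothetical. No Yates continuity correction is applied. SciPy's `scipy.stats.chi2_contingency` applies that correction to 2x2 tables by default, so its statistic would differ.

```python
import math


def chi_squared_2x2(table):
    """Pearson's chi-squared test for a 2x2 table of observed counts
    (no Yates correction). Returns (statistic, p_value, expected, ok)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count per cell under H0 (no association): row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Assumption check: every expected count should be at least 5
    ok = all(e >= 5 for row in expected for e in row)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # A 2x2 table has 1 degree of freedom; for df = 1, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected, ok


stat, p, expected, ok = chi_squared_2x2([[20, 30], [30, 20]])
print(stat, ok)  # 4.0 True
print(round(p, 3))
```

For this example every expected count is 25, and the p-value is about 0.046. This is below 0.05, so H0 (no association) would be rejected.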
Parameters defining normal distribution
mean (µ) and standard deviation (σ)
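For reference, here is the density of the normal distribution, written in terms of these two parameters. µ sets the centre of the bell curve and σ sets its width.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```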
Assumptions when using a one-way ANOVA
- normally distributed data in each group
- independence of observations
- variances (SD) are equal in all groups
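A one-way ANOVA compares the variation between the group means with the variation within the groups. A minimal sketch of the F statistic in plain Python follows. The function name `one_way_anova_f` and the example groups are hypothetical. The sketch stops at F, because a p-value would need the F distribution. SciPy's `scipy.stats.f_oneway` returns both F and the p-value.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA. Returns (F, (df_between, df_within))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


f, df = one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f, df)  # 27.0 (2, 6)
```

The within-group term pools all groups together. This is one reason the equal-variance assumption matters.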
What can one use for the analysis of categorical data?
The chi-squared test (Pearson’s chi-squared test), which tests whether there is an association between two categorical variables
Some parametric methods
- Pearson correlation
- t-test
- ANOVA
- linear regression
What is numerical data?
Numerical data is quantitative data (actual measured or counted values) and consists of discrete and continuous data
You have numerical data of more than two groups. Which parametric and non-parametric test do you use?
parametric: One-way ANOVA
non-parametric: Kruskal-Wallis
Type 2 error
H0 is not rejected when it is false
“false non-significant results”
→ happens often when the sample size is too small (low statistical power)
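The link between sample size and type 2 errors can be illustrated by computing β (the type 2 error rate) for a simplified case: a two-sided one-sample z-test with a known σ. This setting is an assumption chosen for simplicity, and the t-test behaves similarly. The function name `type2_error_rate` and the effect size of 0.3 are illustrative only.

```python
from statistics import NormalDist


def type2_error_rate(effect_size, n, alpha=0.05):
    """Beta for a two-sided one-sample z-test.
    effect_size = (true mean - mean under H0) / sigma, sigma assumed known."""
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect_size * n ** 0.5
    # Power = P(reject H0 | H1 is true), summed over both rejection tails
    power = z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)
    return 1 - power


for n in (10, 30, 100):
    print(n, round(type2_error_rate(0.3, n), 2))
```

The same true effect is missed most of the time at n = 10 and far less often at n = 100. β shrinks as the sample grows.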
What is continuous data?
Continuous data can take any value within a range (including decimals). Examples: BMI, height, weight
You have numerical data of two unrelated groups. Which parametric and non-parametric test do you choose?
parametric: unpaired t-test (compares the means)
non-parametric: Mann-Whitney U test
What is categorical data?
Categorical data is qualitative data and consists of nominal and ordinal data
Assumptions when using a Kruskal-Wallis test
- independence of observations
- can be used when the data in your groups are not normally distributed
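The Kruskal-Wallis test pools all observations, ranks them, and checks whether the rank sums differ between groups more than chance would allow. A minimal sketch of the H statistic in plain Python follows. The function names and example groups are hypothetical. The correction for ties is omitted for brevity. SciPy's `scipy.stats.kruskal` applies that correction and also returns a p-value.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without the correction for ties)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        # Sum of the ranks belonging to this group
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 2))  # 7.2
```

Only the ranks enter the formula, which is why normality of the data is not required.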
Different types of correlation analysis
- Pearson correlation
- Spearman’s rank correlation (non-parametric)
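For reference, here are the two coefficients. Pearson's r is computed from the raw values. Spearman's ρ is the same formula applied to the ranks. When there are no ties, ρ reduces to the short form on the right, where d_i is the difference between the two ranks of observation i.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
\qquad
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}
```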
SPSS
look at the handout if you think that’s important. I don’t ;)
What is nominal data?
Nominal data has no inherent order, so no comparisons about better/worse can be made. Examples: marital status, gender, blood group, name, …
How do you check normality?
1) Graphically and descriptively: histogram, Q-Q plot, and mean and median close to each other (suggests a normal distribution)
2) Kolmogorov-Smirnov test and Shapiro-Wilk test (caution!! p has to be higher than 0.05 in order to treat the data as normally distributed; these tests can only fail to reject normality, not prove it)
Assumptions when using an unpaired t-test
- independence of observations
- normally distributed data in each group
- variances (SD) are equal in both groups
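The unpaired t-test divides the difference between the two means by its standard error. With equal variances assumed, that standard error uses a pooled variance. A minimal sketch in plain Python follows. The function name `unpaired_t` and the example data are hypothetical. SciPy's `scipy.stats.ttest_ind` computes the same pooled-variance statistic by default and also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's t for two independent groups with pooled variance
    (assumes equal variances). Returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (n - 1 denominators)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


t, df = unpaired_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # -3.674 4
```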
Normal distribution
also called Gaussian distribution or bell curve
Some non-parametric methods
- Spearman’s Rank Correlation
- Mann-Whitney test
- Kruskal-Wallis test
- Wilcoxon test
Alternative hypothesis
assumes an effect (“there is difference/association”)
Type 1 error
H0 is rejected when it is true
“false significant results”
What is discrete data?
Discrete data counts events and can only take whole-number values. Example: number of pregnancies
Assumptions when using a Mann-Whitney U test
- independence of observations
- can be used when the data in your two groups are not normally distributed
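The U statistic counts, over all pairs (one value from each group), how often a value from group A is larger than a value from group B, with ties counting as one half. A minimal sketch in plain Python follows. The function name `mann_whitney_u` and the example data are hypothetical. SciPy's `scipy.stats.mannwhitneyu` also returns a p-value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): U_a counts pairs with x > y as 1 and ties as 0.5;
    U_b is the complement, so U_a + U_b = len(a) * len(b)."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


u_a, u_b = mann_whitney_u([3, 5, 8], [1, 4, 6])
print(u_a, u_b, min(u_a, u_b))  # 6.0 3.0 3.0
```

The pair count is equivalent to the rank-sum form U_a = R_a - n_a(n_a + 1)/2. Here, group A's ranks in the pooled data are 2, 4 and 6, so 12 - 6 = 6. The smaller of the two U values is the one usually reported.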
Linear regression analysis
to predict the value of a variable based on the value of another variable
(a lot of “mathematical” expressions; look at the handout)
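The simplest case, a straight line fitted by ordinary least squares, can be sketched as follows. The slope is the covariation of x and y divided by the variation of x, and the intercept makes the line pass through the point (mean of x, mean of y). The function name `fit_line` and the data are hypothetical. Python 3.10+ also provides `statistics.linear_regression`, and the handout may use a different notation.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
print(intercept + slope * 5)  # predicted y for x = 5: 11.0
```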
Null hypothesis
assumes no effect (“There is no difference/association”)
What is ordinal data?
Ordinal data indicates ordered levels and can be coded 0 to n to rank the data. Examples: disease stage, education level
What are the null- and the alternative hypotheses for the chi-squared test?
- H0: there is no difference between the observed and expected frequencies
- H1: there is a difference between the observed and the expected frequencies