Lecture 7: Correlation and Validation of Dietary Intake Assessment Flashcards
Which parametric test do I run if I have two continuous variables?
Pearson correlation
Which nonparametric test do I run if I have two continuous variables?
Spearman correlation
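As a rough sketch of how the two tests differ, the Python below computes Pearson r directly from the data and Spearman's rho as Pearson r applied to the ranks. The function names, the example data, and the average-rank handling of ties are assumptions for illustration, not from the lecture. In practice a statistics package (e.g. SciPy's `scipy.stats.pearsonr` and `scipy.stats.spearmanr`) would also give the p values.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # hypothetical monotonic but curved data
print(round(pearson_r(x, y), 3))  # 0.981
print(spearman_rho(x, y))         # 1.0
```

Spearman only looks at the order of the values, so a steadily rising curve scores 1.0, while Pearson, which measures fit to a straight line, comes out slightly below 1.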
Which parametric test do I run if I have more than two continuous variables?
Multiple linear regression
Which nonparametric test do I run if I have more than two continuous variables?
Non-parametric regression
What does correlation measure?
Correlation measures the strength and the direction of the relationship between two continuous variables
- assesses how closely the data of the two variables follow a straight line of best fit
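For reference, the standard Pearson formula makes this concrete: the numerator measures how x and y vary together around their means, and the denominator rescales that amount so r always falls between -1 and +1. Here n is the number of paired observations and x̄, ȳ are the sample means.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```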
What does a correlation r value indicate?
The r value indicates how far the data points are from the line of best fit (the closer r is to -1 or +1, the closer the points lie to the line)
What is the difference between r value and r squared value?
The r value indicates how far the data points are from the line of best fit
The r squared value is the coefficient of determination
- proportion of the variance in the dependent variable that can be predicted from the independent variable
- i.e., how much of the variability in your dependent variable (outcome) is explained by your independent variable (predictor)
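To connect r and r squared, here is a minimal sketch with hypothetical data and helper names (`fit_line`, `r_squared`, `pearson_r` are illustrative, not from the lecture). For a simple linear regression with an intercept, computing r squared as 1 - (sum of squared residuals / total sum of squares) gives the same number as squaring Pearson r.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values
print(round(r_squared(x, y), 4))       # 0.6
print(round(pearson_r(x, y) ** 2, 4))  # 0.6
```

Read as: about 60% of the variability in this made-up outcome is explained by the predictor.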
What are the features of r value?
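- ranges from -1 to +1
- 0 represents no correlation
- a positive or negative value represents a correlation (the closer to +/-1, the stronger)
- a positive value represents a positive correlation
- a negative value represents a negative correlation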
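A quick check of these features, using a hypothetical helper `pearson_r` and made-up data: a rising straight line gives +1, a falling one gives -1, and a U-shaped pattern gives 0. The last case is a reminder that r = 0 means no linear correlation, which is why the linearity assumption matters.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # perfect positive line
falling = [10, 8, 6, 4, 2]   # perfect negative line
u_shape = [4, 1, 0, 1, 4]    # strong but non-linear pattern
print(pearson_r(x, rising))   # 1.0
print(pearson_r(x, falling))  # -1.0
print(pearson_r(x, u_shape))  # 0.0
```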
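What are the degrees of freedom for the r value in a correlation?
df = n - 2
because two values are estimated from the two variables (the intercept and slope of the line of best fit), each costing one degree of freedom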
Which results are reported with correlation analysis?
R value and p value
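As a hedged sketch of where the p value comes from: the usual significance test for Pearson r converts r into a t statistic with df = n - 2 and compares it with a t distribution. The helper name `correlation_t_test` and the example r and n are hypothetical. The final p value needs a t-distribution CDF, which the Python standard library does not provide; SciPy's `scipy.stats.t.sf(abs(t), df) * 2` gives a two-sided p, and `scipy.stats.pearsonr` reports r and p together.

```python
from math import sqrt

def correlation_t_test(r, n):
    """Return (t, df) for testing H0: population correlation = 0.

    t = r * sqrt(n - 2) / sqrt(1 - r**2), compared against a t
    distribution with df = n - 2 to obtain the p value.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation (|r| = 1)")
    df = n - 2
    t = r * sqrt(df) / sqrt(1 - r ** 2)
    return t, df

t, df = correlation_t_test(r=0.5, n=12)  # hypothetical r and sample size
print(df)           # 10
print(round(t, 3))  # 1.826
```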
What are residuals? How are they calculated?
Residual = observed value - predicted value
i.e., whatever is missed by your line of best fit is the residual
The line of best fit is placed so that it minimizes the sum of the squared residuals (least squares).
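A minimal sketch of residuals and least squares, using hypothetical data and helper names (`fit_line`, `residuals`, `sum_sq`) that are not from the lecture: the least-squares residuals sum to about zero, and moving the line away from the fitted slope makes the sum of squared residuals larger.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(x, y, slope, intercept):
    """Observed value minus predicted value for each point."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def sum_sq(values):
    """Sum of squared values (here: squared residuals)."""
    return sum(v ** 2 for v in values)

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)  # about 0.6 and 2.2
res = residuals(x, y, slope, intercept)
print([round(e, 2) for e in res])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(sum_sq(res), 2))       # 2.4
# Nudging the slope away from the least-squares value increases the total.
print(sum_sq(residuals(x, y, slope + 0.1, intercept)) > sum_sq(res))  # True
```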
What are the assumptions of correlation analysis?
- variables are continuous
- variables are approximately normally distributed
- linear relationship between your two variables (check with a scatterplot)
- no extreme outliers
How do outliers affect the correlation analysis?
Outliers can strongly skew the correlation value.
If an outlier is included, you have one data point that is very far from all the others; it pulls the line of best fit away from the majority of the data and makes your r value different from what it should be.
So, if you remove the outlier, the line of best fit sits better and the correlation often becomes stronger.
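A small illustration with made-up numbers (not lecture data) and a hypothetical `pearson_r` helper: six points that rise almost linearly give r close to 1, and adding a single outlier at (7, 0) drags r down to about 0.27.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2, 4, 5, 8, 10, 12]
# Add one hypothetical outlier far below the trend.
x_out = x_clean + [7]
y_out = y_clean + [0]
print(round(pearson_r(x_clean, y_clean), 3))  # 0.994
print(round(pearson_r(x_out, y_out), 3))      # 0.267
```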
How is the r value classified based on strength?
small (+/- 0.1-0.3)
medium (+/- 0.3-0.5)
large (+/- 0.5-1.0)
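One way to apply these bands in code, as a sketch: the cut-offs overlap at 0.3 and 0.5, so the hypothetical function below assumes a boundary value belongs to the higher band, and it labels |r| below 0.1 as "negligible", a category the card does not define.

```python
def classify_strength(r):
    """Label |r| using the small/medium/large bands.

    Assumptions: boundary values go to the higher band (0.3 -> medium,
    0.5 -> large), and |r| < 0.1 is labelled 'negligible'.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # strength ignores the sign (direction)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

for r in (0.05, -0.2, 0.3, -0.45, 0.8):
    print(r, classify_strength(r))
```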
True/False: Slope is important in correlation analysis
False. CORRELATION DOES NOT REPRESENT SLOPE
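To see why, here is a minimal sketch with hypothetical data and helper names: two perfectly linear datasets with very different slopes both give r = 1.0. Slope comes from regression (for a simple linear fit it equals r multiplied by the ratio of the standard deviations of y and x), while r only measures how tightly the points follow the line.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ls_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 4, 5]
gentle = list(x)             # slope 1
steep = [10 * v for v in x]  # slope 10
for y in (gentle, steep):
    print(ls_slope(x, y), pearson_r(x, y))
# 1.0 1.0
# 10.0 1.0
```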