Correlation and Regression Flashcards
What is the maximum correlation coefficient that is possible?
1.0
the datapoints are on a perfect line
What is the purpose of Correlation?
evaluate the relationship between two variables (x) and (y): e.g. GFR and age
-GFR in the future can be predicted with an equation
What can be described by the Correlation line?
-quantitatively describes the relationship:
-Strength (described by the correlation coefficient: the closer it is to 1 or -1, the closer the data points are to the line)
-Direction: the positive or negative slope of the line
What is the term for the correlation coefficient?
Pearson product-moment coefficient of correlation (r)
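A minimal sketch of how r can be computed from paired data, using only the Python standard library. The function name `pearson_r` and the age/GFR values are made up for illustration and are not real patient data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: GFR falls as age rises
age = [20, 30, 40, 50, 60, 70]
gfr = [120, 112, 101, 95, 84, 76]

r = pearson_r(age, gfr)
print(round(r, 3))  # negative sign = negative slope; magnitude near 1 = strong
```

The sign of the result gives the direction and its magnitude gives the strength, so points lying exactly on a rising line return 1.0.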
Which type of data are used to describe correlations?
-interval or ratio data
in other words: continuous data
Which type of data is used for Spearman rho (r_s)?
Different type of correlation coefficient
Ordinal data
-Ranked
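One common way to obtain Spearman rho is to rank each variable and then compute the Pearson coefficient on the ranks. The sketch below assumes that approach, with tied values sharing the mean of their ranks. The helper names and the ordinal scores are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical ordinal scores on 1-5 scales
pain = [1, 2, 2, 3, 4, 5]
satisfaction = [5, 4, 4, 3, 2, 1]

rho = spearman_rho(pain, satisfaction)
print(rho)
```

Because only the ordering matters, any relationship that always rises (or always falls) gives a rho of magnitude 1, even when it is not a straight line.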
General mantra
Correlation does not equal Causation
Which conditions are required to establish Causality?
-Controlled conditions
-Randomization (RCT), placebo control
-9 Bradford Hill criteria (discussed early in the semester), including: biologic credibility of the association, logical time sequence (cause precedes outcome), a dose-response relationship, and consistency of findings across several studies
Interpretation of Correlations: R-value
-0.00-0.25 indicates little to no relationship
-0.25-0.5 indicates a fair degree of relationship
-0.5-0.75 indicates a moderate to good relationship
-0.75-1.0 is considered good to excellent relationship
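The bands in this card can be written as a small lookup. This is a sketch only: the function name `describe_r` is hypothetical, and because the bands share their boundary values, it assumes a value exactly on a boundary belongs to the higher band.

```python
def describe_r(r):
    """Map a correlation coefficient to the descriptive bands of this card."""
    strength = abs(r)  # the sign gives direction, not strength
    if strength > 1:
        raise ValueError("r must lie between -1 and 1")
    # Assumption: boundary values (0.25, 0.5, 0.75) fall into the higher band
    if strength < 0.25:
        return "little to no relationship"
    if strength < 0.5:
        return "fair"
    if strength < 0.75:
        return "moderate to good"
    return "good to excellent"

for value in (0.1, -0.4, 0.6, -0.9):
    print(value, "->", describe_r(value))
```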
What is the Coefficient of Determination (r^2)?
-indicates the percentage of the total variance in the Y scores, that can be explained by the X scores
-it is explanatory
e.g.: GFR on the Y-axis and age on the X-axis -> r=0.9 so the r^2 is 0.81 -> 81% of the variance in the Y variable (GFR) can be explained by the X variable (age)
Linear VS Curvilinear
Coefficient r is a measure of linear relationships only
-curvilinear relationships, are not described accurately by the linear correlation coefficient
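A small illustration of why r can mislead for curved data: in the made-up example below, Y is completely determined by X (Y = X squared), yet the linear coefficient comes out as zero. The function name and the data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfect U-shaped (curvilinear) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # the falling and rising halves cancel out
```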
Interpretation of Correlation
-the relationship between 2 variables should not be interpreted solely based on the correlation coefficient r
-variables should be plotted to see whether a linear or curvilinear relationship exists and whether an r value is appropriate
-an association of 0.4 is not twice as strong as one of 0.2
-the difference in association between 0.5 and 0.6 is not the same as between 0.8 and 0.9
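One way to see why r values do not behave like a ruler is to compare the shared variance (r^2) they correspond to. This is an illustrative sketch, not the only way to compare correlations, and the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance explained for a given correlation coefficient."""
    return r ** 2

for r in (0.2, 0.4, 0.5, 0.6, 0.8, 0.9):
    print(f"r = {r:.1f} -> r^2 = {shared_variance(r):.2f}")

# Doubling r from 0.2 to 0.4 quadruples the explained variance
ratio = shared_variance(0.4) / shared_variance(0.2)

# The same 0.1 step in r is worth more at the top of the scale
gap_low = shared_variance(0.6) - shared_variance(0.5)   # about 0.11
gap_high = shared_variance(0.9) - shared_variance(0.8)  # about 0.17
print(round(ratio, 2), round(gap_low, 2), round(gap_high, 2))
```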
Correlation Matrix
-analyzing several variables at one time
-presenting correlation coefficients for all pairs of variables
-each variable is correlated with every other variable to see whether any of the variables are related to one another
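A minimal sketch of a correlation matrix built from pairwise Pearson coefficients. The variable names and values are invented for illustration, and a nested dictionary stands in for the table a statistics package would print.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(data):
    """Return {row_name: {column_name: r}} for every pair of variables."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Hypothetical measurements on the same six subjects
data = {
    "age": [20, 30, 40, 50, 60, 70],
    "gfr": [120, 112, 101, 95, 84, 76],
    "salt": [6, 9, 7, 10, 8, 11],
}

matrix = correlation_matrix(data)
for row in data:
    print(f"{row:>5}", [round(matrix[row][col], 2) for col in data])
```

The diagonal is always 1 (each variable against itself) and the table is symmetric, so only one triangle needs to be read.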
Significance of Correlation Coefficient
-the observed correlation is one of an infinite number of possible correlations -> it is obtained from a random sample of the population
-> subject to sampling error (chance variation between samples, distinct from systematic BIAS)
-> needs to be tested for statistical SIGNIFICANCE
What would be the Null Hypothesis when determining the Correlation Coefficient?
H0 states that the correlation in the population is 0 (r=0) -> no correlation
What are the statistical tests used to reject the H0?
-Pearson product-moment coefficient of correlation
-Spearman rho (r_s)
-> to determine the p-value
-if the p-value is less than 0.05, we reject H0 and state that the 2 variables are correlated; this DOESN'T mean that there is a STRONG correlation
Which factor affects the correlation coefficient?
-Sample Size
-CAUTION:
-with a large sample size statistical significance can be achieved even with a low r-value (e.g. r = 0.2 with p < 0.05)
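A rough sketch of how sample size drives the p-value for a fixed r. It assumes the Fisher z transformation with a normal approximation, chosen here because it needs only the standard library; statistics packages typically use a t-test instead, so exact p-values will differ slightly. The function name and the sample sizes are illustrative.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z method)."""
    if n <= 3:
        raise ValueError("the approximation needs more than 3 pairs")
    # atanh(r) is roughly normal with standard error 1 / sqrt(n - 3);
    # it is undefined for r = 1 or r = -1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_small = approx_p_value(0.2, 20)    # same weak r, small sample
p_large = approx_p_value(0.2, 200)   # same weak r, large sample

print(f"n = 20:  p = {p_small:.3f}")
print(f"n = 200: p = {p_large:.4f}")
```

With the same weak r of 0.2, the small sample is far from significant while the large one falls below 0.05, which is why a significant p-value says nothing about strength.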
Can a correlation be extrapolated to values outside the frame it was tested for?
No, it should be limited to the range of values used to produce the correlations
Independence of variables
For a correlation to be considered valid, the observations have to be independent of each other
-don't use related subjects -> relatives (mom and children) -> they will give a high correlation value
Regression
used to predict the dependent variable (Y) based on the independent variable (X)
-e.g.: (Y) = GFR, (X) = Age
-GFR depends on the age
What is linear Regression?
-Plotting variables -> creating the line between the data points
-provides an equation that helps to predict the (Y) variable
-Y = m*X + b; (Y) = GFR, (X) = Age, m = slope, b = intercept
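A minimal least-squares sketch of fitting the line and using it to predict. The function names and the age/GFR values are hypothetical. The `predict` helper also applies the earlier card's rule that predictions stay within the range of X values used to fit the line.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept b for Y = m*X + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through the two means
    return m, b

def predict(m, b, x_new, x_observed):
    """Predict Y, refusing to extrapolate beyond the observed X range."""
    if not min(x_observed) <= x_new <= max(x_observed):
        raise ValueError("x_new is outside the range used to fit the line")
    return m * x_new + b

# Hypothetical values, not real patient data
age = [20, 30, 40, 50, 60, 70]
gfr = [120, 112, 101, 95, 84, 76]

m, b = fit_line(age, gfr)
gfr_at_45 = predict(m, b, 45, age)

print(f"GFR = {m:.3f} * Age + {b:.2f}")
print(round(gfr_at_45, 1))
```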
What is non-linear Regression used for?
-Curvilinear relationships
-e.g. psychomotor ability and age
-low in kids, peaks at age 30, and declines
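One simple way to model a rise-then-decline pattern is to fit a quadratic curve instead of a straight line. The sketch below assumes that choice (other curve shapes are possible) and solves the least-squares normal equations by hand. The scores are invented so that they peak at age 30, and all names are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system a * x = b by Gaussian elimination."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    powers = [sum(v ** k for v in x) for k in range(5)]  # sums of x^0 .. x^4
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(yv * xv ** i for xv, yv in zip(x, y)) for i in range(3)]
    return solve(a, b)

# Hypothetical psychomotor scores: low in childhood, peak at 30, then decline
age = [10, 20, 30, 40, 50]
score = [60, 90, 100, 90, 60]

c0, c1, c2 = fit_quadratic(age, score)
peak_age = -c1 / (2 * c2)  # vertex of the fitted parabola
print(round(peak_age, 2))
```

A negative coefficient on the squared term is what produces the inverted-U shape that a straight line cannot capture.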
When to use a Multivariate Analysis?
When more than one variable affects variable (Y)
-e.g. effect of age and salt intake on GFR
-Often used to develop predictive models of risk: prediction of cardiovascular risk based on BP, age, sex, lipids, etc.
-> Incorporate the effect of multiple variables on a single outcome
Which type of Regression is used for continuous outcome variables in Multivariate Analysis?
Multiple Linear Regression
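A sketch of multiple linear regression with two predictors, solved through the least-squares normal equations. The data are invented (GFR values were generated from an exact formula so the fit can be checked by eye), and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a small linear system a * x = b by Gaussian elimination."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1 models the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical subjects: (age in years, salt intake in g/day)
predictors = [(20, 6), (30, 9), (40, 7), (50, 10), (60, 8), (70, 11)]
# Generated as GFR = 140 - 0.9*age - 1.5*salt, purely for illustration
gfr = [113.0, 99.5, 93.5, 80.0, 74.0, 60.5]

intercept, b_age, b_salt = fit_multiple(predictors, gfr)
print(round(intercept, 3), round(b_age, 3), round(b_salt, 3))
```

Each coefficient estimates the change in the continuous outcome for a one-unit change in that predictor while the other predictor is held fixed.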
What is Logistic Regression used for?
-dichotomous outcomes (variables)
-YES or NO questions
-e.g. to predict the mortality of a patient admitted to the hospital -> predictor variables (X): diagnosis, disease state, age, etc. -> outcome (Y): survival or death
-another example: discharge from the hospital at day 3
How is Logistic Regression different from Multivariate Analysis?
-Logistic Regression includes multiple variables but the OUTCOME is dichotomous
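A sketch of how a logistic model turns several predictors into the probability of a yes/no outcome. The coefficients below are invented for illustration and are not fitted to any data. Fitting them is a separate step normally done by a statistics package.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Logistic model: probability of the 'yes' outcome for one patient."""
    # Linear combination of the predictors, as in multiple linear regression
    linear = intercept + sum(c * v for c, v in zip(coefficients, values))
    # The logistic function squeezes any real number into the range 0..1
    return 1 / (1 + math.exp(-linear))

# Hypothetical coefficients: per year of age, and for severe disease (1 = yes)
intercept = -6.0
coefficients = [0.06, 1.2]

p_young = predicted_probability(intercept, coefficients, [40, 0])
p_old_severe = predicted_probability(intercept, coefficients, [80, 1])

print(round(p_young, 3), round(p_old_severe, 3))
```

The predictors enter the model just as they would in a linear regression. What changes is that the output is a probability for a dichotomous outcome, not a continuous value.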
QUESTION
How is the Multivariate Analysis different from the Correlation Matrix?