Regression and Correlation Flashcards
Correlation
- degree to which two quantitative variables are related
- does not imply causation
Pearson’s correlation coefficient
- commonly used measure for quantitative parametric data
- correlation ranges from -1 to +1
- no units
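A minimal pure-Python sketch of calculating Pearson's r; the data values below are made-up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# A perfect positive linear relationship gives r close to +1,
# a perfect negative one gives r close to -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```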
Regression
- predicts the value of a dependent variable from the value of a correlated independent variable
Regression coefficient
- y = a + bx
- b is the regression coefficient (the slope) and a is the intercept on the y-axis
Fisher’s transformation
- may be used to compare two correlation coefficients for hypothesis testing
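A sketch of how the comparison works: each r is transformed to z = ½ ln((1+r)/(1−r)), and the difference of the two z values is divided by a standard error that assumes two independent samples. The sample sizes below are illustrative:

```python
import math

def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """z statistic for H0: the two population correlations are equal
    (assumes two independent samples)."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

# Identical correlations give z = 0; clearly different ones
# give |z| > 1.96 at the conventional 5% level.
print(compare_correlations(0.5, 50, 0.5, 80))
print(compare_correlations(0.8, 100, 0.3, 100))
```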
Partial correlations
- correlations between two variables after adjusting for a third variable
Spearman’s correlation
(rho)
- non-parametric equivalent of Pearson's
- used to test the association of variables if at least one is ordinal (ranked)
- assumes ranks are equidistant
- if this is not true, Kendall's tau is used instead
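Spearman's rho can be sketched in pure Python as Pearson's r computed on the ranks (tied values receive averaged ranks; the data below are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson's r, applied here to the ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values 1..n, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# A monotonic but non-linear relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```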
How to calculate the values of a and b in the regression equation
- done using a scattergram and the ‘method of least squares’
- vertical lines are drawn from the points on the scattergram to the line of best fit
- these distances are called residuals
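The least-squares values of a and b have closed-form solutions, sketched below with made-up data (b is the covariance of x and y divided by the variance of x; a is then fixed so the line passes through the means):

```python
def least_squares_fit(x, y):
    """Intercept a and slope b that minimise the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Data generated from y = 3 + 2x are recovered exactly.
a, b = least_squares_fit([1, 2, 3, 4], [5, 7, 9, 11])
print(a, b)
```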
Multiple linear regression
- several independent variables together predict a single dependent variable
- multivariate technique
- the independent variables are called covariates
Collinearity
- when two covariates studied are highly correlated with each other
- may distort the regression estimates
R2
- square of the correlation coefficient (in simple linear regression)
- also called the coefficient of determination
- used to test the goodness of fit of the final regression
- it is the proportion of total variation in the dependent variable that can be explained by the independent variable
- measures how well the observed and predicted values of the dependent variable correspond to each other
- ranges from 0 to 1
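A short sketch of R² as one minus the ratio of residual to total sum of squares; the toy data are illustrative:

```python
def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    return 1 - ss_res / ss_tot

# Perfect predictions give 1; always predicting the mean gives 0.
y = [1, 2, 3, 4]
print(r_squared(y, y))
print(r_squared(y, [2.5, 2.5, 2.5, 2.5]))
```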
Linear regression
- dependent variable must be continuous
Logistic regression
- used if the dependent variable is binary
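A pure-Python sketch of logistic regression with one covariate, fitted by gradient ascent on the log-likelihood (a simple stand-in for the usual maximum-likelihood routines; the data and learning rate are illustrative):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y=1|x) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        ga = gb = 0.0
        for xi, yi in zip(x, y):
            resid = yi - sigmoid(a + b * xi)   # observed minus predicted probability
            ga += resid
            gb += resid * xi
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Binary outcomes that switch on around x = 2.5.
a, b = fit_logistic([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1])
print(sigmoid(a + b * 1), sigmoid(a + b * 4))
```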
Log-linear analysis
- accommodates only categorical data
Bernoulli random variables
- variables with dichotomous outcomes; the dependent variable in logistic regression is a Bernoulli random variable
Exponential correlation
- used if one wants to demonstrate the exponential relationship of a variable with a factor such as time
- log transformed values can be plotted against time
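A sketch of the log-transform approach: fitting y = A·exp(kt) by ordinary linear regression of ln(y) on t, with made-up data:

```python
import math

def fit_exponential(t, y):
    """Fit y = A * exp(k*t) by linear regression of ln(y) on t."""
    ly = [math.log(v) for v in y]
    n = len(t)
    mt, ml = sum(t) / n, sum(ly) / n
    k = sum((ti - mt) * (li - ml) for ti, li in zip(t, ly)) / \
        sum((ti - mt) ** 2 for ti in t)
    A = math.exp(ml - k * mt)
    return A, k

# Data generated from y = 2 * exp(0.5 t) recover A = 2, k = 0.5.
t = [0, 1, 2, 3, 4]
y = [2 * math.exp(0.5 * ti) for ti in t]
print(fit_exponential(t, y))
```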
Polynomial regression
- in some cases of non-linearity, the relationship between the dependent variable Y and independent variable X can be expressed using powers of X, i.e. Y = a + bXⁿ, where n may be 2, 3, 4 etc
- this is polynomial regression
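The simplest polynomial case can be sketched by regressing y on xⁿ with ordinary least squares (illustrative data below):

```python
def fit_power_term(x, y, n=2):
    """Regress y on x**n: the simplest polynomial-regression case."""
    z = [xi ** n for xi in x]
    m = len(z)
    mz, my = sum(z) / m, sum(y) / m
    b = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / \
        sum((zi - mz) ** 2 for zi in z)
    a = my - b * mz
    return a, b

# Data generated from y = 1 + 3x^2 recover a = 1, b = 3.
x = [1, 2, 3, 4]
y = [1 + 3 * xi ** 2 for xi in x]
print(fit_power_term(x, y))
```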
1 in 10 rule
- the number of variables studied in multiple regression models must not be greater than 10% of the sample size
- for logistic regression, the number of variables must not be greater than 10% of the number of events
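The rule amounts to simple arithmetic; a small hypothetical helper makes it concrete:

```python
def max_covariates(sample_size=None, n_events=None):
    """1 in 10 rule: at most one covariate per ten observations,
    or per ten events for logistic regression."""
    basis = n_events if n_events is not None else sample_size
    return basis // 10

# 150 observations allow up to 15 covariates;
# 43 events in logistic regression allow up to 4.
print(max_covariates(sample_size=150))
print(max_covariates(n_events=43))
```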
Stepwise regression
- calculates the regression coefficients by fitting the independent variables into the regression equation one at a time, from most significant to least significant
- some statistically significant variables may not be clinically relevant
Forward selection
- starts with an empty equation and adds covariates one by one; confounding variables are treated as covariates
Backward elimination
- starts with the full equation and discards covariates one by one according to the changes that occur in the regression coefficients
Y = a + bX + e
- regression equation
- Y = the dependent variable
- a and b are constants; b is the regression coefficient
- X is the independent variable
- e is the error term, a random variable with mean 0
Key point
- using the method of least squares we can find the best linear regression equation, minimising the variance of the error term ‘e’