Regression & Correlation Flashcards
What does correlation refer to?
Degree to which two quantitative variables are related
What is commonly used to measure correlation in quantitative parametric data?
Pearson's correlation coefficient
What range of values can the correlation coefficient ‘r’ take?
-1 to +1
Units of ‘r’?
None
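A minimal sketch of how Pearson's r could be computed by hand, to show why it is unitless and stays between -1 and +1. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The units of x and y cancel in this ratio, so r has no units.
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
```

For this made-up data r comes out at about 0.77, a fairly strong positive correlation.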
When is correlation coefficient not valid?
If data is not independent (paired)
When can Fisher's transformation be used?
To compare two correlation coefficients for hypothesis testing
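A sketch of one common way this comparison is done, assuming the two correlation coefficients come from independent samples: each r is converted to z = atanh(r), and the difference is divided by its standard error. The function names and the example figures are assumptions made for illustration.

```python
import math

def fisher_z(r):
    """Fisher's transformation of a correlation coefficient."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent correlations."""
    # Standard error of the difference between the two transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical example: r = 0.8 in one sample of 103, r = 0.5 in another of 103.
z_stat, p_value = compare_correlations(0.8, 103, 0.5, 103)
```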
What are Partial correlations?
Correlations between two variables after adjusting for a third variable
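As a sketch, the first-order partial correlation between x and y adjusting for z can be obtained from the three pairwise correlations. The function name and the input values below are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after adjusting for a third variable z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations, for illustration only.
r_adjusted = partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.5)
```

With these made-up inputs the adjusted correlation (about 0.47) is lower than the raw 0.6, because part of the x-y association is shared with z.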
What is Spearman's correlation (rho)?
Non-parametric equivalent of Pearson's
What can Spearman's be used for?
To test association between two variables if at least one is ordinal, or
if the sample size is small even though the variables are continuous, or
if non-linearity is suspected, or
if a non-normal distribution is noted for both variables
What does Spearman's assume?
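A minimal sketch of Spearman's rho using the rank-difference formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). This shortcut assumes there are no tied values; with ties, ranks would need averaging. The helper names and data are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_rho(xs, ys):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: only the ordering of the values matters.
rho = spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
```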
Difference between each pair of ordinal variables is the same i.e. the ranks are equidistant
If the difference between each pair of ordinal variables is not the same, how can one calculate correlation?
Kendall's tau - an appropriate measure of non-parametric correlation
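A sketch of the simplest variant, often called tau-a, which counts concordant and discordant pairs and makes no correction for ties (tied pairs count as neither). The function name and data are assumptions for illustration.

```python
def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Positive product: the pair is ordered the same way in x and y.
            direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical ranks: five of the six pairs agree in order, one disagrees.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```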
What does regression help with?
Helps predict what value one variable will be if given a particular value of the other variable
Explain the formula for simple linear regression
y = a + bx
b = regression coefficient (slope); a = intercept on the y axis
What can simple linear regression predict?
Probable score on the Y axis from a known score on the X axis, i.e. the dependent variable can be predicted from the value of the independent variable
How does one determine the value of a and b for regression?
Using a scattergram and method of least squares
Explain the method of using a scattergram and method of least squares
A hypothetical straight line is constructed so that its vertical distance from each observed point on the scattergram is kept to a minimum; each such vertical distance is called a residual.
The sum of the squared residuals is kept to a minimum for a regression line of good fit
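The least-squares idea can be sketched in a few lines: the slope is the sum of cross-products divided by the sum of squares of x, and the intercept makes the line pass through the two means. Function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) for y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
# Any other line, e.g. the same slope with a shifted intercept, fits worse.
other = sum_squared_residuals(a + 0.5, b, xs, ys)
```

Here the fitted line is y = 2.2 + 0.6x, and shifting the intercept increases the sum of squared residuals.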
What happens in multiple linear regression?
Several independent variables together predict a single dependent variable
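A rough sketch of multiple linear regression using the normal equations and a small hand-written Gaussian elimination. This is for illustration only; in practice a statistics package would be used. All names and the toy data (generated exactly from y = 1 + 2*x1 + 3*x2) are assumptions.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        known = sum(aug[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (aug[r][n] - known) / aug[r][r]
    return coeffs

def fit_multiple(rows, ys):
    """Least-squares fit of y = a + b1*x1 + b2*x2 + ... via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Toy data built from y = 1 + 2*x1 + 3*x2, so the fit should recover 1, 2, 3.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
intercept, b1, b2 = fit_multiple(rows, ys)
```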
What type of technique is multiple regression?
Multivariate
What is the name of the independent variables in multiple regression?
Covariates
What is the name of covariates which may be highly correlated with each other?
Collinearity
Effect of collinearity?
May disturb the regression
When is regression coefficient useful?
Examine confounders
What can be used to express correlation coefficients?
Confidence intervals (CI)
What does the square of the correlation coefficient (r squared) test?
Goodness of fit of the final regression
What is goodness of fit?
Proportion of total variation in dependent variable that can be explained by the independent variable
What does goodness of fit measure?
How well actual outcome (dependent variable) and calculated dependent variable correspond
Values of goodness of fit
0 to 1
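A sketch of goodness of fit for simple linear regression, computed as 1 minus the ratio of residual variation to total variation. For a single predictor this equals the square of Pearson's r. Names and data are illustrative assumptions.

```python
def goodness_of_fit(xs, ys):
    """Proportion of variation in y explained by a least-squares line on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left unexplained by the line.
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data, for illustration only.
r_squared = goodness_of_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this made-up data the value is 0.6, i.e. 60% of the variation in y is explained by x.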
What is the coefficient of determination?
Goodness of fit calculated as the square of Pearson's correlation coefficient
What is needed for linear regression?
Continuous dependent variable
What type of regression is used if dependent variable is binary?
Logistic regression
Why is logistic regression popular?
It can give OR, RR and hazard ratio for independent variables that affect the dependent variable
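A sketch of how an odds ratio falls out of a simple logistic regression: the model is fitted here by Newton-Raphson, and exp(b) is the odds ratio for a one-unit increase in x. The fitting routine, its name, and the data (20 events in 50 exposed, 10 events in 50 unexposed) are hypothetical.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to a
            g1 += (y - p) * x    # gradient with respect to b
            h00 += w             # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical binary data: x = 1 exposed, x = 0 unexposed; y = 1 is an event.
xs = [1] * 50 + [0] * 50
ys = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40
a, b = fit_logistic(xs, ys)
odds_ratio = math.exp(b)
```

With one binary predictor, exp(b) matches the odds ratio from the 2x2 table: (20/30) / (10/40), about 2.67.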
Test used if data are continuous, with 1 independent and 1 dependent variable
Simple linear regression
Regression test used if continuous data with >1 independent variable and 1 dependent variable
Multiple linear regression
Regression test used if binary data, 1 independent variable and 1 dependent variable
Simple logistic regression
Test used if binary data, >1 independent variable and 1 dependent variable
Multiple logistic regression
What is log-linear analysis used for?
Categorical data
What must be noted in log-linear analysis?
No demarcation between the dependent and independent variable
What data can be used in logistic regression?
Continuous and categorical independent variables
What are Bernoulli random variables?
Variables with dichotomous outcomes, as used in logistic regression
What is exponential correlation?
When one demonstrates the exponential relationship of a variable with a factor such as time using log-transformed values plotted against time
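A sketch of the log-transform idea: if y grows exponentially with time, then ln(y) plotted against time is a straight line, so an ordinary least-squares line on ln(y) recovers the growth rate. The function name and the synthetic data (y = 5 * exp(0.3 * t)) are assumptions for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Synthetic, noise-free exponential data.
times = [0, 1, 2, 3, 4, 5]
values = [5 * math.exp(0.3 * t) for t in times]

# Regress the log-transformed values against time.
log_values = [math.log(v) for v in values]
log_intercept, rate = fit_line(times, log_values)
initial_value = math.exp(log_intercept)  # back-transform the intercept
```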
What is polynomial regression?
When the relationship is non-linear, the dependent variable (y) is expressed as a power of the independent variable (x), e.g. y = x^n (n = 2 gives a squared term)
What is the 1 in 10 rule?
Number of variables studied in multiple regression models must not be greater than 10% of sample size.
1 in 10 rule for logistic regression?
Number of variables must not be greater than 10% of number of events
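The two rules are simple arithmetic, sketched below. The function names and the example numbers are hypothetical.

```python
def max_variables_linear(sample_size):
    """1 in 10 rule for multiple linear regression: at most 10% of sample size."""
    return sample_size // 10

def max_variables_logistic(number_of_events):
    """1 in 10 rule for logistic regression: at most 10% of the number of events."""
    return number_of_events // 10

# Hypothetical study: 200 participants, of whom 35 had the outcome event.
linear_limit = max_variables_linear(200)
logistic_limit = max_variables_logistic(35)
```

In this example a linear model could hold up to 20 variables, but a logistic model only 3, because the limit depends on events rather than on the total sample.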
How can multiple regression be performed?
Stepwise regression
Forward selection
Backward elimination
What happens in stepwise regression?
Regression coefficients are calculated, and the independent variables are fitted into the regression equation one step at a time, from the most significant to the least significant.
Disadvantage of stepwise regression?
Sometimes statistically significant variables may not be clinically significant
Theory behind forward selection
Confounding factor is associated with both independent and dependent variable
If the confounding variable is not known, the candidate variables are treated as covariates
What does one often examine in multiple regression?
Which is the confounding variable
What happens in forward selection?
While constructing multiple regression equations, if the regression coefficient of a previously added variable changes when a new covariate is added, then one of the covariates is a confounder; these are retained in the equation irrespective of statistical significance.
The later-added covariate is discarded if no change occurs in the regression coefficient
What is backward elimination?
Starts with the full model (the complete equation) and tries to discard covariates one by one according to the changes that occur in the regression coefficients
In the equation y=a+bx+e what is y?
Dependent variable - outcome of interest
In the equation y=a+bx+e what are a and b?
Constants
In the equation y=a+bx+e what is b?
Slope or regression coefficient
In the equation y=a+bx+e what is x?
Independent variable - predictor of outcome
In the equation y=a+bx+e what is e?
Error.
Random variable with mean = 0
In the equation y=a+bx+e what does e represent?
Part of variability of Y which is not explained by relationship with x
What can method of least squares be used for with respect to e (error)?
We can find the best linear regression equation, the one with the minimum variance of e