25. Correlation, Regression, and Survival Analysis Flashcards
what statistic measures the linear relationship between two variables?
correlation coefficient (value is between -1 and +1)
- the closer the correlation coefficient is to +/-1, the stronger the linear relationship (less scatter)
- correlation coefficient of 0 indicates no linear relationship
- positive value means variables move in same direction
- negative value means variables move in opposite directions
- if Excel reports an R^2 value for the fit of a linear trendline, the correlation coefficient is R (the square root of R^2), with the same sign as the slope
- does NOT describe the relationship itself: it doesn't tell you how much one variable changes when the other is changed
- when the correlation coefficient is zero, two variables may still have a (non-linear) relationship
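The function below is a minimal plain-Python sketch of Pearson's correlation coefficient. The name `pearson_r` is hypothetical and does not come from any library. The three example calls give a perfect positive line, a perfect negative line, and a symmetric parabola. For the parabola, r is 0 even though Y is completely determined by X. Squaring r gives the R^2 value that a spreadsheet trendline reports.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired samples xs and ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Co-variation of X and Y, and the spread of each variable around its mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))      # 1.0  (variables move in the same direction)
print(pearson_r(x, [10, 8, 6, 4, 2]))      # -1.0 (variables move in opposite directions)
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [v * v for v in xs]))  # 0.0  (exact non-linear relationship, no linear one)
```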
what is the overall use of regression analysis?
to quantify the association between two variables and to control for confounding
regression quantifies what?
association between variables
regression adjusts for what?
confounding variables (if measured)
two main goals of regression analysis?
- causal analysis: identification of risk factors (“causation”) in the presence of, or adjusting for, other variables; beware that causality is often difficult or impossible to establish
- prediction: predict the outcome from the independent variables
linear regression tests for what?
a linear relationship between 2+ variables (continuous/measurement data)
logistic regression tests for what?
a relationship between various covariates and the proportion of cases with a given characteristic (eg dichotomous/categorical outcome)
Cox proportional hazards regression tests for what?
the effect of covariates on the time to an event
simple linear regression
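closely related to the correlation coefficient: R^2 = r^2, the slope has the same sign as r, and testing slope = 0 is equivalent to testing r = 0; unlike r, it also estimates how much Y changes per unit change in X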
- one independent variable
- relationship between X and Y is described by a linear function
how is the linear regression model fit?
least squares criterion
- the SUM of the SQUARED differences between the observed and predicted values of Y (the residuals), taken over all observed values of X, is MINIMIZED
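For a single predictor, the least squares criterion has a closed-form solution. In the sketch below, the slope is Sxy/Sxx and the intercept is mean(Y) − slope·mean(X). The helper names and the data are made up for illustration. After fitting, the code checks that nudging the slope in either direction only increases the sum of squared residuals.

```python
def least_squares_line(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx        # slope: estimated change in Y per one-unit change in X
    b0 = my - b1 * mx     # intercept: estimated Y when X = 0
    return b0, b1

def sum_squared_residuals(xs, ys, b0, b1):
    """The quantity that least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up illustrative data
b0, b1 = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, b0, b1)
# Moving the slope away from the least squares value can only make the fit worse
assert best <= sum_squared_residuals(xs, ys, b0, b1 + 0.1)
assert best <= sum_squared_residuals(xs, ys, b0, b1 - 0.1)
print(f"{b0:.2f}, {b1:.2f}")   # 0.05, 1.99
```

Equivalently, the slope equals r·(s_Y/s_X). This is why the slope always has the same sign as the correlation coefficient.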
interpretation of slope and intercept for simple linear regression? statistical significance?
- slope (Beta 1): estimated change in Y for a one-unit change in X
- intercept: estimated value of Y when X = 0
- statistical significance is determined by HYPOTHESIS TESTING using a t-test for the slope (Beta 1)
- null hypothesis: Beta1 = 0, i.e. there is no linear relationship
- alternative hypothesis: Beta1 doesn’t equal 0, i.e. a linear relationship does exist
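The function below sketches the slope t-test. It assumes the usual textbook formula SE(b1) = sqrt(s²/Sxx), with s² = SSE/(n − 2). The function name is hypothetical, and the data are the made-up values used for illustration. The code only computes t and its degrees of freedom. The critical value 3.182 (two-sided, alpha = 0.05, 3 df) is hardcoded for this example, because turning t into an exact p-value needs a t-distribution function, which is not included here.

```python
import math

def slope_t_statistic(xs, ys):
    """t statistic for H0: Beta1 = 0 in simple linear regression.

    t = b1 / SE(b1), where SE(b1) = sqrt(s^2 / Sxx) and s^2 = SSE / (n - 2).
    Returns (t, degrees of freedom). Assumes the points do not lie exactly
    on a line (otherwise SE(b1) is 0 and the division fails).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)               # residual variance estimate
    se_b1 = math.sqrt(s2 / sxx)      # standard error of the slope
    return b1 / se_b1, n - 2

t, df = slope_t_statistic([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
# Two-sided test at alpha = 0.05 with 3 df: critical |t| is about 3.182
print(f"t = {t:.1f} on {df} df; reject H0: {abs(t) > 3.182}")
```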
if we have more than one independent variable in the regression model, what does the model give you?
the effect of a change in each variable while holding the others constant
multiple (multivariable, often loosely called multivariate) linear regression
- relation between a continuous outcome variable and a set of independent variables
- the regression coefficient for each X represents the amount by which Y changes on average when X changes by 1 unit and all of the other X’s remain the same
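The code below is a minimal sketch of multiple linear regression. It solves the normal equations (X'X)b = X'y with plain Gaussian elimination. Both function names are hypothetical. Statistical packages typically use more numerically stable methods, such as QR decomposition. The data are made up and noiseless (Y = 1 + 2·X1 + 3·X2), so the fit recovers the coefficients exactly. Here b1 = 2 means Y rises by 2 for each one-unit increase in X1 when X2 is held constant.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Does not handle singular systems (e.g. perfectly collinear predictors).
    """
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_regression(rows, ys):
    """OLS fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    X = [[1.0] + list(row) for row in rows]          # prepend intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * y for xi, y in zip(X, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]   # noiseless, illustrative
b0, b1, b2 = fit_multiple_regression(rows, ys)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # 1.0 2.0 3.0
```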
logistic regression - when do you use it?
used to examine an association (and control for confounding) when the outcome is dichotomous (the model uses a logistic, i.e. log-odds, transformation)
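The function below fits a one-predictor logistic regression by maximum likelihood with Newton-Raphson. The function name is hypothetical, and real packages also check convergence and report standard errors. The model is logit(p) = log(p/(1 − p)) = b0 + b1·x, and exp(b1) is the odds ratio. The made-up data use a binary exposure. In the unexposed group, 1 of 4 people have the outcome; in the exposed group, 3 of 4 do. That gives odds of 1/3 and 3, so the fitted odds ratio should be 9.

```python
import math

def fit_logistic_one_predictor(xs, ys, iterations=25):
    """Fit logit(P(y = 1)) = b0 + b1*x by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score (gradient) and information matrix of the log-likelihood
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # inverse logit
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (information matrix)^-1 @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

xs = [0, 0, 0, 0, 1, 1, 1, 1]   # exposure (made-up)
ys = [1, 0, 0, 0, 1, 1, 1, 0]   # outcome (made-up)
b0, b1 = fit_logistic_one_predictor(xs, ys)
print(f"odds ratio = {math.exp(b1):.2f}")   # odds ratio = 9.00
```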
censored data
the event was not observed for a participant, eg because of loss to follow-up or because the study ended before the event occurred (information is thus limited by the amount of time that participant spent in the study)
person-time analysis (how to, and limitations)
- divide the number of incident events by the total person-time to get events per unit of person-time (eg 3 events in 15 person-months = 0.2 events per person-month)
- these are called INCIDENCE DENSITY RATES
- divide one group's rate by the other's to obtain the incidence rate ratio (IRR) (eg those eating more eggs have 1.2 times the rate of developing high cholesterol); you can get a P-value and CI for the IRR
- limitations: assumes the rate is constant over time, so the risk of the outcome is treated as the same for one person followed for 2 months as for two people followed for one month
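The code below sketches incidence density rates and the IRR. The confidence interval uses the common large-sample approximation SE(ln IRR) = sqrt(1/a + 1/b), where a and b are the event counts in the two groups. The function names and the second set of counts are hypothetical.

```python
import math

def incidence_rate(events, person_time):
    """Incidence density rate: events per unit of person-time."""
    return events / person_time

def incidence_rate_ratio(a, pt1, b, pt0, z=1.96):
    """IRR of group 1 vs group 0 with an approximate 95% CI.

    a, b   -- event counts in group 1 and group 0
    pt1, pt0 -- person-time in group 1 and group 0
    Uses SE(ln IRR) = sqrt(1/a + 1/b), a large-sample approximation.
    """
    irr = incidence_rate(a, pt1) / incidence_rate(b, pt0)
    se = math.sqrt(1 / a + 1 / b)
    lower = math.exp(math.log(irr) - z * se)
    upper = math.exp(math.log(irr) + z * se)
    return irr, (lower, upper)

print(incidence_rate(3, 15))   # 0.2 events per person-month
# Hypothetical counts: 30 events in 500 person-years vs 25 events in 500 person-years
irr, (lo, hi) = incidence_rate_ratio(30, 500, 25, 500)
print(f"IRR = {irr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

In this hypothetical case the interval includes 1, so a rate ratio of 1.2 would not be statistically significant at these sample sizes.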
time to event (SURVIVAL) analysis
- analyses that include the time to an event occurrence (outcome of interest is not always death)
- display the data using survival curves (at each time point, plot the proportion of individuals still at risk and free from the outcome, i.e. the survivors)
- analyze data and adjust for confounding (Cox proportional hazards regression)
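The card does not name a method for building survival curves. The Kaplan-Meier estimator is one standard choice, and the sketch below assumes it. The function name and the data are made up. At each time an event occurs, survival is multiplied by (1 − events/number at risk). Censored participants count as at risk up to their censoring time and then leave the risk set. This is how censored data still contribute information.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each participant
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Censored participants at time t are still counted as at risk here
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored participants leave the risk set after t
        n_at_risk -= deaths + censored
    return curve

times = [2, 3, 3, 5, 6, 8]    # made-up follow-up times (months)
events = [1, 0, 1, 1, 0, 1]   # 0 = censored
print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# [(2, 0.833), (3, 0.667), (5, 0.444), (8, 0.0)]
```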