Comparing Relationships (Lecture 3) Flashcards
What is correlation?
Explanation/Measure of the strength of a relationship between continuous variables
- both can be dependent variables; neither needs to be designated as independent
- both explanation of linear relationship AND magnitude/strength
How is regression similar to and different from correlation?
- It takes correlation one step further (helps predict 1 variable when given another variable)
- Recognizes and measures relationship (like correlation)
- Also describes relationship so an equation can be developed to predict 1 variable when given the other
What type of plot is used in conducting a correlation analysis?
Scatterplot
In a scatterplot, we are looking for the ____ that best estimates the varying data points.
“line of best fit”
Displays the relationship between two variables on a traditional Cartesian coordinate plane
scatterplot
Correlation does not verify ______, just indicates if there may be ______ between variables.
causation; association
The measure of the association between continuous variables
correlation
What is a positive correlation vs. a negative correlation?
Positive correlation - as one variable increases, the other variable increases
Negative correlation - as one variable increases, the other decreases
Divorce rates in Maine correlating with per capita consumption of margarine is an example of what?
Correlation-Causation Fallacy
What is the Pearson Correlation coefficient (r)?
a measure of the linear relationship between two variables
What does an r between 0 and +1 indicate?
positive correlation (as one variable increases, the second variable also increases)
What does an r between -1 and 0 indicate?
negative correlation
What does an r of 0 indicate?
no linear relationship
What values of “r” indicate a “perfect” linear relationship?
r = -1 or r = 1
What does an r = 0.8 indicate?
a strong positive linear relationship: larger values of one variable tend to go with larger values of the other, but the points do not all fall exactly on a line
r = 0.4 is a _____ correlation (stronger as it approaches +1)
weak positive
r = -0.4 is a ____ correlation (stronger as it approaches -1)
weak negative
T/F An r=0 means the two variables are unrelated or independent of one another
F - only that no linear relationship exists between the two variables
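A minimal sketch of the r cards above, assuming small made-up data sets; the function name `pearson_r` and the numbers are illustrative only. It shows r = 1 and r = -1 for points on a straight line, and r = 0 for a curved relationship in which y still depends entirely on x.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                              # made-up values
r_up = pearson_r(xs, [2 * x + 1 for x in xs])       # points on a rising line
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])    # points on a falling line
r_curve = pearson_r(xs, [x ** 2 for x in xs])       # y = x^2: related, but not linearly
```

Here `r_up` is 1, `r_down` is -1, and `r_curve` is 0 even though y is completely determined by x, which is why r = 0 only rules out a linear relationship.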
What would be the correlation “strengths” of the following values?
1. < 0.25
2. 0.26-0.5
3. 0.51-0.75
4. >0.75
- doubtful
- fair
- good
- superior
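The cut-offs in the card above can be sketched as a small lookup. This assumes the labels apply to the size of r regardless of sign, and that a value of exactly 0.25 (which the card leaves unassigned) counts as "doubtful"; the function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r):
    """Label the strength of a correlation using the card's cut-offs."""
    size = abs(r)  # assumption: strength ignores the sign of r
    if size <= 0.25:   # assumption: exactly 0.25 is treated as doubtful
        return "doubtful"
    if size <= 0.50:
        return "fair"
    if size <= 0.75:
        return "good"
    return "superior"
```

For example, `correlation_strength(0.4)` and `correlation_strength(-0.4)` both return "fair", and `correlation_strength(0.8)` returns "superior".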
Lack of correlation could be due to either ______ or ______ in variable measures.
- low association (does not mean r=0)
- large errors
Strength of relationship is dependent on _____
data being evaluated
Spearman Rank Correlation Coefficient (rs) is used with ______ data, but not ____ data
ranked/ordinal; continuous
Value of rs varies between ___ and ___
1 and -1
T/F Continuous data can be transformed into ranks then used to calculate Spearman’s rs.
T - but it is not recommended (you would not want to decrease the power of the data by making continuous ordinal)
What type of test is used for the Spearman rank correlation coefficient?
nonparametric test
Spearman rs can be used when:
- continuous data are not available
- continuous data are skewed
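A minimal sketch of how Spearman's rs can be obtained: convert each variable to ranks, then compute the Pearson coefficient on the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rs`) and the data are made up for illustration, and ties are assumed to receive the average of their ranks.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman's rs: Pearson's r computed on the ranks of the data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]          # made-up values
ys = [x ** 3 for x in xs]     # skewed, always increasing with x
rs = spearman_rs(xs, ys)      # ranks agree perfectly
r = pearson_r(xs, ys)         # linear fit is good but not perfect
```

In this example `rs` is 1 because the ranks match exactly, while `r` is below 1 because the raw values do not lie on a straight line.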
What is the coefficient of determination?
r^2; how well the data fit the line of best fit
- the concept of a r^2 can be used to assess overall model fit:
- between 0 and 1
- gives the % of variability in the dependent endpoint that is explained by the independent variable
What does an r^2 close to 0 imply?
poor model fit (coincides with lower r-value) - data on the scatterplot have no pattern and are not close to the trend line
What does r^2 close to 1 imply?
excellent model fit - data on the scatterplot have a precise pattern and are tightly clustered around the trend line
What does a study with an r=0.94 and an r^2=0.88 imply?
strong positive linear correlation and low variability (good model fit)
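As a quick check of the arithmetic in the card above, r^2 is simply r squared. The variable names are illustrative.

```python
r = 0.94
r_squared = r ** 2                           # about 0.88
percent_explained = round(r_squared * 100)   # share of variability explained, in %
```

So with r = 0.94, roughly 88% of the variability in the dependent variable is accounted for by the independent variable.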
What are the assumptions of correlation?
- Independent observations
- same population
- No outliers
- assumes linear relationship
What are the limitations to correlation?
- Does not establish causality
- Chicken or the egg question
- Only tells us that an association is likely and how strong it is (does NOT help us predict one variable from another; that requires regression analysis)
Unlike correlation, regression requires _______
an independent variable (If I increase the independent variable by 1 unit, how much will the dependent variable change?)
Regression analysis attempts to predict or estimate the value of _____ from the known value of _____.
dependent variable; independent variable
What is a common concern regarding regression analysis?
bias
What needs to be done if there’s a concern for bias in regression analysis?
- Experiments have to be taken “as given” and the inherent biases identified
- Biases then can be removed statistically or averaged out
List and define the types of regression.
- Simple regression - 1 dependent variable and 1 independent variable
- Multiple regression - 1 dependent variable and 2+ independent variables
- Linear regression - continuous dependent variable
- Logistic regression - dichotomous/discrete dependent variable
- simple/or multiple linear regression OR
- simple/or multiple logistic regression
Type of regression: 1 independent variable & 1 dependent variable that is continuous
simple linear regression
Type of regression: 2 independent variables & 1 dichotomous/discrete dependent variable
multiple logistic regression
Type of regression: 1 independent variable & 1 dichotomous/discrete dependent variable
simple logistic regression
Type of regression: 2 independent variables & 1 continuous dependent variable
multiple linear regression
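The naming rule behind the four cards above can be sketched as a small function: the number of independent variables decides simple vs. multiple, and the type of dependent variable decides linear vs. logistic. The function name `regression_type` and the outcome labels are hypothetical.

```python
def regression_type(n_independent, outcome):
    """Name the regression from the number of independent variables and
    the type of dependent variable ("continuous" or "dichotomous")."""
    if n_independent < 1:
        raise ValueError("regression needs at least one independent variable")
    if outcome not in ("continuous", "dichotomous"):
        raise ValueError("outcome must be 'continuous' or 'dichotomous'")
    count = "simple" if n_independent == 1 else "multiple"
    model = "linear" if outcome == "continuous" else "logistic"
    return f"{count} {model} regression"
```

For example, `regression_type(2, "dichotomous")` returns "multiple logistic regression".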
_____ - the process of estimating the slope and intercept of a trend line through the “middle” of a scatterplot
Simple linear regression
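A minimal sketch of estimating the slope and intercept by ordinary least squares, then using the fitted line to predict. The data (`doses`, `responses`) and the function name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                      # change in y per 1-unit increase in x
    intercept = mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
    return slope, intercept

doses = [1, 2, 3, 4, 5]                    # hypothetical independent variable
responses = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical dependent variable
slope, intercept = fit_line(doses, responses)
predicted = intercept + slope * 6          # predicted response at a dose of 6
```

With these numbers the slope is about 1.99 and the intercept about 0.05, so each 1-unit increase in dose is associated with roughly a 2-unit increase in response.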
What is the independent variable of a linear regression?
x-axis
What is the dependent variable of a linear regression?
y-axis
_____ - an extension of the linear regression model when more than one independent variable is considered
Multiple linear regression
_____ may be used to simply describe multiple independent variables the researchers are assessing.
Covariates
Multiple linear regression can be used to account for _____
confounders
How can multiple linear regression aid in removing bias?
If researchers collect data on all possible confounders and incorporate these variables into the regression, they remove each of those sources of bias, and the multiple regression then produces unbiased and efficient estimates
It is difficult to graphically depict a scatterplot with more than 3 dimensions. Hence, multiple regressions are typically expressed as ______ and the estimates presented in ____.
equations; tables
Rather than report with r or m, logistic regression typically reports as ___
Odds ratio (OR)
What is an odds ratio?
ratio measuring the strength of an association (logistic regression)
What does an odds ratio of <1 indicate? >1? =1?
< 1 indicates decreased events
> 1 indicates increased events
= 1 indicates no association
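A minimal sketch of an unadjusted odds ratio, assuming a single two-level exposure so that the OR can be computed straight from a 2x2 table of counts. The function name `odds_ratio` and the counts are hypothetical.

```python
def odds_ratio(exposed_events, exposed_no_events,
               unexposed_events, unexposed_no_events):
    """Odds of the event in the exposed group divided by the odds in the unexposed group."""
    odds_exposed = exposed_events / exposed_no_events
    odds_unexposed = unexposed_events / unexposed_no_events
    return odds_exposed / odds_unexposed

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed subjects had the event
or_value = odds_ratio(20, 80, 10, 90)   # about 2.25, i.e. > 1: increased events
or_equal = odds_ratio(10, 90, 10, 90)   # identical odds in both groups: no association
```

An adjusted OR from a multiple logistic regression would additionally account for covariates, which this sketch does not attempt.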
Studies may say they did simple logistic regression, and will present a(n) _____
Then they may mention they did a multiple logistic regression, and present you with a(n) ____.
OR; adjusted OR
Which is typically the more accurate representation of the “true” difference between groups: odds ratio/adjusted odds ratio?
adjusted odds ratio (specific to each study; specific differences between groups)
_____ - time to event analysis
survival analysis (e.g., time until death in years in an elderly population, time until rejection (months) after a heart transplant, etc.)
Survival analysis is also called ____ & ____
life events; survival plots
In medical settings, to what types of studies are survival analyses often applied?
studies of death (but can be any outcome of interest - injury, onset of illness, recovery from illness, transition around a clinical threshold (CD4 count), etc)
How is data collected for a survival analysis?
occurs in a follow-up period
Appropriate technique when you are following subjects over time
survival analysis
From start to finish, how is a survival analysis conducted?
- Subjects are enrolled at some well-defined point in time
- Data collection occurs in a follow-up period (prospective - follow pts to see if/when event of interest happens)
- Data collection stops on a subject because:
- subject experiences the event of interest
- the study ends
- subject leaves the study for other reasons
What statistical tests would you use for a survival analysis?
- Log-rank test - determines if there is a difference between the groups (type of chi-square test)
- Cox proportional hazards regression - estimates hazard ratio
survival analysis methods which allow for censoring:
- Kaplan Meier method
- Cox proportional hazards regression (hazard ratio instead of OR)
______ - the time from entry into a study until a subject has a particular outcome
time to event
____ - subjects are said to be ____ if they are lost to follow up, they drop out of the study, or if the study ends before they die or have an outcome of interest in a survival analysis.
Censoring; censored
How are censored subjects accounted for in a survival analysis?
They are counted as alive or disease free for the time they were enrolled in the study
What is the outcome of interest for survival analysis?
time (survival time) until event (failure)
Hazard ratio is a ____ ratio
rate
Can have a(n) _____ to describe the relationship between 2+ groups and survival time for more than one covariate
adjusted hazard ratio
What does a hazard ratio (HR) < 1 indicate? > 1? = 1?
- HR < 1 - survival probability is higher in the treatment group
- HR > 1 - survival probability is lower in the treatment group
- HR = 1 - no effect
____ - provides estimates of the survival function/curve
Kaplan Meier method
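A minimal sketch of the Kaplan Meier estimate, assuming each subject has a follow-up time and a flag for whether the event occurred (True) or the subject was censored (False). Censored subjects count as event-free for as long as they were followed, then leave the risk set. The function name `kaplan_meier` and the data are made up for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time an event occurred.

    times  - follow-up time for each subject
    events - True if the subject had the event, False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        event_count = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if event_count > 0:
            # Multiply by the fraction of at-risk subjects who survive past time t
            survival *= 1 - event_count / at_risk
            curve.append((t, survival))
        # Subjects with an event and censored subjects both leave the risk set
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event flags for five subjects
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
```

With these five subjects the estimated survival drops to about 0.8 at month 2, 0.6 at month 3, and 0.3 at month 5; the subject censored at month 8 does not lower the curve.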
What does it mean if the survival function is always higher for one group than another?
one group is “surviving” longer than the other
What does it mean if survival functions cross in Kaplan Meier method?
situation is unclear
What are the assumptions to survival analysis?
- random (or representative sample)
- independent subjects
- entry criteria are consistent
- endpoint defined consistently
- starting time clearly defined
- censoring unrelated to survival
- Average survival doesn’t change during the study
What is a caveat to the assumptions of survival analysis?
survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event