2.6.2. Basic Biostatistics II Flashcards
What is a correlation?
A measure of the strength of the association between two variables; used to determine whether an “association” exists and quantify its strength
How do you prepare a scatterplot for a correlation analysis? Relate this plot to the correlation coefficient.
1. Observe two variables (X, Y) for each member of a random sample of n subjects.
2. Plot the pairs of points (X1, Y1), (X2, Y2), …, (Xn, Yn) on a scatterplot.
3. Inspect the scatterplot for patterns of association.
4. Estimate the population correlation coefficient ρ by the sample correlation coefficient r.
What are the three extreme r values we use for correlation coefficient graphs and what do they tell us about our data?
r = 0 => no linear association
r = 1 => perfect positive linear association
r = -1 => perfect negative linear association
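A minimal Python sketch of these three extremes. The helper name `pearson_r` and all data points are made up for illustration and are not part of the original cards:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points lying exactly on a rising line: perfect positive association.
r_up = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Points lying exactly on a falling line: perfect negative association.
r_down = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# Four corners of a square: knowing X says nothing about Y.
r_none = pearson_r([1, 1, 3, 3], [1, 3, 1, 3])

print(r_up, r_down, r_none)
```

Real data almost never hit these values exactly; they are reference points for reading a scatterplot.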
What is a regression?
Regression: a family of methods for relating a predictor (or multiple predictors) to an outcome
“Determine whether an association exists and quantify its strength” This is an example of a correlation or regression?
Correlation
“Use the relationship to predict one variable from the other” This is an example of a correlation or regression?
Regression
“Determine whether the observed relationship agrees with some theory or model and estimate the parameters of that model” This is an example of a correlation or regression?
Regression
The most common way to measure linear association is by the use of what?
The correlation coefficient
Difference between ρ and r?
ρ is the correlation coefficient for the whole population. r is the correlation coefficient for the sample, used to estimate ρ
What is the formula for the sample correlation r?
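The standard Pearson sample correlation can be written as follows, where the bar notation for the sample means of X and Y is an assumption about the course's notation:

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \; \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```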
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
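Good example of a zero correlation. A symmetric parabola gives an r close to zero because r measures only linear association, even though X and Y are clearly related.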
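A small Python sketch of why a parabola can show zero correlation. The data are invented (Y = X² on X values symmetric around zero) and `pearson_r` is a hypothetical helper:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Y is completely determined by X, but the relationship is not linear.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = pearson_r(xs, ys)
print(r_parabola)  # the left and right halves cancel, so r is 0
```

The cancellation depends on the X values being symmetric around the vertex. A parabola sampled on only one side of its vertex would show a clearly nonzero r.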
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
Strong positive
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
Weak positive correlation
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
Rather strong negative. Not a perfect line, but pretty darn good.
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
Weak negative or close to no correlation. There is a general negative slope, but the magnitude is small: r would be about -0.25 at most
How would you characterize the correlation of the variables below? Negative or positive? Weak or strong? Close to zero?
No correlation
How can we test if there is a significant correlation between variables X and Y? Go through the steps starting with already having the X and Y values.
- Compute r
- Compute t = r × sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom
Use t to find the p-value in a t-distribution table. If p < 0.05, the correlation is statistically significant, which means you can reject the null hypothesis that there is no linear relationship (ρ = 0). A significant correlation supports an association; it does not by itself establish cause and effect.
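The two steps can be sketched in Python. The paired data are invented, the helper names are hypothetical, and the critical value 2.306 is the usual two-sided t-table entry for α = 0.05 with 8 degrees of freedom:

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical paired measurements (n = 10), invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.3, 1.9, 4.1, 3.8, 6.2, 5.1, 7.4, 8.8, 8.1, 10.5]

n = len(xs)
r = pearson_r(xs, ys)      # step 1: compute r
t = correlation_t(r, n)    # step 2: compute t

# Two-sided critical value from a t table: alpha = 0.05, df = n - 2 = 8.
T_CRITICAL_DF8 = 2.306
reject_null = abs(t) > T_CRITICAL_DF8

print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_null}")
```

Comparing |t| with the table value is equivalent to checking whether the two-sided p-value is below 0.05. With a different sample size, the critical value has to be looked up again for the new degrees of freedom.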
There are two conditions for using a Pearson correlation. What are they?
- Observations are from a random sample
- At least one variable follows a normal distribution
What other tool can we use with regression to answer questions about the population?
Line of best fit
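A minimal sketch of fitting a line of best fit by ordinary least squares, assuming a single predictor. The function name `least_squares_line` and the data are hypothetical:

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented data that lie exactly on y = 1 + 2x, so the fit is easy to check.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = least_squares_line(xs, ys)

# Use the fitted line to predict the outcome at a new predictor value.
predicted_at_6 = b0 + b1 * 6
print(b0, b1, predicted_at_6)
```

The slope and intercept computed from a sample are estimates of the population line, which is what lets regression answer questions about the population.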
When do we use a simple logistic regression?
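When the dependent variable is categorical (binary) and there is one independent variable
It estimates odds ratios (the coefficients are on the log-odds scale, so exponentiating a coefficient gives an odds ratio)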
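A short sketch of how a logistic regression coefficient relates to an odds ratio. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not estimates from any real data:

```python
import math

def predicted_probability(b0, b1, x):
    """Logistic model: P(outcome = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

# Hypothetical fitted intercept and slope (log-odds scale).
b0, b1 = -4.0, 0.05

# Predicted probabilities for two subjects one unit apart on the predictor.
p_60 = predicted_probability(b0, b1, 60)
p_61 = predicted_probability(b0, b1, 61)

# The ratio of their odds equals exp(b1), whatever the starting value of x.
odds_ratio_from_model = odds(p_61) / odds(p_60)
odds_ratio_from_coef = math.exp(b1)
print(odds_ratio_from_model, odds_ratio_from_coef)
```

This is why logistic regression output is usually reported as odds ratios: each one describes the multiplicative change in the odds of the outcome per one-unit increase in the predictor.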
When do we use a multiple logistic regression?
One dependent variable is categorical and we have multiple independent variables.
Using a correlation or linear regression vs. using a logistic regression
To assess the association between two CONTINUOUS variables, use correlation or linear regression
To assess the association between a CONTINUOUS predictor and a CATEGORICAL (binary) outcome, use logistic regression
We are trying to see what affects bone mineral density (BMD) in women the most. We are testing weight and race (Black vs. White/other) against BMD. What type of statistical method should we use?
Multiple regression
If we wanted to predict a binary outcome, such as a disease being present or absent, or a patient having died or survived, what statistical method would help us best?
Logistic Regression
Why do we need to adjust for certain variables?
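Because they are confounding variables: they are associated with both the predictor and the outcome, so they can distort the estimated relationship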