Relationships between variables 1 - Logistic Regression Flashcards
Define Sensitivity
The proportion of individuals with the disease who are correctly diagnosed by the test.
The percentage of cases that had the observed characteristic (e.g., “yes” for obese) which were correctly predicted by the model (i.e., true positives).
Sensitivity calculation
Sensitivity = true positives / (true positives + false negatives)
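The calculation can be checked with hypothetical confusion-matrix counts (the numbers below are made up for illustration):

```python
# Hypothetical counts: 40 people have the disease;
# the test correctly flags 36 (true positives) and misses 4 (false negatives).
tp, fn = 36, 4
sensitivity = tp / (tp + fn)
print(sensitivity)  # 0.9
```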
State reasons why researchers investigate relationships between variables
Discover new associations.
Explain variance.
Compare the relative influence of factors on outcomes.
Adjust analyses for potential confounders.
Develop prediction models.
Describe the purpose, use, strengths and limitations of scatterplots
Scatterplots - Test bivariate relationships
Purpose: visual representation of the relationship between two quantitative variables
Use: Identify patterns, trends and potential outliers
Strengths: Simple, intuitive, and provides a clear visual indication of relationships.
Limitations: Does not provide a quantitative measure of the strength or direction of the relationship.
Describe the purpose, use, strengths and limitations of correlation
Purpose: Quantify the strength and direction of the relationship between two variables.
Use: Calculate correlation coefficient (Pearson’s r for linear relationships).
Strengths: Provides a single number that summarizes the relationship; easy to interpret.
Limitations: Only measures linear relationships; sensitive to outliers; does not imply causation.
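A quick numpy sketch of the "only measures linear relationships" limitation, using made-up data: a deterministic but curved relationship still gives r below 1.

```python
import numpy as np

# Pearson's r captures only the *linear* part of a relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 2 * x + 1   # perfectly linear
y_curved = x ** 2      # deterministic but nonlinear

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(r_linear)  # 1.0 (up to floating-point error)
print(r_curved)  # less than 1, despite a perfect (curved) relationship
```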
Describe the purpose, use, strengths and limitations of regression
Purpose: Model the relationship between one dependent variable and one or more independent variables.
Use: Predict values, assess the impact of variables, and understand relationships.
Strengths: Provides a detailed understanding of relationships; can handle multiple predictors; provides estimates of effect size.
Limitations: Assumes a specific form (linear, logistic, etc.); can be sensitive to outliers; requires more complex interpretation.
Correlation vs Regression
Correlation summarises the strength and direction of the relationship between two variables as a single value (the correlation coefficient, r).
Regression, by contrast, is a model that treats one variable as the predictor (x) and the other as the response (y).
Regression finds the equation that best describes the relationship between the two variables.
Regression therefore allows one variable to be predicted from the other; correlation does not.
The null hypothesis of correlation is ‘there is no (linear) relationship between variables’.
Whereas the null hypothesis of regression is ‘coefficients associated with variables = 0’.
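The two approaches are connected: the regression slope b equals r multiplied by the ratio of the standard deviations (b = r · s_y / s_x). A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]
b, a = np.polyfit(x, y, 1)          # slope b, intercept a
b_from_r = r * (y.std() / x.std())  # same ddof for both, so the ratio is valid

print(b, b_from_r)  # the two slopes agree
```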
State the assumptions of linear regression
Dependent variable/response - Continuous
Independent variable/predictor - Continuous or categorical
Relationships between variables - Bivariate relationship between predictor and response variables is linear
Residuals - the residuals of the regression line are approximately normally distributed
Normality - No significant outliers in the data
State the assumptions of logistic regression
Dependent variable/response - Dichotomous (two outcomes) / binary or ordinal (ordered categories)
Independent variable/predictor - Continuous or nominal (mutually exclusive categories)
Relationships between variables - No multicollinearity (the independent variables should not be highly correlated with one another)
Residuals - N/A
Normality - Not considered
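A minimal logistic-regression sketch, fitted by gradient descent on made-up data (in practice you would use a statistics package; the variable names and data here are hypothetical):

```python
import numpy as np

# Hypothetical data: x = hours of exercise per week, y = obese (1) or not (0).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([1,   1,   1,   0,   1,   0,   0,   0])

X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor
w = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))           # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)      # gradient step on the log-loss

pred = (1 / (1 + np.exp(-X @ w)) >= 0.5).astype(int)
print(pred)  # predicted classes; low x predicted 1, high x predicted 0
```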
Define Specificity
The percentage of cases that did not have the observed characteristic (e.g., “no” for not obese) and were also correctly predicted as not having the observed characteristic (i.e., true negatives).
Specificity calculation
Specificity = true negatives / (true negatives + false positives)
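As with sensitivity, the calculation can be checked with hypothetical confusion-matrix counts:

```python
# Hypothetical counts: 60 people do not have the disease;
# the test correctly clears 54 (true negatives) and falsely flags 6 (false positives).
tn, fp = 54, 6
specificity = tn / (tn + fp)
print(specificity)  # 0.9
```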
What is linear regression
Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables.
Fits a straight line to the data
y = a + bx + e
y = outcome variable
x = predictor variable
a and b are model parameters:
b is the slope of the line
a is the intercept of the line on the y-axis (the value of y where x = 0)
e = ‘residual error’
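The equation y = a + bx + e can be sketched with a least-squares fit on hypothetical data:

```python
import numpy as np

# Hypothetical data roughly following y = 0 + 2x plus noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.8, 6.1, 7.9, 10.2])

b, a = np.polyfit(x, y, 1)     # slope b, intercept a
residuals = y - (a + b * x)    # e: what the fitted line fails to explain

print(a, b)                    # a ≈ 0.01, b ≈ 2.01
print(residuals)               # residuals sum to (approximately) zero
```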