Relationships between variables 1 - Logistic Regression Flashcards
Define Sensitivity
The proportion of individuals with the disease who are correctly diagnosed by the test.
The percentage of cases that had the observed characteristic (e.g., “yes” for obese) which were correctly predicted by the model (i.e., true positives).
Sensitivity calculation
Sensitivity = true positives / (true positives + false negatives)
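The calculation can be checked with hypothetical confusion-matrix counts (the numbers below are made up for illustration):

```python
# Hypothetical counts: 40 people have the disease;
# the test correctly flags 36 (true positives) and misses 4 (false negatives).
tp, fn = 36, 4
sensitivity = tp / (tp + fn)
print(sensitivity)  # 0.9
```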
State reasons why researchers investigate relationships between variables
Discover new associations.
Explain variance.
Compare the relative influence of factors on outcomes.
Adjust analyses for potential confounders.
Develop prediction models.
Describe the purpose, use, strengths and limitations of scatterplots
Scatterplots - Test bivariate relationships
Purpose: visual representation of the relationship between two quantitative variables
Use: Identify patterns, trends and potential outliers
Strengths: Simple, intuitive, and provides a clear visual indication of relationships.
Limitations: Does not provide a quantitative measure of the strength or direction of the relationship.
Describe the purpose, use, strengths and limitations of correlation
Purpose: Quantify the strength and direction of the relationship between two variables.
Use: Calculate correlation coefficient (Pearson’s r for linear relationships).
Strengths: Provides a single number that summarizes the relationship; easy to interpret.
Limitations: Only measures linear relationships; sensitive to outliers; does not imply causation.
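A quick numpy sketch of the "only measures linear relationships" limitation, using made-up data: a deterministic but curved relationship still gives r below 1.

```python
import numpy as np

# Pearson's r captures only the *linear* part of a relationship.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_linear = 2 * x + 1   # perfectly linear
y_curved = x ** 2      # deterministic but nonlinear

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(r_linear)  # 1.0 (up to floating-point error)
print(r_curved)  # less than 1, despite a perfect (curved) relationship
```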
Describe the purpose, use, strengths and limitations of regression
Purpose: Model the relationship between one dependent variable and one or more independent variables.
Use: Predict values, assess the impact of variables, and understand relationships.
Strengths: Provides a detailed understanding of relationships; can handle multiple predictors; provides estimates of effect size.
Limitations: Assumes a specific form (linear, logistic, etc.); can be sensitive to outliers; requires more complex interpretation.
Correlation vs Regression
Correlation summarises the strength and direction of the relationship between two variables as a single value (the correlation coefficient, r).
Regression, by contrast, is a model that treats one variable as the predictor (x) and the other as the response (y).
Regression finds the equation that best describes the relationship between the two variables.
Regression therefore allows one variable to be predicted from the other; correlation does not.
The null hypothesis of correlation is ‘there is no (linear) relationship between variables’.
Whereas the null hypothesis of regression is ‘coefficients associated with variables = 0’.
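The two approaches are connected: the regression slope b equals r multiplied by the ratio of the standard deviations (b = r · s_y / s_x). A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = np.corrcoef(x, y)[0, 1]
b, a = np.polyfit(x, y, 1)          # slope b, intercept a
b_from_r = r * (y.std() / x.std())  # same ddof for both, so the ratio is valid

print(b, b_from_r)  # the two slopes agree
```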
State the assumptions of linear regression
Dependent variable/response - Continuous
Independent variable/predictor - Continuous or categorical
Relationships between variables - Bivariate relationship between predictor and response variables is linear
Residuals - the residuals of the regression line are approximately normally distributed
Normality - No significant outliers in the data
State the assumptions of logistic regression
Dependent variable/response - Dichotomous (two outcomes) / binary or ordinal (ordered categories)
Independent variable/predictor - Continuous or nominal (mutually exclusive categories)
Relationships between variables - No multicollinearity (the independent variables should not be highly correlated with one another)
Residuals - N/A
Normality - Not considered
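A minimal logistic-regression sketch, fitted by gradient descent on made-up data (in practice you would use a statistics package; the variable names and data here are hypothetical):

```python
import numpy as np

# Hypothetical data: x = hours of exercise per week, y = obese (1) or not (0).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([1,   1,   1,   0,   1,   0,   0,   0])

X = np.column_stack([np.ones_like(x), x])  # intercept column + predictor
w = np.zeros(2)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))           # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)      # gradient step on the log-loss

pred = (1 / (1 + np.exp(-X @ w)) >= 0.5).astype(int)
print(pred)  # predicted classes; low x predicted 1, high x predicted 0
```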
Define Specificity
The percentage of cases that did not have the observed characteristic (e.g., “no” for not obese) and were also correctly predicted as not having the observed characteristic (i.e., true negatives).
Specificity calculation
Specificity = true negatives / (true negatives + false positives)
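As with sensitivity, the calculation can be checked with hypothetical confusion-matrix counts:

```python
# Hypothetical counts: 60 people do not have the disease;
# the test correctly clears 54 (true negatives) and falsely flags 6 (false positives).
tn, fp = 54, 6
specificity = tn / (tn + fp)
print(specificity)  # 0.9
```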
What is linear regression
Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables.
Fits a straight line to the data
y = a + bx + e
y = outcome variable
x = predictor variable
a and b are model parameters:
b is the slope of the line
a is the intercept of the line on the y-axis (the value of y where x = 0)
e = ‘residual error’
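The equation y = a + bx + e can be sketched with a least-squares fit on hypothetical data:

```python
import numpy as np

# Hypothetical data roughly following y = 0 + 2x plus noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.8, 6.1, 7.9, 10.2])

b, a = np.polyfit(x, y, 1)     # slope b, intercept a
residuals = y - (a + b * x)    # e: what the fitted line fails to explain

print(a, b)                    # a ≈ 0.01, b ≈ 2.01
print(residuals)               # residuals sum to (approximately) zero
```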