Relationships between variables 1 - Logistic Regression Flashcards

1
Q

Define Sensitivity

A

The proportion of individuals with the disease who are correctly diagnosed by the test.

The percentage of cases that had the observed characteristic (e.g., “yes” for obese) which were correctly predicted by the model (i.e., true positives).

2
Q

Sensitivity calculation

A

Sensitivity = true positives / (true positives + false negatives)
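
As a rough illustration of the calculation, the sketch below computes sensitivity from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the course material.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of actual positive cases that were predicted positive."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("no actual positive cases, so sensitivity is undefined")
    return true_positives / actual_positives


# Hypothetical counts: 40 obese cases predicted correctly, 10 obese cases missed.
print(sensitivity(true_positives=40, false_negatives=10))  # 0.8
```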

3
Q

State reasons why researchers investigate relationships between variables

A

Discover new associations. / Explain variance. / Compare relative influence of factors on outcomes. / Adjust analyses for potential confounders. / Develop prediction models.

4
Q

Describe the purpose, use, strengths and limitations of scatterplots

A

Scatterplots - Explore bivariate relationships
Purpose: visual representation of the relationship between two quantitative variables
Use: Identify patterns, trends and potential outliers
Strengths: Simple, intuitive, and provides a clear visual indication of relationships.
Limitations: Does not provide a quantitative measure of the strength or direction of the relationship.

5
Q

Describe the purpose, use, strengths and limitations of correlation

A

Purpose: Quantify the strength and direction of the relationship between two variables.
Use: Calculate correlation coefficient (Pearson’s r for linear relationships).
Strengths: Provides a single number that summarizes the relationship; easy to interpret.
Limitations: Pearson's r only measures linear relationships; it is sensitive to outliers; correlation does not imply causation.
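
To make the "single number" and the "linear only" limitation concrete, here is a minimal sketch of Pearson's r in plain Python. The function name and both data sets are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


# A perfectly linear relationship gives r close to 1.
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# A perfect but curved relationship (y = x squared) gives r = 0,
# which shows that r only captures linear association.
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_linear, r_curved)
```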

6
Q

Describe the purpose, use, strengths and limitations of regression

A

Purpose: Model the relationship between one dependent variable and one or more independent variables.
Use: Predict values, assess the impact of variables, and understand relationships.
Strengths: Provides a detailed understanding of relationships; can handle multiple predictors; provides estimates of effect size.
Limitations: Assumes a specific form (linear, logistic, etc.); can be sensitive to outliers; requires more complex interpretation.

7
Q

Correlation vs Regression

A

Correlation summarises the strength and direction of the relationship between two variables as a single value (r, the correlation coefficient).

Regression, by contrast, is a model that uses one variable as a predictor (x) and the other as a response (y).

Regression finds an equation that best describes the relationship between the two variables.

Regression allows one variable to be predicted from the other; correlation does not.

The null hypothesis of correlation is ‘there is no (linear) relationship between variables’.

The null hypothesis of regression is ‘the coefficients associated with the variables = 0’.

8
Q

State the assumptions of linear regression

A

Dependent variable/response - Continuous
Independent variable/predictor - Continuous or categorical
Relationships between variables - Bivariate relationship between predictor and response variables is linear
Residuals - the residuals of the regression line are approximately normally distributed
Normality - No outliers in data

9
Q

State the assumptions of logistic regression

A

Dependent variable/response - Dichotomous (two outcomes) / binary or ordinal (ordered categories)
Independent variable/predictor - Continuous or nominal (mutually exclusive categories)
Relationships between variables - There shouldn’t be multicollinearity (i.e., the independent variables should not be highly correlated with one another)
Residuals - N/A

10
Q

Define Specificity

A

The percentage of cases that did not have the observed characteristic (e.g., “no” for not obese) and were also correctly predicted as not having the observed characteristic (i.e., true negatives).

11
Q

Specificity calculation

A

Specificity = true negatives / (true negatives + false positives)
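
As a rough illustration, specificity is the number of true negatives divided by all cases that truly did not have the characteristic (true negatives plus false positives). The function name and counts below are hypothetical.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of actual negative cases that were predicted negative."""
    actual_negatives = true_negatives + false_positives
    if actual_negatives == 0:
        raise ValueError("no actual negative cases, so specificity is undefined")
    return true_negatives / actual_negatives


# Hypothetical counts: 90 non-obese cases predicted correctly,
# 10 non-obese cases wrongly predicted as obese.
print(specificity(true_negatives=90, false_positives=10))  # 0.9
```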

12
Q

Compare and contrast between correlation and regression

A

Correlation:

  • Summarises strength and direction of relationship between 2 variables as a single value (r = correlation coefficient)
  • Does not allow for the prediction of one variable from the other
  • Null hypothesis = no linear relationship between variables

Regression:

  • Is a model
  • Uses one variable as the predictor (x) and the other as the response (y)
  • Allows one variable to be predicted from the other
  • Null hypothesis = coefficients associated with variables = 0

13
Q

What is linear regression

A

Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables.

Fits a straight line to the data

y = a + bx + e

y = outcome variable
x = predictor variable

a and b are model parameters:

b is the slope of the line
a is the intercept of the line on the y-axis (the value of y where x = 0)
e = ‘residual error’
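
The sketch below shows one common way to estimate a and b for y = a + bx + e, assuming ordinary least squares (the card does not say which fitting method is used). The function name and data are invented for illustration.

```python
def fit_line(x, y):
    """Estimate intercept a and slope b by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope of the line
    a = mean_y - b * mean_x  # intercept: predicted y where x = 0
    return a, b


# Hypothetical data that roughly follow y = 1 + 2x.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
a, b = fit_line(x, y)

# e: the residual error for each observation (observed minus predicted).
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b, residuals)
```

With an intercept in the model, the least-squares residuals sum to approximately zero, which is a quick sanity check on the fit.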

14
Q

State the assumptions for logistic regression.
Dependent variable/response
Independent variable/predictor
Relationships between variables
Residuals
Normality

A

Dependent variable/response - Dichotomous (two outcomes) / binary or ordinal (ordered categories)
Independent variable/predictor - Continuous or nominal (mutually exclusive categories)
Relationships between variables - There shouldn’t be multicollinearity (i.e., the independent variables should not be highly correlated with one another)
Residuals - N/A
Normality - Not considered

15
Q

State the assumptions for linear regression
Dependent variable/response
Independent variable/predictor
Relationships between variables
Residuals
Normality

A

Dependent variable/response - Continuous
Independent variable/predictor - Continuous or categorical
Relationships between variables - Bivariate relationship between predictor and response variables is linear
Residuals - the residuals of the regression line are approximately normally distributed
Normality - No outliers in data

16
Q

Explain why researchers investigate relationships between variables.

A

Discover new associations. / Explain variance. / Compare relative influence of factors on outcomes. / Adjust analyses for potential confounders. / Develop prediction models.

17
Q

The theory underpinning logistic regression

A

Predicting true or false: e.g., obese or not obese (measures categories)
Sigmoid curve shape: the S-shaped curve is shifted to find the best fit, and a probability threshold is then used to classify each case.
To judge how well the model predicts, the data are placed on a log-odds scale, where the fitted relationship is a straight line: log odds = log(probability / (1 - probability)).
Rotate the line until you have the line of best fit (done by SPSS).

18
Q

Define Risk

A

Absolute risk: probability of an event occurring in a sample/population.
Number of people with event / total number of people

19
Q

Define Odds

A

The chance of an event occurring relative to the chance of it not occurring in a population.
Number of people with event/number of people without event
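
The difference between risk and odds is easiest to see with the same counts plugged into both formulas. The function names and the counts below are hypothetical.

```python
def risk(events: int, total: int) -> float:
    """Absolute risk: people with the event / total number of people."""
    if total <= 0:
        raise ValueError("total must be positive")
    return events / total


def odds(events: int, total: int) -> float:
    """Odds: people with the event / people without the event."""
    non_events = total - events
    if non_events <= 0:
        raise ValueError("odds are undefined when nobody is without the event")
    return events / non_events


# Hypothetical sample: 20 of 100 people have the event.
print(risk(20, 100))  # 0.2
print(odds(20, 100))  # 0.25
```

Risk and odds are close when the event is rare and diverge as it becomes more common.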

20
Q

What is a ROC Curve

A

A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier model at varying threshold values. The ROC curve is the plot of the true positive rate against the false positive rate at each threshold setting.

(ROC curve: plot of binary classifiers, predicting classification using a threshold).
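
A minimal sketch of how the points on a ROC curve can be produced: each distinct predicted score is used as a threshold, and the true positive rate and false positive rate are recorded at that threshold. The function name, labels and scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels: 1 for cases with the characteristic, 0 for cases without it.
    scores: predicted probabilities (higher means more likely positive).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # A case is predicted positive when its score is at or above the threshold.
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical outcomes and model scores for six cases.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The true positive rate is the sensitivity, and the false positive rate is 1 minus the specificity, so each point summarises both measures at one threshold.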

21
Q

Describe situations where investigators may wish to investigate relationships between multiple variables

A

To explore what contributes to an event, such as disease or death.

22
Q

The theory underpinning logistic regression

A

Logistic Function: The logistic regression model uses the logistic function to model the probability of the dependent variable. The logistic function is an S-shaped curve that maps any real-valued number into the (0, 1) interval, making it suitable for binary outcomes.
Odds and Log-Odds: Logistic regression models the log-odds (logarithm of the odds) of the dependent event occurring. This transforms the relationship into a linear one: the log-odds are modelled as a linear combination of the predictors.
Maximum Likelihood Estimation (MLE): Logistic regression typically uses MLE to estimate the model parameters. MLE finds the parameter values that maximize the likelihood of observing the given data.
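
The three ideas above can be sketched together in a few lines. This is an illustrative sketch only: it assumes a single predictor, uses plain gradient ascent on the log-likelihood as a simple stand-in for the optimisers that statistical packages use, and all names and data are hypothetical.

```python
import math


def logistic(z: float) -> float:
    """Logistic function: maps any real number into the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p: float) -> float:
    """Log-odds (logit) of a probability: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Estimate a and b in log-odds = a + b*x by maximising the likelihood.

    Uses gradient ascent on the average log-likelihood (an assumption made
    here for simplicity, not the method any particular package uses).
    """
    n = len(x)
    a, b = 0.0, 0.0
    for _ in range(steps):
        # Difference between the observed outcome and the predicted probability.
        errors = [yi - logistic(a + b * xi) for xi, yi in zip(x, y)]
        a += learning_rate * sum(errors) / n
        b += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return a, b


# Hypothetical data: outcome (1 = event, 0 = no event) tends to rise with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
a, b = fit_logistic(x, y)

p_at_5 = logistic(a + b * 5)  # predicted probability of the event at x = 5
print(a, b, p_at_5)
```

On the log-odds scale the fitted model is a straight line: `log_odds(p_at_5)` recovers `a + b * 5`.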