Relationships between variables Flashcards

1
Q

What is a relationship between 2 continuous variables called?

A

Bivariate

2
Q

Why investigate bivariate relationships?

A
  • Are the 2 variables associated – SCATTER PLOT
  • Enable the value of one variable to be predicted from any known value of the other – REGRESSION
  • Look for agreement between two variables – e.g. when 2 different methods are used to measure the same thing
3
Q

Scatter plots

A
  • Graph that shows the relationship between 2 variables
  • Independent (predictor) variable on X axis – the variable presumed to cause or explain a change
  • Dependent variable on Y axis – the outcome variable
4
Q

Correlation assumptions

A
  • All values must be independent – e.g. you can't correlate repeated measurements over time
  • The sample must be randomly drawn from the population – e.g. you can't select specific individuals for inclusion
5
Q

What data distribution do you need for correlations

A
  • You can calculate a correlation coefficient for any 2 continuous variables
  • Pearson correlation – both variables should be normally distributed
  • If not – transform the data (e.g. log transformation)
  • Spearman’s rank correlation – use where variable distributions cannot be normalised by transformation
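A minimal, standard-library-only Python sketch of the two coefficients, assuming paired data held in plain lists (the data values are made up for illustration). Spearman's rank correlation is computed here as Pearson's r applied to the ranks, with tied values given the average of their ranks. In practice a statistics package (e.g. `scipy.stats.pearsonr` and `scipy.stats.spearmanr`) would normally be used instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: y rises steadily with x, but not in a straight line
x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]
print(round(pearson_r(x, y), 3))
print(spearman_rho(x, y))
```

Because y = x² always increases with x but is curved, Spearman's rho is exactly 1 while Pearson's r is slightly below 1.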
6
Q

Hypothesis testing for correlations

A
  • Pearson’s correlation – null hypothesis: NO LINEAR association between the 2 variables
  • Spearman’s rank correlation – null hypothesis: no (monotonic) association between the 2 variables
  • BEWARE MULTIPLE CORRELATIONS – data dredging means running lots of correlation tests; at p < 0.05, on average 1 in every 20 tests will show an association by chance alone, so you need to adjust for this, e.g. by using a stricter (lower) p-value threshold
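One common way to make this adjustment is the Bonferroni correction (named here as an example; other corrections exist), which divides the significance level by the number of tests. The sketch below assumes α = 0.05 and 20 tests; the p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Return the p-values that remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < threshold]

n_tests = 20
alpha = 0.05
# Expected number of chance "hits" if every null hypothesis is true
print("expected false positives:", n_tests * alpha)
print("adjusted threshold:", bonferroni_threshold(alpha, n_tests))

# Hypothetical p-values from 20 correlation tests
p_values = [0.001, 0.03, 0.04] + [0.2] * 17
print(significant_after_bonferroni(p_values))
```

Without adjustment, three of these tests would pass p < 0.05; after dividing 0.05 by 20, only p = 0.001 stays below the stricter threshold of 0.0025.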
7
Q

What is residual value?

A

Difference between the actual (observed) value and the fitted value on the line

The line is fitted to minimise the residuals (least squares minimises the sum of the squared residuals)
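A small sketch of residuals for a given line y = a + bx, using made-up data and two candidate lines. Squaring the residuals before summing stops positive and negative residuals from cancelling out, which is why the least-squares criterion uses the sum of squared residuals.

```python
def residuals(x, y, a, b):
    """Residual = observed y minus the fitted value a + b*x, for each point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def sum_squared_residuals(x, y, a, b):
    """Total of the squared residuals for the line y = a + b*x."""
    return sum(e ** 2 for e in residuals(x, y, a, b))

# Hypothetical data
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

print(residuals(x, y, a=0.0, b=2.0))
# Compare two candidate lines: the one with the smaller total fits better
print(sum_squared_residuals(x, y, 0.0, 2.0), sum_squared_residuals(x, y, 1.0, 1.0))
```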

8
Q

Linear regression

A
The equation of a straight line is
y=a+bx
In this model:
	y is our response variable (weight)
	x is our predictor variable (age)
and
	a and b are model parameters
	b is the slope of the line
	a is the intercept of the line on the y-axis where x=0 

For each data point, an additional term completes the model:
y=a+bx+e
Here, the ‘e’ term is known as the ‘residual error’ (the red dotted line): the vertical distance between the observed point and the fitted line.

Linear regression model y=a+bx+e
Values for a and b are calculated to minimise the sum of the squared residuals (minimise ∑e²); this is the method of least squares
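A minimal sketch of the least-squares estimates, assuming simple linear regression with one predictor: b = ∑(x − x̄)(y − ȳ) / ∑(x − x̄)² and a = ȳ − b·x̄. The age and weight values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    return a, b

# Hypothetical ages (x, predictor) and weights (y, response)
age = [2, 4, 6, 8, 10]
weight = [12.0, 16.5, 20.5, 25.0, 31.0]
a, b = fit_line(age, weight)
print(f"weight = {a:.2f} + {b:.2f} * age")
```

Any other choice of a and b would give a larger sum of squared residuals for these data.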
9
Q

Linear regression assumptions

A
  • Bivariate relationship between predictor and response variable is linear
  • The RESIDUALS are independent of each other and normally distributed
10
Q

Linear regression – hypothesis testing

A
  • Null hypothesis that b = 0, i.e. the slope is zero, so there is no linear relationship
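A sketch of the usual test statistic for this null hypothesis, assuming simple linear regression: t = b / SE(b), compared with a t distribution on n − 2 degrees of freedom. The standard library has no t distribution, so this stops at the statistic; a package such as `scipy.stats.linregress` reports the corresponding two-sided p-value. The data are hypothetical.

```python
import math

def slope_t_statistic(x, y):
    """t statistic for H0: b = 0 in simple linear regression, with n - 2 df."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # least-squares slope
    a = my - b * mx                    # least-squares intercept
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(ssr / (n - 2) / sxx)  # standard error of the slope
    return b / se_b, n - 2

# Hypothetical ages (x) and weights (y)
age = [2, 4, 6, 8, 10]
weight = [12.0, 16.5, 20.5, 25.0, 31.0]
t, df = slope_t_statistic(age, weight)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

With 3 degrees of freedom the two-sided 5% critical value is about 3.18, so a t of roughly 22.6 would lead to rejecting b = 0.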

11
Q

correlation vs regression

A

C (correlation) - Summarises the strength and direction of the relationship between 2 variables as a single value
r = correlation coefficient

R (regression) - Model
Uses one variable as the predictor x and the other as response y
Finds an equation that best describes the relationship between 2 variables

C - Doesn’t allow prediction of one variable from other

R - Allows one variable to be predicted from the other

C - Null hypothesis - no linear relationship between variables

R - Null hypothesis: the coefficients associated with the predictor variables = 0
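A sketch, using made-up data, of one practical difference between the two: r is symmetric (swapping the variables gives the same value), whereas the regression slope depends on which variable is treated as the predictor. The product of the two possible slopes equals r².

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(x, y):
    """Least-squares slope when regressing y (response) on x (predictor)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical paired data
x = [2, 4, 6, 8, 10]
y = [12.0, 16.5, 20.5, 25.0, 31.0]
print(pearson_r(x, y), pearson_r(y, x))  # same value either way round
print(slope(x, y), slope(y, x))          # differs depending on the predictor
```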

12
Q

Correlation and causation

A

Correlation and regression show a link BUT don’t explain the reason for the link (correlation does not imply causation)

13
Q

Adjusting a correlation for one other variable

A

For example, you have 3 variables: age, number of medicines and a measure of comorbidity. The third variable could be influencing the other two, so how do you allow for it?

Partial correlation coefficient
• Estimated correlation between 2 variables assuming that the 3rd variable is held constant
• e.g. partial correlation between age and number of medicines, adjusting for a measure of comorbidity
• If the correlation remains after adjusting for the 3rd variable, this indicates that the association is independent of the third variable
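For a single third variable z, the partial correlation can be computed from the three pairwise correlations with the standard first-order formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The correlation values below are invented purely to illustrate the arithmetic.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, adjusting for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical pairwise correlations:
# x = age, y = number of medicines, z = a measure of comorbidity
r_age_meds = 0.50
r_age_comorb = 0.60
r_meds_comorb = 0.70
print(round(partial_corr(r_age_meds, r_age_comorb, r_meds_comorb), 3))
```

With these made-up numbers the age–medicines correlation falls from 0.50 to roughly 0.14 after adjustment, which would suggest that much of the association is shared with comorbidity. If z were uncorrelated with both variables, the partial correlation would equal the original one.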

14
Q

Investigating relationships between multiple variables – some approaches

A

Hypothesis testing
• Null hypothesis testing – e.g. multiple regression
• Bayesian approach – model selection based on prior probabilities

Data reduction
• e.g. PCA

Hypothesis free e.g.
• Data mining – extracting and discovering patterns in large data sets
• Artificial intelligence, machine learning
• Network mapping

15
Q

Multiple regression analysis

A

Regression model
• One continuous dependent outcome variable described by multiple predictor variables

What can it do?
• Find relationship between variables without prior expectation
• Identify independent relationships adjusted for confounders
• Develop a prognostic tool for predicting a dependent variable of interest

Linear regression is used to predict a continuous dependent variable from a given set of independent variables
Logistic regression is used to predict a categorical (e.g. binary) dependent variable from a given set of independent variables
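A standard-library-only sketch of multiple linear regression, assuming the predictors are not collinear. It solves the least-squares normal equations (XᵀX)β = Xᵀy directly with Gauss–Jordan elimination; real analyses would normally use a statistics package. The data are constructed so that y = 1 + 2·x1 + 3·x2 exactly, so the fit should recover those coefficients.

```python
def solve(A, v):
    """Solve A @ beta = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * p for m, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def multiple_regression(predictors, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    X = [[1.0] + list(row) for row in zip(*predictors)]  # add intercept column
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical predictors and an outcome built from known coefficients
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * p + 3 * q for p, q in zip(x1, x2)]
coefs = multiple_regression([x1, x2], y)
print([round(c, 6) for c in coefs])
```

Each slope is the change in y per unit change in that predictor with the other predictors held constant, which is what "adjusted for confounders" means here.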

16
Q

Risk vs odds

A

RISK
Absolute risk – probability of an event occurring in a population
Calculated as number of people with event/total number of people
Relative risk (risk ratio) – risk of the event occurring in one group compared with another
Calculated as absolute risk group 1/absolute risk group 2
Easier to explain and understand

ODDS
Chance of an event occurring vs not occurring in a population
Calculated as number of people with event /number of people with no event
Odds ratio – odds of the event occurring in one group compared with another
Calculated as odds group 1/odds group 2
Needed for more complex statistical analysis
e.g., fitting statistical models to investigate how covariates and predictors influence the chance of an event occurring
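A worked sketch of both measures from a hypothetical 2×2 table (10 of 100 treated people and 20 of 100 control people have the event); the counts are invented for illustration.

```python
def risk(events, total):
    """Absolute risk: people with the event / total number of people."""
    return events / total

def odds(events, total):
    """Odds: people with the event / people without the event."""
    return events / (total - events)

# Hypothetical counts: events and group sizes
treated_events, treated_total = 10, 100
control_events, control_total = 20, 100

rr = risk(treated_events, treated_total) / risk(control_events, control_total)
or_ = odds(treated_events, treated_total) / odds(control_events, control_total)
print(f"risk ratio = {rr:.3f}, odds ratio = {or_:.3f}")
```

Here the risk ratio is 0.5 (10% vs 20%) while the odds ratio is about 0.44 (10/90 vs 20/80). The two are close when the event is rare and drift apart as it becomes more common.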

17
Q

Derivation vs validation

A

Derivation = the model is built on the available dataset, so it can reflect that dataset’s quirks

Validation = checks that the model works/is generalisable

Internal validation – split one dataset into derivation/validation cohorts – reduces power (each cohort has a smaller sample size, so there is less chance of detecting a true effect) and doesn’t provide external validity

External validation – check the applicability of the model in a different dataset/cohort
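A minimal sketch of a random split-sample (internal) validation, assuming records can be shuffled independently and a 70/30 split; the split fraction, seed and patient IDs are hypothetical choices, not a recommended standard.

```python
import random

def split_cohorts(records, validation_fraction=0.3, seed=42):
    """Randomly split one dataset into (derivation, validation) cohorts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

patients = list(range(1, 101))  # hypothetical patient IDs
derivation, validation = split_cohorts(patients)
print(len(derivation), len(validation))
```

The model would then be fitted on 70 patients rather than all 100, which is the loss of power described above; testing on a separate external cohort is still needed for external validity.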

18
Q

sensitivity

A

Ability of a test to correctly identify patients with a disease

true positives / all patients with the disease (true positives + false negatives)

19
Q

specificity

A

Ability of a test to correctly identify people without a disease

true negatives / all people without the disease (true negatives + false positives)

20
Q

Negative predictive value (NPV)

A

true negatives / all negative predictions (true negatives + false negatives)

21
Q

Positive predictive value (PPV)

A

true positives / all positive predictions (true positives + false positives)
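The diagnostic ratios on these cards can all be computed from the four cells of a 2×2 confusion matrix. A minimal sketch, with hypothetical counts (100 people with the disease, 900 without):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic-test ratios from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # people with the disease correctly identified
        "specificity": tn / (tn + fp),  # people without the disease correctly identified
        "ppv": tp / (tp + fp),          # positive predictions that are correct
        "npv": tn / (tn + fn),          # negative predictions that are correct
    }

# Hypothetical counts
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
print(metrics)
```

Here sensitivity is 0.90 and specificity 0.95, yet only two-thirds of positive predictions are correct, because predictive values depend on how common the disease is while sensitivity and specificity do not.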

22
Q

ROC curve interpretation

A

The ROC curve plots sensitivity against 1 − specificity across the possible cut-offs of a score; the closer the curve is to the top left-hand corner, the better the score predicts the outcome. In the example, the area under the curve (AUC) is 0.724: if one patient who died and one who survived are picked at random, there is a 72.4% chance that the score ranks the patient who died as higher risk. An AUC of 0.5 is no better than chance (50:50, i.e. useless), an AUC of 1 is perfect, and 0.7–0.8 is generally considered acceptable.
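A sketch of the probability interpretation of AUC: compare every patient who had the event with every patient who did not, and count how often the event patient has the higher score (ties count as half). This pairwise (Mann–Whitney) calculation gives the same value as the area under the empirical ROC curve. The scores are hypothetical, and higher scores are assumed to mean higher predicted risk of death.

```python
def auc(scores_events, scores_non_events):
    """Probability that a random event case scores higher than a random
    non-event case, counting ties as half."""
    wins = 0.0
    for s_e in scores_events:
        for s_n in scores_non_events:
            if s_e > s_n:
                wins += 1
            elif s_e == s_n:
                wins += 0.5
    return wins / (len(scores_events) * len(scores_non_events))

# Hypothetical risk scores
died = [0.9, 0.8, 0.6, 0.4]
survived = [0.7, 0.5, 0.3, 0.2, 0.1]
print(auc(died, survived))
```

Of the 4 × 5 = 20 possible pairs, the patient who died has the higher score in 17, giving an AUC of 0.85 for these made-up scores.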