Comparing Relationships (Lecture 3) Flashcards

1
Q

What is correlation?

A

A measure of the strength and direction of the relationship between two continuous variables
- neither variable has to be designated as independent; both can be treated as dependent variables
- describes both the direction of a linear relationship AND its magnitude/strength

2
Q

How is regression similar to and different from correlation?

A
  • It takes correlation one step further (helps predict 1 variable when given another variable)
  • Recognizes and measures relationship (like correlation)
  • Also describes relationship so an equation can be developed to predict 1 variable when given the other
3
Q

What type of plot is used in conducting a correlation analysis?

A

Scatterplot

4
Q

In a scatterplot, we are looking for the ____ that best estimates the varying data points.

A

“line of best fit”

5
Q

Displays the relationship between two variables on a traditional Cartesian coordinate plane

A

scatterplot

6
Q

Correlation does not verify ______, just indicates if there may be ______ between variables.

A

causation; association

7
Q

The measure of the association between continuous variables

A

correlation

8
Q

What is a positive correlation vs. a negative correlation?

A

Positive correlation - as one variable increases, the other variable increases
Negative correlation - as one variable increases, the other decreases

9
Q

Divorce rates in Maine correlating with per capita consumption of margarine is an example of what?

A

Correlation-Causation Fallacy

10
Q

What is the Pearson Correlation coefficient (r)?

A

a measure of the linear relationship between two variables
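Worked example (an illustrative sketch, not from the lecture): r can be computed as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data values are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is exactly 2 * x, so the relationship is perfectly linear.
x_values = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]
y_falling = [10, 8, 6, 4, 2]

r_positive = pearson_r(x_values, y_rising)   # perfect positive relationship
r_negative = pearson_r(x_values, y_falling)  # perfect negative relationship
```

With these values, `r_positive` comes out at +1 and `r_negative` at -1, the two "perfect linear relationship" cases.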

11
Q

What does an r between 0 and +1 indicate?

A

positive correlation (as one variable increases, the second variable also increases)

12
Q

What does an r between -1 and 0 indicate?

A

negative correlation

13
Q

What does an r of 0 indicate?

A

no linear relationship

14
Q

What values of “r” indicate a “perfect” linear relationship?

A

r = -1 or r = +1

15
Q

What does an r = 0.8 indicate?

A

a strong positive linear relationship: most, but not all, of the data points follow the trend, so as one variable gets larger, the other tends to get larger as well

16
Q

r = 0.4 is a _____ correlation (stronger as it approaches +1)

A

weak positive

17
Q

r = -0.4 is a ____ correlation (stronger as it approaches -1)

A

weak negative

18
Q

T/F An r=0 means the two variables are unrelated or independent of one another

A

F - only that no linear relationship exists between the two variables
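Worked example (an illustrative sketch with made-up data): below, y is completely determined by x (y = x squared), yet r comes out as 0 because the relationship is curved and symmetric rather than linear. The helper name `pearson_r` is an assumption for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends entirely on x, but through a U-shaped curve.
x_values = [-2, -1, 0, 1, 2]
y_values = [x ** 2 for x in x_values]

r_curved = pearson_r(x_values, y_values)  # no linear relationship detected
```

This is also a reason to look at the scatterplot before trusting r on its own.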

19
Q

What would be the correlation “strengths” of the following values?
1. < 0.25
2. 0.26-0.5
3. 0.51-0.75
4. >0.75

A
  1. doubtful
  2. fair
  3. good
  4. superior
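Quick lookup (an illustrative sketch): the cutoffs above can be written as a small function. Two assumptions are made here that the card does not state: the labels are applied to the absolute value of r, and values falling in the gaps between the listed ranges (for example exactly 0.25) are assigned to the lower category. The function name is hypothetical.

```python
def correlation_strength(r):
    """Map a correlation coefficient to the strength labels on this card."""
    magnitude = abs(r)  # assumption: sign gives direction, magnitude gives strength
    if magnitude <= 0.25:
        return "doubtful"
    if magnitude <= 0.50:
        return "fair"
    if magnitude <= 0.75:
        return "good"
    return "superior"

labels = [correlation_strength(r) for r in (0.10, 0.40, -0.60, 0.94)]
```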
20
Q

Lack of correlation could be due to either ______ or ______ in variable measures.

A
  1. low association (does not mean r=0)
  2. large errors
21
Q

Strength of relationship is dependent on _____

A

data being evaluated

22
Q

Spearman Rank Correlation Coefficient (rs) is used with ______ data, but not ____ data

A

ranked/ordinal; continuous

23
Q

Value of rs varies between ___ and ___

A

1 and -1

24
Q

T/F Continuous data can be transformed into ranks then used to calculate Spearman’s rs.

A

T - but it is not recommended (converting continuous data to ordinal ranks discards information and decreases the power of the data)
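Worked example (an illustrative sketch with made-up data): one common way to obtain Spearman's rs is to replace each value with its rank (tied values share the average rank) and then compute Pearson's r on the ranks. The function names are assumptions for this sketch.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(xs, ys):
    """Spearman's rs computed as Pearson's r on the ranked data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always rises with x, but along a curve (y = x cubed).
x_values = [1, 2, 3, 4, 5]
y_values = [1, 8, 27, 64, 125]

rs = spearman_rs(x_values, y_values)  # rank order agrees perfectly
r = pearson_r(x_values, y_values)     # below 1 because the trend is not a straight line
```

The ranks keep only the ordering of the values, which is the information lost when continuous data are made ordinal.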

25
Q

What type of test is used for the Spearman rank correlation coefficient?

A

nonparametric test

26
Q

Spearman rs can be used when:

A
  1. continuous data are not available
  2. continuous data are skewed
27
Q

What is the coefficient of determination?

A

r^2; how well the data fit along the line of best fit
- r^2 can be used to assess overall model fit
- ranges between 0 and 1
- gives the % of variability in the dependent variable that is explained by the independent variable
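Worked example (an illustrative sketch with made-up data): for a simple linear regression, r^2 can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares, which is the share of variability in y accounted for by the fitted line. All names and values below are assumptions for this sketch.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variability left over after fitting the line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variability of y around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data with some scatter around an upward trend.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

fit = r_squared(x_values, y_values)  # about 0.6, i.e. roughly 60% of variability explained
```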

28
Q

What does an r^2 close to 0 imply?

A

poor model fit (coincides with lower r-value) - data on the scatterplot have no pattern and are not close to the trend line

29
Q

What does r^2 close to 1 imply?

A

excellent model fit - data on the scatterplot have a precise pattern and are tightly clustered around the trend line

30
Q

What does a study with an r=0.94 and an r^2=0.88 imply?

A

strong positive linear correlation and low variability (good model fit)

31
Q

What are the assumptions of correlation?

A
  1. Independent observations
  2. same population
  3. No outliers
  4. assumes linear relationship
32
Q

What are the limitations to correlation?

A
  1. Does not establish causality
  2. Chicken or the egg question
  3. Only tells us that an association is likely and how strong the association is (does NOT help us predict one variable based on another; that requires regression analysis)
33
Q

Unlike correlation, regression requires _______

A

an independent variable (If I increase the independent variable by 1 unit, how much will the dependent variable change?)

34
Q

Regression analysis attempts to predict or estimate the value of _____ from the known value of _____.

A

dependent variable; independent variable

35
Q

What is a common concern regarding regression analysis?

A

bias

36
Q

What needs to be done if there’s a concern for bias in regression analysis?

A
  • Experiments have to be taken “as given” and the inherent biases identified
  • Biases then can be removed statistically or averaged out
37
Q

List and define the types of regression.

A
  1. Simple regression - 1 dependent variable and 1 independent variable
  2. Multiple regression - 1 dependent variable and 2+ independent variables
  3. Linear regression - continuous dependent variable
  4. Logistic regression - dichotomous/discrete dependent variable
    - simple or multiple linear regression OR
    - simple or multiple logistic regression
38
Q

Type of regression: 1 independent variable & 1 dependent variable that is continuous

A

simple linear regression

39
Q

Type of regression: 2 independent variables & 1 dichotomous/discrete dependent variable

A

multiple logistic regression

40
Q

Type of regression: 1 independent variable & 1 dichotomous/discrete dependent variable

A

simple logistic regression

41
Q

Type of regression: 2 independent variables & 1 continuous dependent variable

A

multiple linear regression

42
Q

_____ - the process of estimating the slope and intercept of a trend line through the “middle” of a scatterplot

A

Simple linear regression
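Worked example (an illustrative sketch with made-up data): the least-squares slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The function name and data are assumptions for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change in y per 1-unit increase in x
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical data: independent variable x, dependent variable y.
x_values = [1, 2, 3, 4, 5]
y_values = [3, 5, 7, 9, 11]

slope, intercept = fit_line(x_values, y_values)
predicted_y_at_6 = intercept + slope * 6  # predict the dependent variable for a new x
```

Here the slope is 2 and the intercept is 1, so the predicted y at x = 6 is 13.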

43
Q

What is the independent variable of a linear regression?

A

x-axis

44
Q

What is the dependent variable of a linear regression?

A

y-axis

45
Q

_____ - an extension of the linear regression model when more than one independent variable is considered

A

Multiple linear regression

46
Q

_____ may be used to simply describe multiple independent variables the researchers are assessing.

A

Covariates

47
Q

Multiple linear regression can be used to account for _____

A

confounders

48
Q

How can multiple linear regression aid in removing bias?

A

If researchers collect data on all possible confounders and incorporate these variables into the regression, they remove, in principle, each possible source of bias, and the multiple regression then produces unbiased and efficient estimates

49
Q

It is difficult to graphically depict a scatterplot with more than 3 dimensions. Hence, multiple regressions are typically expressed as ______ and the estimates presented in ____.

A

equations; tables
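For reference (a generic sketch of the usual form, with symbols chosen here rather than taken from the lecture): a multiple linear regression with k independent variables is typically written as one equation, and the estimated coefficients are what get reported in the table.

```latex
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
```

Here \hat{y} is the predicted dependent variable, b_0 is the intercept, and each b_j is the estimated change in \hat{y} for a 1-unit increase in x_j with the other independent variables held constant.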

50
Q

Rather than report with r or m, logistic regression typically reports as ___

A

Odds ratio (OR)
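For reference (a generic sketch of the usual form, with symbols chosen here rather than taken from the lecture): logistic regression models the log odds of the event, so exponentiating a coefficient gives the odds ratio for a 1-unit increase in that independent variable.

```latex
\ln\!\left(\frac{p}{1 - p}\right) = b_0 + b_1 x_1 + \dots + b_k x_k,
\qquad
\mathrm{OR}_j = e^{b_j}
```

Here p is the probability of the event; b_j = 0 corresponds to an OR of 1.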

51
Q

What is an odds ratio?

A

ratio measuring the strength of an association (logistic regression)

52
Q

What does an odds ratio of <1 indicate? >1? =1?

A

< 1 indicates decreased odds of the event
> 1 indicates increased odds of the event
= 1 indicates no association
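Worked example (an illustrative sketch with made-up counts): an unadjusted odds ratio from a 2x2 table is the odds of the event in one group divided by the odds of the event in the other. The function and argument names are assumptions for this sketch.

```python
def odds_ratio(exposed_events, exposed_non_events,
               unexposed_events, unexposed_non_events):
    """Unadjusted odds ratio from the four cells of a 2x2 table."""
    odds_exposed = exposed_events / exposed_non_events
    odds_unexposed = unexposed_events / unexposed_non_events
    return odds_exposed / odds_unexposed

# Hypothetical counts: 20 of 100 exposed subjects and 10 of 100 unexposed
# subjects had the event.
crude_or = odds_ratio(20, 80, 10, 90)  # greater than 1: increased odds in the exposed group
no_association = odds_ratio(10, 90, 10, 90)  # identical odds in both groups
```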

53
Q

Studies may say they did simple logistic regression, and will present a(n) _____
Then they may mention they did a multiple logistic regression, and present you with a(n) ____.

A

OR; adjusted OR

54
Q

Which is typically the more accurate representation of the “true” difference between groups: odds ratio or adjusted odds ratio?

A

adjusted odds ratio (specific to each study; specific differences between groups)

55
Q

_____ - time to event analysis

A

survival analysis (e.g., time until death (years) in an elderly population, time until rejection (months) after heart transplant, etc.)

56
Q

Survival analysis is also called ____ & ____

A

life events; survival plots

57
Q

In what types of studies are survival analyses often applied in medical settings?

A

studies of death (but can be any outcome of interest - injury, onset of illness, recovery from illness, transition around a clinical threshold (CD4 count), etc)

58
Q

How is data collected for a survival analysis?

A

occurs in a follow-up period

59
Q

Appropriate technique when you are following subjects over time

A

survival analysis

60
Q

From start to finish, how is a survival analysis conducted?

A
  1. Subjects are enrolled at some well-defined point in time
  2. Data collection occurs in a follow-up period (prospective - follow pts to see if/when event of interest happens)
  3. Data collection stops on a subject because:
    - subject experiences the event of interest
    - the study ends
    - subject leaves the study for other reasons
61
Q

What statistical tests would you use for a survival analysis?

A
  1. Log-rank test - determines if there is a difference between the groups (type of chi-square test)
  2. Cox-proportional hazards regression - estimates hazard ratio
62
Q

Survival analysis methods that allow for censoring:

A
  1. Kaplan Meier method
  2. Cox-proportional hazards regression (hazard ratio instead of OR)
63
Q

______ - the time from entry into a study until a subject has a particular outcome

A

time to event

64
Q

____ - subjects are said to be ____ if they are lost to follow up, they drop out of the study, or if the study ends before they die or have an outcome of interest in a survival analysis.

A

Censoring; censored

65
Q

How are censored subjects accounted for in a survival analysis?

A

They are counted as alive or disease free for the time they were enrolled in the study

66
Q

What is the outcome of interest for survival analysis?

A

time (survival time) until event (failure)

67
Q

Hazard ratio is a ____ ratio

A

rate

68
Q

Can have a(n) _____ to describe the relationship between 2+ groups and survival time for more than one covariate

A

adjusted hazard ratio

69
Q

What does a hazard ratio (HR) < 1 indicate? > 1? = 1?

A
  1. HR < 1 - survival probability is higher in the treatment group
  2. HR > 1 - survival probability is lower in the treatment group
  3. HR = 1 - no effect
70
Q

____ - provides estimates of the survival function/curve

A

Kaplan Meier method
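Worked example (an illustrative sketch with made-up data): at each time where an event occurs, the Kaplan Meier estimate multiplies the running survival probability by (1 - events / number at risk). Censored subjects count as at risk up to their censoring time and then leave the risk set without counting as an event. The function name and data are assumptions for this sketch.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if the subject was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        event_count = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at time t
        if event_count > 0:
            survival *= 1 - event_count / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # everyone observed at t leaves the risk set afterwards
    return curve

# Hypothetical follow-up (months): one subject censored at 3, one at 8.
follow_up = [2, 3, 3, 5, 8]
observed = [1, 1, 0, 1, 0]

curve = kaplan_meier(follow_up, observed)
```

With these values the estimated survival drops to about 0.8 at month 2, 0.6 at month 3, and 0.3 at month 5; the censored subjects lower the number at risk without producing a drop of their own.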

71
Q

What does it mean if the survival function is always higher for one group than another?

A

one group is “surviving” longer than the other

72
Q

What does it mean if survival functions cross in Kaplan Meier method?

A

situation is unclear

73
Q

What are the assumptions to survival analysis?

A
  1. random (or representative sample)
  2. independent subjects
  3. entry criteria are consistent
  4. endpoint defined consistently
  5. starting time clearly defined
  6. censoring unrelated to survival
  7. Average survival doesn’t change during the study
74
Q

What is a caveat to the assumptions of survival analysis?

A

survival estimates can be unreliable toward the end of a study when there are small numbers of subjects at risk of having an event