Biostats_5_Correlation, P-value and Statistical Significance Flashcards

1
Q

What is linear association?

A

Linear association describes a straight-line relationship between variables. A correlation coefficient quantifies this relationship numerically.

2
Q

What is the correlation coefficient?

A

The correlation coefficient (r) ranges from -1 to +1 and describes the strength and direction of a linear relationship between two variables.

For positive associations, the closer to +1, the stronger the association.

For negative associations, the closer to -1, the stronger the association.
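
As a rough illustration of how r captures direction and strength, the sketch below computes the Pearson correlation coefficient from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data values are hypothetical, invented for this example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases in a perfect straight line
falling = [10, 8, 6, 4, 2]   # decreases in a perfect straight line

print(pearson_r(hours, rising))   # perfect positive linear association
print(pearson_r(hours, falling))  # perfect negative linear association
```

Real data rarely fall exactly on a line, so observed values of r usually sit strictly between -1 and +1.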

3
Q

What does positive correlation indicate?

A

Positive correlation indicates that as the value of one variable increases, the value of the other variable also increases.

Any r > 0 is a positive correlation; r = +1 is a perfect positive correlation.

4
Q

When the value of one variable increases as the value of the other decreases, what type of correlation does this represent?

A

A negative correlation means there is an inverse relationship between the variables. For instance, the value of the dependent variable decreases as the independent variable increases. This is reflected by a downward-sloping best-fit line, with the correlation coefficient (r) being less than 0 and no lower than -1. The closer r is to -1, the stronger the negative linear relationship between the two variables.

5
Q

What does no correlation mean?

A

No correlation means no linear relationship exists between the variables; r is near 0.

6
Q

What does a correlation such as this indicate?

A

When r = 0, the best-fit line is flat, indicating no clear relationship between the independent and dependent variables. The scattered points suggest a lack of correlation, meaning there is no linear association between the two variables.

r = 0.

7
Q

What is the coefficient of determination?

A

The coefficient of determination (r^2) represents the proportion of variance in the dependent variable explained by the independent variable.

For example, if r = -0.8, then r^2 = 0.64, or 64%.

8
Q

A study is conducted to assess the relationship between body mass index (BMI) and daily physical activity (measured in hours). The investigators find that BMI is inversely related to daily physical activity, with a correlation coefficient of -0.6 (p < 0.01). According to this information, how much of the variability in BMI can be explained by daily physical activity?

A. 36%
B. 60%
C. 40%
D. 6%

A

Correct Answer: 36%

Explanation: The proportion of variability in the dependent variable (BMI) that can be explained by the independent variable (daily physical activity) is determined by the square of the correlation coefficient (r^2). The correlation coefficient (r) is given as -0.6. Squaring this value gives 0.36. This means 36% of the variability in BMI can be explained by daily physical activity. The negative sign of the correlation coefficient indicates the direction of the relationship (inverse), but it does not affect the proportion of variability explained.
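
The squaring step can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `coefficient_of_determination` is hypothetical, and the two r values are the ones quoted in these cards.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient; the sign of r drops out."""
    return r ** 2

# r values taken from the cards above
for r in (-0.8, -0.6):
    r_squared = coefficient_of_determination(r)
    print(f"r = {r:+.1f} -> r^2 = {r_squared:.2f} ({r_squared:.0%} of variability explained)")
```

Because the sign disappears on squaring, r = +0.6 and r = -0.6 explain the same proportion of variability.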

9
Q

What is the difference between causation and correlation?

A

A correlation does not imply causation; other factors may influence the relationship between variables.

10
Q

The assumption that there is no association between two measured variables (e.g., the exposure and the outcome) or no significant difference between two studied populations other than what would be expected from sampling or experimental error, is called … ?

A

the null hypothesis

11
Q

What are the two types of hypotheses and what do they mean?

A

Null Hypothesis (H₀): No association exists between the exposure and the outcome.

vvvvvvvvvvvvv

Alternative Hypothesis (H₁): Association exists between the exposure and the outcome.

12
Q

What does an RR of 1.08 with p-value = 0.01 mean?

A

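There is a statistically significant association between the exposure and outcome, because the p-value of 0.01 is below the usual alpha threshold of 0.05.
A p-value of 0.01 means that, if the null hypothesis were true, there would be a 1% probability of observing an association at least as extreme as this one. It is not the probability that the null hypothesis is true.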

13
Q

When the p-value is above the alpha threshold, a researcher should … ?

A

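"Fail to reject H₀"
vvvvvvvvvvvvvvvvvv
This means the data do not provide sufficient evidence of an association. The null hypothesis (H₀), which states that there is no association between the exposure and the outcome, is retained; this is not the same as proving or accepting it as true.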

14
Q

What is the usual alpha threshold when conducting studies?

A

A threshold of 0.05 is standard in many fields. It means that, when the null hypothesis is actually true, there is a 5% chance of falsely rejecting it, that is, a 5% probability of concluding there is an effect when there is none (false positive).
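
A small simulation can make the 5% figure concrete. The sketch below is an illustration under stated assumptions, not a standard procedure: it draws many samples from a population where the null hypothesis is true by construction (mean 0, known SD 1), runs a two-sided z-test at alpha = 0.05 on each, and counts how often the null is rejected. The function name, sample size, trial count, and seed are all hypothetical choices.

```python
import math
import random

def false_positive_rate(trials=10_000, n=30, z_crit=1.96, seed=0):
    """Fraction of simulated studies that reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: population mean 0, SD 1
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        # z statistic with known SD = 1, so SEM = 1 / sqrt(n)
        z = sample_mean / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / trials

rate = false_positive_rate()
print(f"Share of true nulls rejected at alpha = 0.05: {rate:.3f}")
```

The printed share should land close to 0.05, which is the sense in which alpha is the false positive rate.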

15
Q

Falsely rejecting the null hypothesis means that a researcher committed a ____ error

A

Type I error
vvvvvvvvvvvvvvvv
This is the same as creating a false positive.

16
Q

The probability of obtaining the observed results (or more extreme results) assuming the null hypothesis is true, is how the ____ is described.

A

p-value

17
Q

A study finds that the relative risk (RR) of an outcome for patients is 1.23, with a p-value of 0.03 and a 95% confidence interval (CI) of 1.12-1.46. What does this p-value mean in terms of RR?

A

The p-value is the probability that the result of a given statistical test will be at least as extreme as the result actually observed, assuming that the null hypothesis is true. Accordingly, the p-value of 0.03 in this study signifies that, if the null hypothesis were true (i.e., RR = 1.0), there would be a 3% probability of observing an RR at least as extreme as 1.23.

vvvvvvvvvvvvv

Alternatively, if there were no association between the risk factor and the outcome (i.e., RR = 1.0) and the study was repeated 100 times, a relative risk at least as extreme as 1.23 would be found only about 3 times (i.e., 3% of the repeated studies). However, this is not equivalent to stating that there is a 3% probability that the RR of 1.23 is due to chance alone.
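
To show what "at least as extreme" means numerically, the sketch below converts a test statistic into a two-sided p-value. It assumes the statistic is a z-score that follows a standard normal distribution under the null hypothesis; the card does not say which test the study used, so this is an illustrative assumption, and the function name `two_sided_p_value` is hypothetical.

```python
import math

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as |z|."""
    # erfc(|z| / sqrt(2)) is the combined area in both tails beyond |z|
    return math.erfc(abs(z) / math.sqrt(2.0))

# Illustrative z-scores, not taken from the study in the card
for z in (1.0, 1.96, 2.58):
    print(f"z = {z:.2f} -> two-sided p = {two_sided_p_value(z):.3f}")
```

A larger |z| leaves less area in the tails, so the p-value shrinks as the observed result moves further from what the null hypothesis predicts.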

18
Q

A study finds that the relative risk (RR) of an outcome for patients is 1.23, with a p-value of 0.03 and a 95% confidence interval (CI) of 1.12-1.46. What does this p-value mean in terms of the confidence interval (CI)?

A

A CI is a range of values associated with an estimate that is thought to contain the true population value with a given level of confidence. The CI level is determined by the alpha value, which is typically set at 5%, using the following formula: 100% - alpha = CI. The 95% CI of 1.12-1.46 is often read as a 95% probability that the true population RR lies between 1.12 and 1.46; strictly, it means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true population RR.

vvvvvvvvvvvvvv

CIs and p-values are interrelated and either can be used to determine whether a result is statistically significant. In this study, the p-value is < 0.05, which corresponds to a 95% CI that does not include the null value (i.e., RR = 1.0), meaning that the null hypothesis can be rejected and the association is therefore considered statistically significant. Because RR is a ratio and does not follow a normal distribution, the CI is not symmetrical around the observed RR.
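
One common way to obtain a CI for an RR, sketched below, is to build the interval on the log scale and then exponentiate, which is why the limits end up asymmetric around the RR. The card does not state how its interval was computed, so this method is an assumption, and the 2x2 counts are hypothetical numbers invented for the example.

```python
import math

def relative_risk_ci(events_exp, n_exp, events_unexp, n_unexp, z=1.96):
    """Relative risk with an approximate CI built on the log scale."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    # Standard error of ln(RR) for a cohort-style 2x2 table
    se_log_rr = math.sqrt(
        1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 60/200 events in exposed, 40/200 in unexposed
rr, lower, upper = relative_risk_ci(60, 200, 40, 200)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI excludes 1.0:", lower > 1.0 or upper < 1.0)
print("Distance below RR:", round(rr - lower, 2), "| above RR:", round(upper - rr, 2))
```

With these made-up counts the interval excludes 1.0, and the upper limit sits further from the RR than the lower limit does.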

19
Q

What is the relationship between p-value and the confidence interval spread?

A

A p-value reflects how strongly the data contradict the null hypothesis. The narrower the confidence interval and the further it is from including the null value (1.0 for relative risk or odds ratio), the smaller the p-value tends to be.

20
Q

What should a researcher do if the p-value is below the alpha threshold?

A

If the p-value of a statistical test is less than or equal to the alpha threshold, the null hypothesis is rejected, and the result is considered statistically significant. The null hypothesis (H₀) states that there is no association between the exposure and the outcome. Rejection of the null hypothesis means the observed association is unlikely to be explained by chance alone; it does not prove that chance played no role.

21
Q

What does a p-value < 0.05 indicate?

A

A p-value < 0.05 indicates that the observed relationship is statistically significant at the usual alpha of 0.05. Results at least as extreme as those observed would occur less than 5% of the time if the null hypothesis were true.

22
Q

What is the significance of a 95% confidence interval (CI) for relative risk (RR) or odds ratio (OR) that includes 1.0?

A

If the 95% CI includes 1.0, the data are compatible with no association, so the result is not statistically significant at the 5% level. This corresponds to a p-value > 0.05, and the null hypothesis of no association cannot be rejected.

23
Q

What does it mean if the 95% CI for relative risk (RR) or odds ratio (OR) does not include 1.0?

A

If the 95% CI does not include 1.0, the data are not compatible with no association at the 5% level, so the result is statistically significant. This corresponds to a p-value < 0.05, and the null hypothesis of no association is rejected.

24
Q

How is the standard error of the mean (SEM) calculated and what is the purpose?

A

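SEM = SD / sqrt(n)

vvvvvvvvvvvvvvvvvvvvv

The SEM estimates how much a sample mean varies from sample to sample. Once it is calculated, the SEM is multiplied by the z-score to give the margin of error for a confidence interval around the mean.

vvvvvvvvvvvvvvvvvvvvvv

for a 99% CI the z-score is 2.58
for a 95% CI the z-score is 1.96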
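
The multipliers 1.96 and 2.58 are the two-sided critical values of the standard normal distribution for 95% and 99% confidence. As a quick check, the sketch below recovers them with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided standard normal critical value for a confidence level."""
    alpha = 1.0 - level
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.95, 0.99):
    print(f"{level:.0%} CI: z = {z_for_confidence(level):.2f}")
```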

25
Q

How does sample size alter the confidence interval?

A

Sample size is part of the calculation of the confidence interval. Because the standard error is SD / sqrt(n), the bigger the sample size, the smaller the standard error and the tighter the confidence interval.

26
Q

What elements are needed to calculate the limits of a confidence interval?

A

To calculate the CI around the mean you must know the following: the mean, standard deviation (SD), z-score and sample size (n). A CI can be calculated to correspond with the mean of any continuous variable.

vvvvvvvvvvvvvvvvvvv

95% CI = mean +/- 1.96 * [SD / sqrt(n)]
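
The formula can be worked through in a few lines. The sketch below is a minimal example under stated assumptions: the function name `mean_confidence_interval` and the summary statistics (mean 120, SD 15) are hypothetical, and the z-score of 1.96 corresponds to a 95% CI. Running it at two sample sizes also shows how a larger n tightens the interval.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """CI limits around a mean: mean +/- z * (SD / sqrt(n))."""
    sem = sd / math.sqrt(n)   # standard error of the mean
    margin = z * sem          # margin of error
    return mean - margin, mean + margin

# Hypothetical summary statistics, invented for illustration only
for n in (25, 100):
    low, high = mean_confidence_interval(mean=120.0, sd=15.0, n=n)
    print(f"n = {n}: 95% CI {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Quadrupling the sample size halves the SEM, so the interval for n = 100 is half as wide as the one for n = 25.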