Measures of Association (Biostats) Flashcards

1
Q

Precision takes into account a measurement’s (or set of measurements’) … ?

A

Reliability

vvvvvvvvvvvvvv

The consistency and reproducibility of a test.
The absence of random variation in a test.

vvvvvvvvvvvvvv

Reliability refers to how similar the data points are to each other: when reliability is low, the data points are more widely dispersed; when reliability is high, the data points are closer together. An analyzer can be “reliably wrong” or “precisely wrong.”

2
Q

What does the precision do to the standard deviation (SD)?

A

SD decreases when the measurements are more precise

3
Q

Accuracy takes into account a measurement’s (or set of measurements’) … ?

A

Validity

vvvvvvvvvvvvvv

The closeness of test results to the true values.
The absence of systematic error or bias in a test.

vvvvvvvvvvvvvv

Validity refers to how close the data points are to the true value: when validity is low, the data points do not approximate the true value.

4
Q

An analysis that renders values such as these would have (high/low) precision/accuracy?

A

Low reliability and High validity

5
Q

An analysis that renders values such as these would have (high/low) precision/accuracy?

A

High reliability and Low validity

6
Q

An analysis that renders values such as these would have (high/low) precision/accuracy?

A

Low reliability and Low validity

7
Q

An analysis that renders values such as these would have (high/low) precision/accuracy?

A

High reliability and High validity

8
Q

Random error will impact the ________ of a test?

A

precision

9
Q

Systematic error will impact the ________ of a test?

A

accuracy

10
Q

Specificity and sensitivity would relate to precision or accuracy?

A

Accuracy. Both are measures of a test’s validity (judged against a reference standard): specificity is the ability of a test to correctly identify those who do not have a certain disease, and sensitivity is the ability of a test to correctly identify those who have the disease.
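A minimal Python sketch of the two validity measures, assuming the usual diagnostic 2x2 layout (true positives, false negatives, true negatives, false positives). The function names and the counts are hypothetical, chosen only to illustrate the definitions above.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people WITH the disease whom the test correctly identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people WITHOUT the disease whom the test correctly identifies."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 90 of 100 diseased people test positive,
# 160 of 200 disease-free people test negative.
print(sensitivity(true_pos=90, false_neg=10))   # 0.9
print(specificity(true_neg=160, false_pos=40))  # 0.8
```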

11
Q

What does the absolute risk help to measure?

A

There are two absolute risks: one for the exposed group and another for the unexposed group.

vvvvvvvvvvvvvvv

Taking these in isolation is equivalent to measuring the incidence of disease in that group or population.

vvvvvvvvvvvvvvv

Dividing the absolute risk in the exposed group by the absolute risk in the unexposed group gives the relative risk.

12
Q

What is typically used in studies where participants are followed prospectively to observe outcomes?

What type of study is this?

A

Relative risk. It is used in cohort studies, in which people are followed over time; their “risk” of developing a disease refers to the probability of the disease (or an event) occurring over a certain period of time.

vvvvvvvvvvvvvvvv

These are prospective study designs.

13
Q

When two groups of interest are compared in a study over time, how (generally) is the ratio set up that compares the group at risk with the group not at risk?

A

A ratio called “relative risk” compares the risk in the exposed group with the risk in the unexposed group. The risk (incidence) of disease in the exposed group is the numerator, and the risk of disease in the unexposed group is the denominator.

14
Q

What is the complete formula for calculating Relative Risk (RR)?

A

RR = [a/(a+b)] / [c/(c+d)]

[a/(a+b)] = the risk in the exposed group

(a+b) = all the members in the exposed group

a = diseased cases in exposed group

b = non-diseased cases in the exposed group

[c/(c+d)] = the risk in the unexposed group

(c+d) = all the members in the unexposed group

c = diseased cases in the unexposed group

d = non-diseased cases in the unexposed group
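The formula can be checked with a short Python sketch. It reuses the smoking counts from the attributable-risk and PAR% cards (a = 30, b = 30, c = 10, d = 30); the function name is hypothetical.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """RR = [a/(a+b)] / [c/(c+d)] for a 2x2 table laid out as on this card."""
    risk_exposed = a / (a + b)    # absolute risk (incidence) in the exposed group
    risk_unexposed = c / (c + d)  # absolute risk (incidence) in the unexposed group
    return risk_exposed / risk_unexposed


# 60 exposed (30 diseased), 40 unexposed (10 diseased)
print(relative_risk(a=30, b=30, c=10, d=30))  # 2.0
```

An RR of 2.0 here means the exposed group has twice the risk of the unexposed group.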

15
Q
A
16
Q

What is Attributable Risk?

A

The excess incidence of a disease due to a particular factor (exposure).

vvvvvvvvvvvvvvv

Attributable risk is also known as the ‘risk difference’: it is the absolute difference in risk between the exposed and unexposed groups.

17
Q

What is the Attributable Risk (AR) if out of 100 people, 60 were exposed and out of these 60 people, 30 developed the disease of interest, while only 10 from the unexposed group developed the disease of interest?

A

AR = | (Incidence in Exposed) - (Incidence in Unexposed) |

For example, 100 people are analyzed.

60 were exposed and 40 were not exposed.

In the exposed group, 50% of the members experienced disease (30 out of 60).

In the unexposed group, 25% of the members experienced disease (10 out of 40).

The AR = (30/60) - (10/40) = 0.5 - 0.25 = 0.25 (or 25%).
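The same arithmetic as a small Python sketch; `abs()` mirrors the |…| notation used on this card, and the function name is hypothetical.

```python
def attributable_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """AR = |incidence in exposed - incidence in unexposed|, as written on this card."""
    incidence_exposed = cases_exposed / n_exposed        # 30/60 = 0.5
    incidence_unexposed = cases_unexposed / n_unexposed  # 10/40 = 0.25
    return abs(incidence_exposed - incidence_unexposed)


print(attributable_risk(30, 60, 10, 40))  # 0.25
```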

18
Q

When the attributable risk is referring to the beneficial effects of an intervention, this is … ?

A

Absolute risk reduction (ARR).

vvvvvvvvvvvvvvvvvvv

The difference in risk attributable to an exposure as compared to non-exposure.

vvvvvvvvvvvvvvvvvvv

Absolute risk reduction (ARR) = risk in non-exposed group – risk in exposed group

19
Q

If the risk of developing lung cancer in heavy smokers is 28% and the risk in non-smokers is 6%, what is the absolute risk reduction in lung cancer for individuals who do not smoke?

A

22%

vvvvvvvvvvvvvvv

This is derived in the same way as attributable risk: take the difference (excess) between the absolute risk in the exposed group and the absolute risk in the unexposed group. Here, the absolute risk of developing lung cancer is 28% in smokers and 6% in non-smokers, so the difference is 28% - 6% = 22%. Treating not smoking as the intervention, it provides an absolute risk reduction in lung cancer of 22%.

20
Q

If 8% of people who receive a placebo vaccine develop the flu vs 2% of people who receive a flu vaccine, then the absolute risk reduction is … ?

A

8%–2% = 6% = 0.06

vvvvvvvvvvvvv

This is equivalent to saying that the vaccine (intervention) had an absolute risk reduction of 6%.

21
Q

What is Number Needed to Treat (NNT)?

A

The number of patients that need to be treated to prevent one additional adverse outcome.

vvvvvvvvvvvvvvv

Formula: NNT = 1 / Absolute Risk Reduction (ARR)

22
Q

What insight does Number Needed to Treat (NNT) provide?

A

Practical insight into the effectiveness of a treatment.

23
Q

How is Number Needed to Treat (NNT) calculated?

A

1) First determine the absolute risk reduction: ARR = (mortality rate in control group) - (mortality rate in treatment group).

2) Then take the reciprocal of this value.

For example, suppose a new treatment regimen has a death rate of 25/50 = 0.5 over 5 years, whereas patients kept on the conventional regimen have a mortality rate of 75/100 = 0.75. The absolute risk difference between the two groups is 0.75 - 0.5 = 0.25, and taking its reciprocal (1/0.25 = 4) gives the NNT.

Example: ARR = 0.75 - 0.5 = 0.25; NNT = 1 / 0.25 = 4.

Based on this result, 4 patients need to be treated with the new regimen instead of the conventional regimen for one additional patient to survive 5 years.

NNT = 1 / Absolute Risk Reduction (ARR)
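A Python sketch of the two steps, using the mortality figures from this card and the flu vaccine figures from the earlier ARR card. Rounding a fractional NNT up to the next whole patient is a common convention and is an assumption here; the function names are hypothetical.

```python
import math


def absolute_risk_reduction(risk_control: float, risk_treatment: float) -> float:
    """ARR = risk in the control group - risk in the treatment group."""
    return risk_control - risk_treatment


def number_needed_to_treat(arr: float) -> float:
    """NNT = 1 / ARR."""
    return 1 / arr


# Mortality example: conventional regimen 75/100, new regimen 25/50
arr = absolute_risk_reduction(75 / 100, 25 / 50)
nnt = number_needed_to_treat(arr)
print(arr, nnt)  # 0.25 4.0

# Flu vaccine card: ARR = 0.08 - 0.02 = 0.06, so NNT is about 16.7
flu_nnt = number_needed_to_treat(absolute_risk_reduction(0.08, 0.02))
print(math.ceil(flu_nnt))  # 17 (rounded up to whole patients)
```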

24
Q

What is the epidemiological measure of risk that refers to the proportion of decreased risk due to an intervention compared to the control group?

A

Relative risk reduction (RRR).

25
Q

What is the significance of ( 1 - RR ) ?

A

This is how the relative risk reduction (RRR) is expressed mathematically: it gives the proportion by which the intervention/treatment reduces risk compared with the control (no intervention or no treatment).

26
Q

When can ( 1 - RR ) be used?

A

When RR is less than 1, the RRR is determined by ( 1 - RR )

vvvvvvvvvvvvvv

When RR is greater than 1, ( RR - 1 ) is used instead, which expresses a relative risk increase.

27
Q

Can relative risk reduction be expressed as a percent?

A

Yes:

vvvvvvvvvvvvvvv

When RR is less than 1:
% risk reduction = (1 - RR) x 100

vvvvvvvvvvvvvvv

When RR is more than 1:
% risk increase = (RR - 1) x 100

28
Q

If the insight of interest is to determine the RR and the RRR, where, 2% of patients who receive a flu shot develop the flu, while 8% of unvaccinated patients develop the flu, then what is the RR and RRR?

A

RR = 2/8 = 0.25

vvvvvvvvvvvvvvvvvvv

RRR = (1 - 0.25) = 0.75
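A Python sketch of this card and the percent rules on the previous card, using the flu percentages as given (2% vs 8%). The function names are hypothetical.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """RR = risk in the exposed (here: vaccinated) / risk in the unexposed."""
    return risk_exposed / risk_unexposed


def percent_risk_change(rr):
    """(1 - RR) x 100 when RR < 1; (RR - 1) x 100 when RR > 1."""
    if rr < 1:
        return "reduction", (1 - rr) * 100
    if rr > 1:
        return "increase", (rr - 1) * 100
    return "no change", 0.0


rr = relative_risk(2, 8)  # 2% of vaccinated vs 8% of unvaccinated develop flu
print(rr)                       # 0.25
print(percent_risk_change(rr))  # ('reduction', 75.0)
```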

29
Q

If a study needed to determine how much an exposure or risk factor has contributed to the incidence of a disease, and the relative risk was provided, what measure of association would be appropriate and how would this be calculated?

A

This would require the Attributable Risk Percent (AR%), which is the proportion of disease incidence in the exposed group attributable to the exposure.

Formula: AR% = [ ( RR - 1) / ( RR ) ] x 100 = (Attributable Risk / Incidence in Exposed) × 100

30
Q

How is Attributable Risk Percent (AR%) calculated?

A

AR% = ( [Attributable Risk] / [Incidence in Exposed] ) × 100.

AR% = [ ( RR - 1) / ( RR ) ] x 100.
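A Python sketch showing that the two AR% formulas agree, using the smoking counts from the attributable-risk card (30/60 exposed cases, 10/40 unexposed cases). The function names are hypothetical.

```python
def ar_percent_from_incidence(incidence_exposed, incidence_unexposed):
    """AR% = (AR / incidence in exposed) x 100."""
    attributable_risk = incidence_exposed - incidence_unexposed
    return attributable_risk / incidence_exposed * 100


def ar_percent_from_rr(rr):
    """AR% = [(RR - 1) / RR] x 100."""
    return (rr - 1) / rr * 100


incidence_exposed = 30 / 60    # 0.5
incidence_unexposed = 10 / 40  # 0.25
rr = incidence_exposed / incidence_unexposed  # 2.0

print(ar_percent_from_incidence(incidence_exposed, incidence_unexposed))  # 50.0
print(ar_percent_from_rr(rr))  # 50.0
```

Both routes give 50%: half of the disease incidence among the exposed is attributable to the exposure.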

31
Q

What is Population Attributable Risk Percent (PAR%)?

A

The proportion of disease incidence in the total population attributable to the exposure.

vvvvvvvvvvvvvvv

[(incidence of disease in the total population) - (incidence of disease among the unexposed group)] / (incidence of disease in the total population)

vvvvvvvvvvvvvvv

To determine this, subtract the incidence of disease in the unexposed group (the background risk that would occur without the exposure) from the incidence of disease in the entire population (irrespective of whether people were exposed to the risk factor). That difference is then divided by the incidence in the entire population: the numerator is the excess incidence left after removing the background risk, and the denominator is the incidence in the entire population.

32
Q

How is Population Attributable Risk Percent (PAR%) calculated?

A

To determine population attributable risk percent:

1) First calculate the incidence of the disease in the study population as a whole. For example, if a study population of 100 people (where 60 were smokers and 40 were non-smokers) had 30 individuals from the smoker group and 10 individuals from the non-smoker group who developed respiratory disease or symptoms, then the overall incidence of developing respiratory disease or symptoms in this study population would be 40/100.

2) Next, calculate the difference in the risk of developing respiratory disease between the study population as a whole and the non-smokers (40/100 - 10/40 = 0.4 - 0.25 = 0.15). Here, 40/100 is the incidence of disease or symptoms in the entire population, while 10/40 is the background risk in people not exposed to smoking.

3) Divide this difference by the incidence of respiratory disease in the population as a whole (0.15/0.4 = 0.375). Based on this calculation, 37.5% of the yearly respiratory disease in the study population is attributable to smoking.

PAR% = [(Incidence in Total Population - Incidence in Unexposed) / Incidence in Total Population] × 100.
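The three steps above can be sketched in Python with the same smoking example; the function name is hypothetical.

```python
def par_percent(incidence_total, incidence_unexposed):
    """PAR% = [(I_total - I_unexposed) / I_total] x 100."""
    return (incidence_total - incidence_unexposed) / incidence_total * 100


incidence_total = (30 + 10) / 100  # all cases / whole study population = 0.4
incidence_unexposed = 10 / 40      # non-smokers = 0.25

# round() hides tiny floating-point error (0.4 - 0.25 is not exact in binary)
print(round(par_percent(incidence_total, incidence_unexposed), 1))  # 37.5
```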

33
Q

What does Attributable Risk Percent (AR%) show?

A

The proportion of disease incidence in exposed individuals that is due to the exposure.

34
Q

What does Population Attributable Risk Percent (PAR%) demonstrate?

A

The impact of exposure on disease incidence in the entire population.

35
Q

The number needed to harm is determined by … ?

A

Attributable risk (AR): NNH = 1 / AR
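A sketch of the reciprocal relationship, reusing the attributable risk of 0.25 from the smoking example; the function name is hypothetical.

```python
def number_needed_to_harm(attributable_risk):
    """NNH = 1 / AR: people exposed for one additional case of disease."""
    return 1 / attributable_risk


ar = 30 / 60 - 10 / 40  # AR from the smoking example: 0.5 - 0.25 = 0.25
print(number_needed_to_harm(ar))  # 4.0
```

In this example, for every 4 people exposed, one additional person develops the disease.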

36
Q

When you take the past exposures and compare those exposures to those who were not exposed, this is a(n) ________ study that uses a(n) ________ to make the comparison.

A

When you take the past exposures and compare those exposures to those who were not exposed, this is a(n) case-control study that uses a(n) odds ratio to make the comparison.

Odds ratio (OR) is the measure of association used in case-control studies. It compares the odds of exposure in cases to the odds of exposure in controls. A case-control study is retrospective in nature. It starts with individuals who have a specific outcome (cases) and those who do not (controls). The study then looks back in time to compare exposure histories between the two groups. Case-control studies are particularly useful for studying rare diseases or outcomes because they focus on individuals who already have the outcome of interest. The odds ratio in these studies provides an estimate of the relative risk, especially when the outcome is rare.

37
Q
A
38
Q

What is the formula for calculating Odds Ratio (OR)?

A

OR = (a/c) / (b/d) = (ad) / (bc), using the 2×2 table structure.

a = Exposed and Diseased

c = Unexposed and Diseased

b = Exposed and Non-Diseased

d = Unexposed and Non-Diseased
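A Python sketch of the cross-product formula, computed alongside RR on the same smoking counts (a = 30, b = 30, c = 10, d = 30) for comparison. The function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = (a*d) / (b*c), with the cell labels used on this card."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)], for comparison."""
    return (a / (a + b)) / (c / (c + d))


# Smoking counts from the PAR% card: a=30, b=30, c=10, d=30
print(odds_ratio(30, 30, 10, 30))     # 3.0
print(relative_risk(30, 30, 10, 30))  # 2.0
```

Because the outcome is common in this table (40 of 100), the OR (3.0) lies further from 1 than the RR (2.0).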

39
Q

What is the key difference between cohort studies and case-control studies in terms of risk measures?

A

Cohort studies measure incidence prospectively and calculate the relative risk. The relative risk compares the probability of developing an outcome between two groups over a certain period of time. It implies a prospective study design because the patients are followed over time to see whether or not they develop an outcome. Relative risk answers the question: within a certain period of time, how many times more likely are exposed people to develop a disease compared with those who have remained unexposed?
vvvvvvvvvvvvvvvvvvvvvvvvvv
Case-control studies make use of retrospective data and calculate the odds ratio. The odds ratio compares the odds of exposure to a particular risk factor in cases and controls. Cases have developed the disease and controls remain disease-free. “Risk,” in its purest sense, is not calculated in case-control studies but is instead inferred indirectly, because the analysis is retrospective. The odds ratio answers the question: how many times greater are the odds that diseased people were exposed to a particular factor, compared with those who have not developed the disease?
vvvvvvvvvvvvvvvvvvvvvvvvvvv
Both relative risk and odds ratio are measured on a scale from 0 to infinity. The value of 1.0 indicates no difference between the two groups being compared.

40
Q

What is the ‘rare disease assumption’?

A

The ‘rare disease assumption’ states that when a disease is rare, the Odds Ratio approximates the Relative Risk.
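A Python sketch illustrating the assumption: the same two formulas are applied to the common-outcome smoking table and to a rare-outcome table. The rare-outcome counts (3/1000 vs 1/1000) are hypothetical, chosen only to make the point.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))


tables = {
    "common outcome (smoking example)": (30, 30, 10, 30),
    "rare outcome (hypothetical)": (3, 997, 1, 999),
}
for label, table in tables.items():
    print(f"{label}: RR = {relative_risk(*table):.3f}, OR = {odds_ratio(*table):.3f}")
# common outcome (smoking example): RR = 2.000, OR = 3.000
# rare outcome (hypothetical): RR = 3.000, OR = 3.006
```

When few people in either group have the disease, b is close to a + b and d is close to c + d, so the odds and the risks nearly coincide.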

41
Q

What does a Relative Risk (RR) or Odds Ratio (OR) of 1 indicate?

A

An RR or OR of 1 indicates no difference in risk between the exposed and unexposed groups.

42
Q

A Relative Risk (RR) > 1 would mean that …

A

Indicates that the exposure is associated with an increased risk of the outcome. For example, an RR of 2.0 means the exposed group is twice as likely to develop the outcome compared to the unexposed group.

43
Q

A Relative Risk (RR) < 1 would mean that …

A

Indicates that the exposure is associated with a reduced risk of the outcome. An RR of 0.5 means the exposed group has half the risk of developing the outcome compared to the unexposed group.

44
Q

An Odds Ratio (OR) = 1 would indicate that …

A

Indicates that there is no association between the exposure and the outcome. The odds of the outcome are the same in both the exposed and unexposed groups.

45
Q

How do you interpret an Odds Ratio (OR) > 1?

A

An OR > 1 suggests that exposure is associated with higher odds of the outcome occurring. For example, an OR of 2.0 means the odds of the outcome are twice as high in the exposed group compared to the unexposed group.

46
Q

Odds Ratio (OR) < 1 would indicate that …

A

This indicates that the exposure is associated with lower odds of the outcome. For example, an OR of 0.5 means the odds of the outcome in the exposed group are half the odds in the unexposed group.

47
Q

How can Relative Risk (RR) and Odds Ratio (OR) be applied in clinical practice?

A

RR and OR help assess the strength of association between an exposure and an outcome, aiding in clinical decision-making.

48
Q

What makes the relative risk and odds ratio similar?

A

Relative risk and odds ratio are measures of association which provide point estimates of effect. They are useful in describing the magnitude of an effect.

49
Q

What are the two measures of dispersion?

A

Standard deviation

vvvvvvvvvvvvvvv

Standard error

50
Q

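How are values spread in a normal distribution?

A

+/- 1 sd from the mean = about 68% of all values (about 34% on each side of the mean)

+/- 2 sd from the mean = about 95% of all values (the band between 1 and 2 sd adds about 13.6% on each side)

+/- 3 sd from the mean = about 99.7% of all values (the band between 2 and 3 sd adds about 2.1% on each side)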
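These percentages can be checked with a short Python sketch: for a standard normal variable Z, the share of values within k standard deviations of the mean is erf(k/√2). The function name is hypothetical.

```python
import math


def proportion_within(k):
    """P(-k < Z < k) for a standard normal variable."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} SD: {proportion_within(k):.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
# within +/- 3 SD: 99.7%
```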

51
Q

What is used to determine the accuracy of the mean?

A

The standard error of the mean (SEM) describes how accurately the estimated (sample) mean reflects the true population mean.

vvvvvvvvvvvvvvv

The standard error of the mean is a specific kind of standard deviation: while SD describes the dispersion of sample data in relation to its mean, SEM describes the dispersion of means of different samples from a population mean. As the SD increases and the sample size decreases, SEM will increase.

52
Q

How is the standard error of the mean (SEM) calculated and what is the purpose?

A

SEM = SD/ sqrt(n)

vvvvvvvvvvvvvvvvvvvvv

Once this is calculated, the SEM is multiplied by the z-score for the chosen confidence level to give the margin of error of the confidence interval (CI):

vvvvvvvvvvvvvvvvvvvvvv

for a 99% CI this is 2.58
for a 95% CI this is 1.96
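A Python sketch showing where these z-scores come from, using `statistics.NormalDist` from the standard library (Python 3.8+). The SD and n passed to `standard_error` are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical z-score, e.g. level=0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def standard_error(sd, n):
    """SEM = SD / sqrt(n)."""
    return sd / math.sqrt(n)


print(round(z_for_confidence(0.95), 2))  # 1.96
print(round(z_for_confidence(0.99), 2))  # 2.58
print(standard_error(sd=10, n=25))       # 2.0 (hypothetical SD and n)
```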

53
Q

How does sample size alter the confidence interval?

A

Sample size is a part of the calculation for determining the confidence interval, the bigger the sample size, the tighter the confidence interval.

54
Q

What elements are needed to calculate the limits of a confidence interval?

A

To calculate the CI around the mean you must know the following: the mean, standard deviation (SD), z-score and sample size (n). A CI can be calculated for the mean of any continuous variable.

vvvvvvvvvvvvvvvvvvv

Mean +/- 1.96 * [SD/(sqrt(n))]
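A Python sketch of this formula. The mean (120) and SD (15) are hypothetical; running it at two sample sizes also illustrates the earlier card on how a larger n tightens the CI. The function name is hypothetical.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Mean +/- z * SD / sqrt(n); z = 1.96 gives a 95% CI."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical sample: mean 120, SD 15, at two sample sizes
for n in (25, 100):
    low, high = confidence_interval(120, 15, n)
    print(n, round(low, 2), round(high, 2))
# 25 114.12 125.88
# 100 117.06 122.94
```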

55
Q

Would increasing the amount of measurements alter the standard deviation?

A

No, the standard deviation measures the dispersion or spread in data and is an intrinsic property of the population from which the sample is drawn. Increasing the sample size may increase the accuracy of estimating the standard deviation, but it will not change the standard deviation itself.

56
Q

Would increasing the amount of measurements alter the standard error of the mean?

A

Yes, the standard error of the mean (SEM) is a measure of the dispersion of a random set of sample means around the true population mean. It is dependent on the variability (i.e., standard deviation) of the measured values and the sample size (SEM = SD/√n). By increasing the sample size, the sample means approach the true population mean, resulting in a smaller SEM.

57
Q

How would a larger standard deviation alter the standard error of the mean?

A

A greater standard deviation will increase the SEM, resulting in a less accurate estimate.

58
Q

How would a smaller sample size affect the standard error of the mean?

A

A smaller sample size will increase the SEM, resulting in a less accurate estimate.

59
Q

When the population mean is subtracted from a sample measurement and the result is divided by the standard deviation, what is the resulting value, assuming that all measurements follow a normal distribution?

A

Z-score

vvvvvvvvvvvvvvv

This value expresses data in units of standard deviation: it shows how many standard deviations a particular value lies from the mean. With z-scores, a researcher can compare values across populations that have different means and standard deviations.
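A minimal Python sketch of the z-score calculation. The two scores and their population means and SDs are hypothetical, chosen to show how values on different scales become comparable.

```python
def z_score(value, population_mean, sd):
    """Number of standard deviations a value lies from the mean."""
    return (value - population_mean) / sd


# Hypothetical scores on two tests with different scales
print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 70, 10))    # 1.5
```

Although the raw gaps differ (30 vs 15 points), the first score lies further above its own mean in SD units.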

60
Q

How are confidence intervals (CIs) calculated?

A

A CI is the mean ± a margin of error, which is calculated by multiplying a Z-score (for a 95% CI this is 1.96, often rounded to 2) by the standard error of the mean, i.e., the standard deviation (SD) divided by the square root of the sample size: Mean ± Z × (SD/√n). A larger sample size or a smaller SD (more precise data) will decrease the standard error and narrow the CI. Including more disparate data or reducing the sample size will increase the standard error and thus widen the CI.

61
Q

The average of the squared differences of values in a data set from the mean value is … ?

A

Variance

vvvvvvvvvvv

The variance allows interpretation of how far a set of data is spread out. A variance of zero means that there is no variability in the values. Largely different numbers in a set of data lead to a large variance.
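A Python sketch of the definition on this card, computed by hand and checked against the standard library. The data values are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = sum(data) / len(data)
# Average of the squared differences from the mean (population variance)
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(mean, variance)  # 5.0 4.0

# The standard library gives the same population variance;
# statistics.variance() would instead divide by n - 1 (sample variance).
print(statistics.pvariance(data) == variance)  # True
```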

62
Q

What is the functional use of dividing the standard deviation by the mean?

A

The coefficient of variation (CV)

vvvvvvvvvvvvvvvv

Used to measure and compare the dispersion around the mean of multiple data sets.

vvvvvvvvvvvvvvvv

CV, which is a statistical relative measure of dispersion, allows SD to be interpreted relative to the mean, allowing for the comparison of multiple data sets that may have means of different magnitudes or units of measurement, as CV is dimensionless. A high CV indicates that values are widely spread around the mean.
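A Python sketch of the CV, showing that it is dimensionless: the same hypothetical data expressed in a unit ten times smaller has ten times the SD but the same CV. Using the population SD (`statistics.pstdev`) rather than the sample SD is an assumption made here to keep the numbers clean.

```python
import statistics


def coefficient_of_variation(data):
    """CV = SD / mean (population SD used here as an assumption)."""
    return statistics.pstdev(data) / statistics.mean(data)


grams = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data set
decigrams = [x * 10 for x in grams]  # same data in a unit 10x smaller
print(coefficient_of_variation(grams))      # 0.4
print(coefficient_of_variation(decigrams))  # 0.4
```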

63
Q

How does the CV differ from the SD?

A

SD, which is an absolute measure of dispersion, describes the variability of data in relation to the mean within a single data set. CV is a relative measure of dispersion.

64
Q

When the distribution of data is not skewed, the (mean/median/mode) are all … ?

A

equal: in a symmetric (normal) distribution, the mean, median, and mode coincide at the peak.

65
Q

In both positively and negatively skewed distributions, what is the new “peak?”

A

The mode becomes the apex of the curve.

vvvvvvvvvvv

For the positively skewed distributions, the mode (peak) is displaced to the left.

vvvvvvvvvvv

For the negatively skewed distributions, the mode (peak) is displaced to the right.

66
Q

With negatively skewed distributions, the mean is displaced to the ______ on a distribution curve.

A

left

vvvvvvvvvvv

The Peak is to the right and the tail is to the left.

vvvvvvvvvvv

Going left to right: mean (“-” tail), median, mode (apex).

67
Q

In a negatively skewed distribution, which (mean, median, mode) is greatest?

A

The mode (mean < median < mode)

68
Q

With positively skewed distributions, the mean is displaced to the ______ on a distribution curve.

A

right

vvvvvvvvvvv

The peak is to the left and the tail is to the right.

vvvvvvvvvvv

Going left to right is mode (apex), median, and mean (“+” tail).

69
Q

In a positively skewed distribution, which (mean, median, mode) is greatest?

A

The mean (mean > median > mode)

70
Q

What can produce a positively skewed distribution?

A

Having a single outlier or a group of outliers that are much higher in value than the majority of the values in the study. This pulls the mean in the positive direction and creates a tail on the positive end, while the mode stays toward the lower end and the median falls between the two. The mean will be larger than both the median and the mode.
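A Python sketch with hypothetical data in which two high outliers create a positive (right) tail, showing the resulting order mean > median > mode.

```python
import statistics

# Hypothetical data: most values are low, two high outliers form a right (positive) tail
data = [1, 2, 2, 2, 3, 3, 4, 10, 15]

mean = statistics.mean(data)      # pulled toward the tail
median = statistics.median(data)  # middle value
mode = statistics.mode(data)      # most frequent value (the peak)
print(round(mean, 2), median, mode)  # 4.67 3 2
```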

71
Q

What are the two types of hypotheses and what do they mean?

A

Null Hypothesis (H₀): No association exists between the exposure and the outcome.

vvvvvvvvvvvvv

Alternative Hypothesis (H₁): Association exists between the exposure and the outcome.

72
Q

What statistical test is used to check differences between the means (quantitative) of TWO groups (qualitative)?

A

t-test

73
Q

If a researcher was comparing the mean heart rate between men and women, what would be used to make this comparison?

A

T-test

vvvvvvvvvvvvv

Calculates the difference between the means of two samples, or between a sample mean and a population (or hypothesized) value; it is especially useful when samples are small and/or the population standard deviation is not known.

74
Q

What are the different types of T-tests?

A

One-sided, Two-sided, paired and unpaired

75
Q

When a researcher wants to determine whether the means of two groups differ from one another, what statistical analysis is used?

A

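Two-sample t-test (Student’s t-test). (A one-sample t-test instead compares the mean of a single group with a known or hypothesized value.)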

76
Q

When a study compares the means of two independent groups, what is used to compare them?

A

Two-sample t-test (Student’s t-test)

77
Q

What is the Two-sample t test (Student’s t test) used for?

A

It is used to compare means of two independent groups. The basic requirements are the two mean values, sample variances, and sample size.

78
Q

How is the p-value obtained for the two-sample t test (also called Student’s t test)?

A

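The t statistic is calculated from the difference between the two means, the sample variances, and the sample sizes; the p-value is then obtained by comparing this t statistic with the t distribution for the corresponding degrees of freedom.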
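A standard-library Python sketch of the pooled-variance (Student’s) t statistic, using hypothetical data. Turning t into a p-value requires the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind` performs the full test. The function name is hypothetical.

```python
import math
import statistics


def student_t(group1, group2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance() is the sample variance (divides by n - 1)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2


# Hypothetical measurements in two independent groups
t, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t, 3), df)  # -2.0 8
```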

79
Q

In a two-sided t-test, if the p-value is less than 0.05, what is the implication?

A

If the p-value is less than 0.05, the null hypothesis is rejected, indicating a statistically significant difference, and the two means are assumed to be statistically different.

80
Q

In a two-sided t-test, if the p-value is more than 0.05, what is the implication?

A

When the p-value is large (i.e., greater than 0.05), the null hypothesis is not rejected (it is retained): the difference between the means is not statistically significant.

81
Q

An investigator compares an average standardized depression score in two groups of patients, those who take beta-blockers and those who do not. Which statistical analysis was likely used by the investigator to analyze the study results?

A

Two-sample t-test (Student’s t-test)

82
Q

When the same individuals are measured at two points in time, what is used to compare the means?

A

paired T test

vvvvvvvvvvvvvvvv

In a paired t-test, data is derived from study subjects who have been measured at two different points in time (e.g., before and after a treatment). The difference between the means of a continuous outcome variable of this group is compared. The null hypothesis is that the group mean is equal at these two different times. A statistically significant difference rejects the null hypothesis.

83
Q

When is the Paired t test used?

A

The paired t-test is used to compare two means when the measurements are dependent, for example when the same individuals are measured before and after an intervention, such as a comparison between baseline BMI and BMI after a particular treatment.
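A standard-library Python sketch of the paired t statistic: it works on each person’s before/after difference. The BMI values are hypothetical; with SciPy available, `scipy.stats.ttest_rel` would also return the p-value. The function name is hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """t = mean of within-person differences / (SD of differences / sqrt(n))."""
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1  # (t statistic, degrees of freedom)


# Hypothetical BMI values for the same 5 patients before and after treatment
before = [30, 28, 32, 35, 29]
after = [28, 27, 30, 33, 29]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -3.5 4
```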

84
Q

When two different groups (categorical, e.g. cases and controls) are sampled at the same time and their means (continuous outcomes) are used for comparison, what statistical analysis is used? How will this relate to the two hypotheses?

A

An unpaired t-test evaluates two different groups (independent samples) that are sampled at the same time, comparing the means of a continuous outcome variable between these 2 groups.

vvvvvvvvvvvvvvvv

The null hypothesis is that the means of the two groups are equal; the alternative hypothesis is that the means differ. A statistically significant difference rejects the null hypothesis.

85
Q

Unpaired t-test is used to compare the difference between a continuous outcome variable for

A

2 group(s) at 1 point(s) in time

86
Q

What statistical test is appropriate for comparing BMI before and after treatment in 100 patients?

A

The Paired t test is appropriate because the means being compared are dependent (baseline vs. post-treatment in the same individuals).

vvvvvvvvvvvvvv

Think of quantum entanglement (electrons are paired in time and space).

87
Q

What is used when a study compares the means of a continuous outcome variable across two or more categorical groups?

A

Analysis of variance (ANOVA)

88
Q

What statistical analysis would be used to determine if there is a statistically significant difference in the mean reduction of blood pressure (a continuous and dependent variable) between different dose groups (e.g., 5 mg, 10 mg, 15 mg; categorical groups) of a given antihypertensive therapy (independent variable)?

A

Analysis of variance (ANOVA)

vvvvvvvvvvvv

The measurement of the mean BP is the dependent variable that happens to be continuous.

vvvvvvvvvvvv

The dose of antihypertensive therapy (three levels) is the independent variable, which is categorical.
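A standard-library Python sketch of the one-way ANOVA F statistic (mean square between groups divided by mean square within groups). The blood-pressure reductions are hypothetical; converting F to a p-value needs the F distribution, which `scipy.stats.f_oneway` handles if SciPy is available. The function name is hypothetical.

```python
import statistics


def one_way_anova_f(*groups):
    """F = mean square between groups / mean square within groups."""
    values = [x for group in groups for x in group]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)


# Hypothetical BP reductions (mmHg) in the 5 mg, 10 mg and 15 mg groups
f_stat, dfs = one_way_anova_f([4, 5, 6], [7, 8, 9], [10, 11, 12])
print(f_stat, dfs)  # 27.0 (2, 6)
```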

89
Q

When proportions of any type are compared, which statistical tests are used?

A

Chi-squared for big sample sizes

Fisher’s exact test for small sample sizes

90
Q

Proportions, in the context of the Chi-squared or Fisher’s exact test, are synonymous with … ?

A

categorical data

91
Q

What is the Chi-square test used for?

A

The Chi-square test is used to compare proportions of a categorized outcome, such as serum CRP levels categorized as ‘high’ or ‘normal,’ in a 2x2 contingency table.
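A Python sketch of the Pearson chi-square statistic for a 2x2 table, without a continuity correction, using the smoking vs respiratory disease counts from the PAR% card. If SciPy is available, `scipy.stats.chi2_contingency` also returns the p-value, but note that it applies the Yates continuity correction to 2x2 tables by default. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 table (no Yates continuity correction)."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no association
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Smoking vs respiratory disease counts from the PAR% card
print(chi_square_2x2(30, 30, 10, 30))  # 6.25
```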

92
Q

What is Fisher’s exact test, and when is it used?

A

Fisher’s exact test is used when the sample size is small, particularly when the expected count in any cell of a 2x2 contingency table is small (a widely used rule of thumb is an expected count below 5).

93
Q

Do the chi-squared test and Fisher’s exact test depend on the number of subgroups or categories?

A

No.

vvvvvvvvvvvv

Either test can be used as long as both the grouping variable (e.g., treatment group vs. control group) and the outcome (e.g., the degree of effect) are categorical.

94
Q

What is the correlation coefficient?

A

The correlation coefficient (r) ranges from -1 to +1 and describes the strength and direction of a linear relationship between two variables.

For positive associations, the closer r is to +1, the stronger the association.

For negative associations, the closer r is to -1, the stronger the association.

95
Q

What does positive correlation indicate?

A

Positive correlation indicates that as the value of one variable increases, the value of the other variable also increases.

96
Q

In terms of correlation, what type of correlation coefficient describes a relationship in which the value of one variable decreases as the value of the other increases?

A

This is a negative correlation: the variables are inversely related. For instance, the value of the dependent variable decreases as the independent variable increases. This is reflected by a downward-sloping best-fit line, with the correlation coefficient (r) less than 0 and greater than or equal to -1. The closer r is to -1, the stronger the negative linear relationship between the two variables.

97
Q

What statistical analysis is used to measure the strength and direction of a linear relationship between two continuous variables?

A

Pearson correlation coefficient

vvvvvvvvvvvvvvvvvv

This is a statistical measure of the strength and direction of a linear relationship between two variables. It ranges from -1 to +1, where -1 represents a perfectly negative linear relationship and +1 represents a perfectly positive linear relationship; the closer r is to -1 or +1, the more tightly the points cluster around the best-fit line.
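A minimal sketch of the Pearson correlation coefficient, assuming hypothetical activity and BMI values chosen only for illustration. `pearson_r` is an illustrative helper; Python 3.10+ also ships `statistics.correlation`, which returns the same Pearson r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r: covariance of x and y scaled by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # The 1/(n - 1) factors in the covariance and the SDs cancel, so raw sums suffice.
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical data: daily physical activity (hours) vs BMI.
hours = [0.5, 1.0, 1.5, 2.0, 2.5]
bmi = [31, 29, 30, 26, 25]
r = pearson_r(hours, bmi)
print(f"r = {r:.2f}")
```

The negative sign gives the direction (more activity, lower BMI); the closeness to -1 gives the strength.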

99
Q

What does negative correlation indicate?

A

Negative correlation indicates that as the value of one variable increases, the value of the other decreases.

101
Q

What does no correlation mean?

A

No correlation means no linear relationship exists between the variables.

vvvvvvvvvvvvvvvv

That is, r is at or near 0.

102
Q

What does a correlation such as this indicate?

A

r = 0, so there is no (linear) correlation.

vvvvvvvvvvvvvvvv

When r = 0, the best-fit line is flat, indicating no clear linear relationship between the independent and dependent variables. The scattered points suggest a lack of correlation, meaning there is no linear association between the two variables (a nonlinear relationship could still exist).

103
Q

What is the coefficient of determination?

A

This is the proportion of variability in the dependent variable that is explained by the independent variable.

vvvvvvvvvvvvvvvv

The coefficient of determination (r^2) represents the proportion of variance in the dependent variable explained by the independent variable.

vvvvvvvvvvvvvvvv

For example, if r = -0.8, then r^2 = 0.64, meaning 64% of the variance in the dependent variable is explained by the independent variable.
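A short sketch showing that, for a simple least-squares line, r^2 equals 1 − SS_residual / SS_total, i.e., the share of the variance in y that the line accounts for. The data are the same kind of hypothetical activity/BMI values used for illustration only, and `r_squared` is an illustrative helper name.

```python
def r_squared(x, y):
    """Return (1 - SS_residual / SS_total, Pearson r) for a least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # SS_total: all variation in y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation in y that the fitted line leaves unexplained.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    return 1 - ss_residual / syy, r

hours = [0.5, 1.0, 1.5, 2.0, 2.5]
bmi = [31, 29, 30, 26, 25]
r2, r = r_squared(hours, bmi)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, 1 - SSres/SStot = {r2:.3f}")
```

Squaring removes the sign, which is why a negative r still yields a positive proportion of variance explained.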

104
Q

A study is conducted to assess the relationship between body mass index (BMI) and daily physical activity (measured in hours). The investigators find that BMI is inversely related to daily physical activity, with a correlation coefficient of -0.6 (p < 0.01). According to this information, how much of the variability in BMI can be explained by daily physical activity?

A. 36%
B. 60%
C. 40%
D. 6%

A

Correct Answer: 36%

Explanation: The proportion of variability in the dependent variable (BMI) that can be explained by the independent variable (daily physical activity) is determined by the square of the correlation coefficient (r^2). The correlation coefficient (r) is given as -0.6. Squaring this value gives 0.36. This means 36% of the variability in BMI can be explained by daily physical activity. The negative sign of the correlation coefficient indicates the direction of the relationship (inverse), but it does not affect the proportion of variability explained.

105
Q

What is the difference between causation and correlation?

A

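Correlation is a statistical association between two variables; causation means that a change in one variable directly produces a change in the other. A correlation does not imply causation, because confounding factors, reverse causation, or chance may explain the relationship.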

106
Q

The assumption that there is no association between two measured variables (e.g., the exposure and the outcome) or no significant difference between two studied populations other than what would be expected from sampling or experimental error, is called … ?

A

the null hypothesis

107
Q

What is the primary aim of survival analysis?

A

To determine the average time to a given outcome identified on follow-up and to measure disease prognosis.

108
Q

What type of study design is survival analysis typically associated with?

A

It is typically longitudinal and most often uses data from prospective cohort studies or randomized controlled trials (RCTs), although retrospective cohort data can also be analyzed.

109
Q

What is the Kaplan-Meier curve used for in survival analysis?

A

It is used to analyze incomplete time-to-event data and estimate the survival probability of subjects over time, even if participants drop out or are lost to follow-up.

110
Q

How does the Kaplan-Meier curve visually represent survival analysis?

A

The Kaplan-Meier curve displays survival probability over time as a step-shaped diagram, where each step corresponds to an event such as death or recovery.

111
Q

What is the purpose of the log-rank chi-squared test in survival analysis?

A

The log-rank chi-squared test is used to determine whether there is a statistically significant difference between the survival curves of two or more groups. It provides a p-value; a value less than 0.05 suggests that survival differs between the groups being evaluated.
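A stdlib-only sketch of the two-group log-rank test, assuming hypothetical follow-up data. At each event time it compares the observed number of events in group 1 with the number expected if both groups shared one hazard, then sums these differences into a chi-square statistic with 1 degree of freedom. `log_rank_test` is an illustrative helper, not a library function, and it assumes at least one event time where both groups are still at risk.

```python
from math import erfc, sqrt

def log_rank_test(times, events, groups):
    """Two-group log-rank test.

    times:  follow-up time for each subject
    events: 1 if the event (e.g., death) was observed, 0 if the subject was censored
    groups: 0 or 1 for each subject
    Returns (chi-square statistic with 1 df, p-value).
    """
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # subjects in group 1 still at risk
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        # Observed minus expected group-1 events if both groups shared one hazard.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical data: group 0 has its events early, group 1 late; no censoring.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
chi2, p = log_rank_test(times, events, groups)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```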

112
Q

What is the value of a hazard ratio in survival analysis?

A

The hazard ratio compares the hazard (the instantaneous rate at which the event occurs) between two groups at any given time point. A hazard ratio > 1 indicates a higher risk in the exposed group, while < 1 indicates a lower risk. A value of 1 means there is no difference between the groups being evaluated.

113
Q

What does censoring mean in the context of survival analysis?

A

Censoring occurs when a participant's time to the event of interest is not fully observed: the participant did not experience the event during the observation period, dropped out, or was lost to follow-up. Censored observations are marked on the stair-step curve with either a vertical tick or a dot.

114
Q

How is survival probability calculated in a Kaplan-Meier analysis?

A

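At each time an event occurs, the conditional survival probability is calculated as the number of patients at risk who did not experience the event divided by the number of patients at risk. The cumulative survival probability at a given time is the product of these consecutive conditional probabilities up to that time.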
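A minimal sketch of the Kaplan-Meier product-limit calculation, assuming hypothetical follow-up times in months. Censored subjects leave the risk set without producing a step; a subject censored at the same time as an event is still counted as at risk at that time (a common convention). `kaplan_meier` is an illustrative helper; the `lifelines` package provides a full implementation (`KaplanMeierFitter`).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival probability).

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)  # still under observation at t
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        # Conditional probability of getting past t, multiplied onto the running product.
        survival *= (at_risk - d) / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up (months); 0 marks a censored subject.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"month {t}: S(t) = {s:.3f}")
```

Each printed value is one step of the stair-step curve: 5/6, then 5/6 × 4/5, and so on.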

115
Q

What is the significance of the five-year survival rate in survival analysis?

A

It represents the percentage of patients who have survived five years after the initial diagnosis of a disease.

116
Q

What are competing risks in survival analysis?

A

Competing risks refer to events that prevent the occurrence of the primary event of interest, such as death from a different cause.

118
Q

When do you fail to reject the null hypothesis (basically accept the null hypothesis)?

A

When the RR or OR is 1 (more generally, when its confidence interval includes 1)

vvvvvvvvvvvvvvv

If the p-value is greater than or equal to the predetermined significance level (alpha)

119
Q

What does an RR of 1.08 with p-value = 0.01 mean?

A

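There is a statistically significant association between the exposure and the outcome, although the effect is small (an 8% increase in risk).
A p-value of 0.01 means that, if the null hypothesis were true, there would be a 1% probability of observing an association at least this strong; it is not the probability that the null hypothesis is true.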

120
Q

When analyzing a confidence interval, what are the two distinctions when determining to reject the null hypothesis or fail to reject the null hypothesis?

A

When the study measures differences (e.g., a difference in means), the null value is 0. If 0 lies within the confidence interval, fail to reject the null hypothesis; if 0 lies outside the confidence interval, reject the null hypothesis.

vvvvvvvvvvvvvvv

When the study uses ratios, such as an odds ratio or relative risk, the null value is 1. If 1 lies within the confidence interval, fail to reject the null hypothesis (informally, accept it); if 1 lies outside the confidence interval, reject the null hypothesis.

121
Q

If using a confidence interval, what value must the interval lack in order to reject the null hypothesis?

A

The null value: 0 for differences, 1 for ratios such as RR or OR. If the null value is within the confidence interval, fail to reject the null hypothesis (accept the null). If the confidence interval does not include the null value, the data provide sufficient evidence to reject the null hypothesis, suggesting that there is likely a true effect or difference.

122
Q

If a study lacks sufficient statistical evidence to conclude there is a real effect or difference, what might be expected in the confidence interval?

A

When the CI includes the null value, it suggests that the observed effect or difference could plausibly be zero (or no effect) given the data.

123
Q

What is the probability of rejecting the null hypothesis when it is actually true (the likelihood of concluding that there is an effect or association when there truly is none)?

A

Alpha is the probability of committing a Type I error in hypothesis testing.

124
Q

What is the probability of committing a Type I error called, and how is it denoted?

A

This is the significance level and the probability of a type I error is denoted by alpha (α).

vvvvvvvvvvvvvvv

Alpha is chosen before the study; the p-value calculated from the data is then compared against it.

125
Q

When the p-value is above the alpha threshold, a researcher should … ?

A

“Fail to reject H₀”

vvvvvvvvvvvvvvvvvv

This is informally described as accepting the null hypothesis (H₀), which states that there is no association between the exposure and the outcome.

126
Q

What is the usual alpha threshold when conducting studies?

A

A threshold of 0.05 is standard in many fields, indicating a 5% chance of falsely rejecting the null hypothesis. This means that there is a 5% probability of concluding there is an effect when there is none (false positive).
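A small simulation sketch of what α = 0.05 means, assuming normally distributed data with a known SD of 1 and a true mean of 0 (so the null hypothesis is true by construction). About 5% of the simulated two-sided z-tests still reject the null hypothesis; each of those rejections is a false positive. The function name and its defaults are illustrative.

```python
import random
from statistics import NormalDist

def simulate_type_i_error(n_trials=10_000, n=30, alpha=0.05, seed=1):
    """Share of two-sided one-sample z-tests that reject H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_trials):
        # Draw from N(0, 1): the null hypothesis (population mean = 0) is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)  # SD assumed known (= 1)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

print(simulate_type_i_error())
```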

127
Q

What is the significance of a 95% confidence interval (CI) for relative risk (RR) or odds ratio (OR) that includes 1.0?

A

If the 95% CI includes 1.0, the association is not statistically significant at α = 0.05. This corresponds to a p-value > 0.05, and the null hypothesis of no association cannot be rejected.

128
Q

What type of error occurs when the null hypothesis is rejected even though it is true, so that the alternative hypothesis is accepted although the observed effect is actually due to chance (wrongfully concluding that there is an association between exposure and disease when in fact there is none)?

A

Type I error

vvvvvvvvvvvvvvv

“False positive”

129
Q

Falsely rejecting the null hypothesis means that a researcher committed a ____ error

A

Type I error

vvvvvvvvvvvvvvvv

This is equivalent to a false positive.

130
Q

The probability of obtaining the observed results (or more extreme results) assuming the null hypothesis is true, is how the ____ is described.

A

p-value

131
Q

What does it mean if the 95% CI for relative risk (RR) or odds ratio (OR) does not include 1.0?

A

If the 95% CI does not include 1.0, the association is statistically significant at α = 0.05. This corresponds to a p-value < 0.05, and the null hypothesis of no association is rejected.

132
Q

What is the probability of not committing a Type I error?

A

1-alpha

vvvvvvvvvvvvvvv

This represents the confidence level.

vvvvvvvvvvvvvvv

Correctly identifying a true null hypothesis (the probability of correctly failing to reject the null hypothesis when it is true).

vvvvvvvvvvvvvvv

“True negative”

133
Q

A study finds that the relative risk (RR) of an outcome for patients is 1.23, with a p-value of 0.03 and a 95% confidence interval (CI) of 1.12-1.46. What does this p-value mean in terms of RR?

A

The p-value is the probability of obtaining a test result at least as extreme as the one actually observed, assuming that the null hypothesis is true. Accordingly, the p-value of 0.03 in this study signifies that, if the true population RR were 1.0 (no association), there would be a 3% probability of observing an RR at least as extreme as 1.23.

vvvvvvvvvvvvv

Alternatively, if there were no association between the risk factor and the outcome (i.e., RR = 1.0) and the study were repeated 100 times, an RR at least as extreme as 1.23 would be expected in only about 3 of them (i.e., 3% of the repeated studies). However, this is not equivalent to stating that there is a 3% probability that the RR of 1.23 is due to chance alone.

134
Q

When the null hypothesis is false (i.e., the alternative hypothesis is true), but the null is incorrectly not rejected, what type of error has occurred (wrongfully concluding that there is no association between exposure and outcome, when in fact there is one)?

A

Type II error (β)

vvvvvvvvvvvvvvv

When the null hypothesis is false (i.e., the alternative hypothesis is true), but the null is incorrectly not rejected (accepted), a Type II error (β) has occurred. This means the test fails to detect a real association or difference, wrongfully concluding that there is no association between the exposure and the outcome, even though one exists.

vvvvvvvvvvvvvvv

“False negative”

135
Q

When are studies more prone to type II errors?

A
  1. Low statistical power (e.g., small sample size or high data variability).
  2. Weak effect size (the true difference or association is small and hard to detect).
  3. Poor study design or inappropriate statistical tests.

136
Q

In a research study of 1000 participants, no statistically significant difference between the means was found. However, it was discovered that 400 measurements had been inadvertently left out of the analysis. What would happen to the probability of correctly rejecting the null hypothesis if these 400 measurements were incorporated into the analysis?

A

This would increase the sample size. Including the 400 measurements would increase the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true (i.e., increased statistical power). Increased power also lowers the probability of falsely accepting the null hypothesis when the alternative hypothesis is true (Type II error).

137
Q

The probability that a study will detect a true difference is … ?

A

Statistical power

138
Q

The probability of correctly rejecting the null hypothesis, i.e., the ability to detect a difference between two groups when there truly is a difference, is called … ?

A

(1 - beta)

vvvvvvvvvvvvvvv

Statistical power

vvvvvvvvvvvvvvv

This is the probability of rejecting the null hypothesis when it is false (i.e., correctly detecting the true difference).

vvvvvvvvvvvvvvv

“True Positive”

139
Q

What increases statistical power?

A

Statistical power (1 − β) increases with the sample size and with the magnitude of the association of interest (e.g., increasing the sample size of a study increases its statistical power). As power rises, the probability of a Type II error (β) falls.

vvvvvvvvvvvvvvvvv

The higher the precision, the greater the statistical power (1 − β).
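A sketch of how sample size and precision drive power, assuming a two-sided, two-sample z-test (normal approximation) and a hypothetical true difference of 0.2 SD. Larger groups shrink the standard error, and so does a smaller SD (higher precision); both push the expected test statistic further past the critical value. `power_two_sample` is an illustrative helper.

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a true mean difference delta."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    shift = abs(delta) / se             # expected test statistic under the alternative
    # Probability of landing beyond either critical value when the alternative is true.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (100, 300, 500):
    print(n, round(power_two_sample(delta=0.2, sd=1.0, n_per_group=n), 3))
```

With no true difference (delta = 0) the function returns α, the false-positive rate, rather than power.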

140
Q

By convention, most studies set statistical power at … ?

A

Most studies set statistical power at 80% (i.e., β = 0.20)

vvvvvvvvvvvvvvvvv

Power primarily depends on the strength of the association (if present) and the size of the sample population. When researchers have an estimate of the strength of the association, they can perform power calculations to determine the sample size required to achieve 80% power.
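A sketch of such a power calculation, assuming a two-sided comparison of two means with a normal approximation, where the effect is expressed as a hypothetical difference `delta` in units of a common SD. It uses n per group = 2 × ((z₁₋α/₂ + z₁₋β) × SD / delta)², rounded up. `n_per_group` is an illustrative name; dedicated tools that use the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

print(n_per_group(delta=0.5, sd=1.0))
print(n_per_group(delta=0.25, sd=1.0))
```

Halving the expected effect roughly quadruples the required sample size, which is why a weak association needs a much larger study to reach 80% power.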

141
Q

When designing a pilot study, the power is …?

A

fixed a priori, i.e., before the study is conducted (usually at a value ≥ 0.80).

vvvvvvvvvvvv

The pilot study is then performed using a small number of participants (usually 10–50; the exact number depends on the type of study and type of statistical analysis), and the effect size is estimated. Based on this information, a sample size that yields adequate power is calculated, and this sample size is used in the main study.

vvvvvvvvvvvvv

The higher the β, the lower the statistical power.

142
Q

Does statistical power affect generalizability?

A

No, statistical power does not directly affect generalizability, but it plays an important role in the reliability of study results. Statistical power refers to the probability of correctly rejecting the null hypothesis when a true association exists (i.e., avoiding a Type II error). Generalizability, on the other hand, refers to how well the findings of a study apply to populations beyond the specific sample studied.

143
Q

Missing the signal is a type_______error.

A

II

144
Q

A false alarm is a type________error.

A

I

145
Q

The probability of correctly rejecting a false null hypothesis (i.e., of not wrongly dismissing a true alternative hypothesis) is … ?

A

Statistical power

146
Q

The probability that the result of a given statistical test will be at least as extreme as the result actually observed, assuming that the null hypothesis is correct, is called the … ?

A

p-value

147
Q

A study finds that the relative risk (RR) of an outcome for patients is 1.23, with a p-value of 0.03 and a 95% confidence interval (CI) of 1.12-1.46. What does this p-value mean in terms of the confidence interval (CI)?

A

A CI is a range of values around an estimate that is expected to contain the true population value with a given level of confidence. The confidence level is determined by the alpha value, which is typically set at 5%, using the formula: confidence level = 100% − α. The 95% CI of 1.12–1.46 means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true population RR; informally, the true RR is plausibly between 1.12 and 1.46.

vvvvvvvvvvvvvv

CIs and p-values are interrelated, and either can be used to determine whether a result is statistically significant. In this study, the p-value is < 0.05, which corresponds to a 95% CI that does not include the null value (i.e., RR = 1.0), meaning that the null hypothesis can be rejected and the association is considered statistically significant. Because RR is a ratio whose sampling distribution is skewed (its CI is usually calculated on the log scale), the CI is not symmetrical around the observed RR: 1.23 − 1.12 = 0.11, whereas 1.46 − 1.23 = 0.23.
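A sketch of why a ratio's CI is asymmetric, assuming hypothetical 2x2 counts and the common log-scale (Wald) method: the interval is built symmetrically around ln(RR) and then exponentiated, which stretches the upper side. The same log-scale z statistic gives the p-value, so "CI excludes 1" and "p < 0.05" agree. `relative_risk_ci` is an illustrative helper.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def relative_risk_ci(a, b, c, d, level=0.95):
    """RR, CI and two-sided p for exposed [a events, b non-events] vs unexposed [c, d]."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))  # SE of ln(RR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)               # 1.96 for a 95% CI
    # Symmetric on the log scale, asymmetric once transformed back to RR.
    lower, upper = exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)
    # Two-sided p-value for H0: RR = 1 (ln RR = 0), from the same z statistic.
    p = 2 * (1 - NormalDist().cdf(abs(log(rr)) / se_log))
    return rr, (lower, upper), p

rr, (lower, upper), p = relative_risk_ci(60, 40, 40, 60)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}, p = {p:.4f}")
```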

148
Q

What is the relationship between p-value and the confidence interval spread?

A

A p-value reflects how strongly the data contradict the null hypothesis. The narrower the confidence interval and the further it is from including the null value (1.0 for relative risk or odds ratio), the smaller the p-value tends to be.

149
Q

What does a p-value < 0.05 indicate?

A

A p-value < 0.05 indicates that the observed relationship is statistically significant at α = 0.05: if the null hypothesis were true, results at least this extreme would occur less than 5% of the time.

150
Q

What should a researcher do if the p-value is below the alpha threshold?

A

If the p-value of a statistical test is less than or equal to the alpha threshold, the null hypothesis is rejected, and the result is considered statistically significant. The null hypothesis (H₀) states that there is no association between the exposure and the outcome. Rejecting it means the observed result would be unlikely if there were truly no association; it does not prove that the result is free of chance.

151
Q

What method of statistical analysis pools summary data (e.g., means, RRs) from multiple studies for a more precise estimate of the size of an effect?

A

Meta-analysis
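A minimal sketch of one common pooling method, inverse-variance fixed-effect meta-analysis of RRs, assuming three hypothetical studies reported as (RR, lower, upper) 95% CIs that are symmetric on the log scale. Each study's SE is backed out of its CI, studies are weighted by 1/SE², and the pooled SE is smaller than any single study's, which is the "more precise estimate". When studies disagree more than chance allows, a random-effects model (e.g., DerSimonian-Laird) is typically used instead; that is not shown here.

```python
from math import exp, log, sqrt

Z95 = 1.96  # normal quantile for a 95% CI

def pooled_rr_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (RR, lower, upper) triples from 95% CIs."""
    total_weight = 0.0
    weighted_sum = 0.0
    for rr, lower, upper in studies:
        se = (log(upper) - log(lower)) / (2 * Z95)  # back out the SE of ln(RR) from the CI
        weight = 1 / se ** 2                        # more precise studies count more
        total_weight += weight
        weighted_sum += weight * log(rr)
    pooled_log = weighted_sum / total_weight
    pooled_se = sqrt(1 / total_weight)
    return exp(pooled_log), (exp(pooled_log - Z95 * pooled_se), exp(pooled_log + Z95 * pooled_se))

# Hypothetical study results: (RR, lower 95% limit, upper 95% limit).
studies = [(1.20, 0.90, 1.60), (1.30, 1.00, 1.70), (1.10, 0.80, 1.50)]
rr, (lo, hi) = pooled_rr_fixed_effect(studies)
print(f"pooled RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```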