Week 4 Flashcards

1
Q

Hypothesis testing and Type 1 error.

A

The decision rule of a hypothesis test is set by the Type I error rate (α). We reject H0 when the test statistic falls outside the critical values or, equivalently, when the p-value is less than α.
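
A minimal Python sketch of this decision rule, assuming a two-sided z-test with a known population standard deviation (the course examples use t-tests, but the standard library has no t distribution). The function name and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, assuming a known population SD sigma."""
    std_normal = NormalDist()
    z = (xbar - mu0) / (sigma / sqrt(n))         # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value for level alpha
    p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value
    return {
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_by_critical_value": abs(z) > z_crit,
        "reject_by_p_value": p_value < alpha,
    }


# Hypothetical sample summary, not taken from the course data
result = z_test_two_sided(xbar=4.2, mu0=5.0, sigma=2.0, n=36)
print(result)
```

The two rejection flags always agree, because the p-value drops below α exactly when the test statistic passes the critical value.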

2
Q

Confidence intervals and Type 1 error.

A

A confidence interval also depends on the Type I error rate (α). A (1 − α) × 100% confidence interval is calculated as point estimate ± critical value × SE, where SE is the standard error of the point estimate.
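
The interval formula can be sketched in Python as follows, assuming a normal critical value and a known population standard deviation. A t-based interval would swap in the t critical value. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """(1 - alpha) * 100% interval: point estimate +/- critical value * SE."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    se = sigma / sqrt(n)                         # standard error of the mean
    return (xbar - crit * se, xbar + crit * se)


# Hypothetical sample summary, for illustration only
lower95, upper95 = z_confidence_interval(xbar=4.2, sigma=2.0, n=36, alpha=0.05)
lower99, upper99 = z_confidence_interval(xbar=4.2, sigma=2.0, n=36, alpha=0.01)
print(f"95% CI: ({lower95:.3f}, {upper95:.3f})")
print(f"99% CI: ({lower99:.3f}, {upper99:.3f})")
```

A smaller α gives a larger critical value, so the 99% interval is wider than the 95% interval.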

3
Q

Relationship between hypothesis tests and confidence intervals.

A

For a two-sided test, we reject H0 at level α if and only if the corresponding (1 − α) × 100% confidence interval excludes μ0, the value of the mean under H0. Equivalently, we fail to reject H0 exactly when the interval includes μ0.

4
Q

Visualizing rejecting H0.

A

Both confidence interval and hypothesis testing approaches can be used to visualize rejecting H0.

5
Q

Visualizing failing to reject H0.

A

Both confidence interval and hypothesis testing approaches can be used to visualize failing to reject H0.

6
Q

Algebraic demonstration of the equivalence between hypothesis testing and confidence intervals.

A

When we fail to reject H0, the confidence interval for μ contains μ0. The algebraic demonstration rearranges the inequality on the test statistic into the bounds of the confidence interval.
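
One way to write the rearrangement, assuming a two-sided one-sample t-test with sample mean x̄, sample standard deviation s, sample size n, and critical value t* = t(n−1, 1−α/2). This notation is an assumption and may differ from the course slides.

```latex
\begin{align*}
\text{Fail to reject } H_0
  &\iff \left| \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right| \le t^{*} \\
  &\iff -t^{*} \le \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \le t^{*} \\
  &\iff -t^{*}\,\frac{s}{\sqrt{n}} \le \bar{x} - \mu_0 \le t^{*}\,\frac{s}{\sqrt{n}} \\
  &\iff \bar{x} - t^{*}\,\frac{s}{\sqrt{n}} \le \mu_0 \le \bar{x} + t^{*}\,\frac{s}{\sqrt{n}}
\end{align*}
```

The last line says that μ0 lies inside the (1 − α) × 100% confidence interval for μ.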

7
Q

Confidence interval: advantages.

A

Confidence intervals report a range of plausible values for µ. They are useful for determining practicality (scientific importance) and provide a sense of variability through their width.

8
Q

P-value: advantages and disadvantages.

A

A p-value gives a sense of how extreme or significant the data is. However, it doesn’t give insight into scientific importance or the variability in the data.

9
Q

One sample t-test example.

A

A one-sample t-test was used to determine if light-smoking twins meet the NCI definition of smoking 5 pack years on average. The null hypothesis is H0: µpackyears1 = 5, and the alternative is Ha: µpackyears1 ≠ 5, with a significance level of 0.05.

10
Q

One sample t-test example: Results and Interpretation

A

The 95% confidence interval for the mean pack years is (1.414, 7.303). Because this interval includes the value 5, we fail to reject the null hypothesis: the data are consistent with the light-smoking twins meeting the NCI definition on average.

11
Q

Power formula for a two-sided test.

A

To compute power, one needs to specify the mean of the alternative hypothesis (µ1) and the null hypothesis (µ0). The power formula is provided for a two-sided test, assuming normal distributions with variance σ².
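
The card does not reproduce the formula itself. As a hedged sketch, the standard textbook result for a two-sided z-test with known σ is power = Φ(−z + δ) + Φ(−z − δ), where z is the 1 − α/2 normal quantile and δ = |µ1 − µ0|√n / σ. The course may use the common approximation that keeps only the first term. The function name and inputs below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a two-sided z-test of H0: mu = mu0 when the true mean is mu1.

    Assumes normally distributed data with known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(mu1 - mu0) * sqrt(n) / sigma   # standardized distance between means
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Hypothetical inputs, for illustration only
power = power_two_sided_z(mu0=5.0, mu1=4.0, sigma=2.0, n=36)
print(f"power = {power:.3f}")
```

When µ1 equals µ0 the function returns α, and power grows as n or |µ1 − µ0| increases.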

12
Q

Research question for the smoking and bone density study.

A

The research question is whether the Bone Mass Density (BMD) of the lumbar spine differs between heavier and lighter smokers.

13
Q

Study design for the smoking and bone density study.

A

The study uses data from 41 pairs of middle-aged female twins, with one twin being a heavier smoker and the other a lighter smoker.

14
Q

Key variables in the smoking and bone density study.

A

The heavier smoking twin is identified as twin 2. The study focuses on the differences in lumbar spine BMD (ls2-ls1) between the twins.

15
Q

Independence in the twins study.

A

Twin pairs are independent of each other, but individuals within a pair are not independent. The differences in BMD between the twins are independent.

16
Q

Appropriate test for the smoking and bone density study.

A

A one-sample t-test of the differences (paired t-test) is the most appropriate test to determine if there is a significant difference in BMD between the twins.
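
A small Python sketch of why the paired test reduces to a one-sample test: the within-pair differences are treated as a single sample and tested against 0. The BMD values below are hypothetical, not the study data, and only the variable names ls1 and ls2 follow the cards. The p-value would come from a t distribution with n − 1 degrees of freedom, which this standard-library sketch does not compute.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical lumbar spine BMD values for five twin pairs (not the study data)
ls1 = [0.80, 0.75, 0.90, 0.85, 0.70]   # lighter-smoking twin
ls2 = [0.76, 0.74, 0.85, 0.83, 0.67]   # heavier-smoking twin

# Within-pair differences, heavier minus lighter (ls2 - ls1)
diffs = [heavy - light for heavy, light in zip(ls2, ls1)]

n = len(diffs)
se = stdev(diffs) / sqrt(n)   # standard error of the mean difference
t_stat = mean(diffs) / se     # one-sample t statistic for H0: mean difference = 0
df = n - 1                    # degrees of freedom

print(f"mean difference = {mean(diffs):.3f}, t = {t_stat:.3f}, df = {df}")
```

Working with the differences removes the within-pair dependence, since each pair contributes one value to the sample.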

17
Q

Creating the difference variable in Stata.

A

The difference variable ‘lumbdendiff’ is generated by subtracting the lumbar spine BMD of the lighter smoking twin from the heavier smoking twin (ls2-ls1).

18
Q

Assumptions of the t-test.

A

Before using the t-test, one must assess the assumptions of normality of the differences and independence.

19
Q

Normality assessments.

A

Normality of the differences can be assessed using a histogram, boxplot, Q-Q plot, and the Shapiro-Wilk test. In this case, all assessments indicate that the differences are approximately normally distributed.

20
Q

Independence assumption

A

Each twin pair is independent of the other pairs, but the twins within each pair are not independent.

21
Q

Performing the t-test in Stata.

A

The t-test is performed using the command ‘ttest lumbdendiff=0, level(95)’. This test helps determine if there’s a significant difference in BMD between the twins.

22
Q

Interpretation of the t-test results.

A

There is a significant difference in BMD of the lumbar spine between heavier and lighter smoking twins, with the heavier smoking twins having lower BMD. The null hypothesis is rejected.

23
Q

Does the p-value indicate the amount of variation in a sample?

A

No, the p-value does not tell us about the amount of variation in our sample.

24
Q

Confidence intervals and p-values.

A

Confidence intervals and p-values provide different information. A p-value can be compared against any significance level but gives no insight into variation, while a confidence interval gives a sense of variation through its width.

25
Q

Confidence interval in Stata.

A

A 95% confidence interval is calculated using the command ‘ttest lumbdendiff=0, level(95)’ in Stata.

26
Q

Interpretation of confidence intervals in the twins study.

A

Because the upper bound of the 95% CI is less than 0, we can be fairly confident that the true mean difference is less than 0.

27
Q

Comparison of Hypothesis Test and CI

A

Both the hypothesis test and confidence interval methods led to the rejection of the null hypothesis. The hypothesis test provides an exact p-value, while the confidence interval method allows one to see the influence of variance in the data.

28
Q

Key steps in data analysis.

A

The key steps include exploring the data using the data dictionary, assessing data properties (normality, independence), and selecting an appropriate test.

29
Q

Use of regression models.

A

Typically with observational data, regression models are used to adjust for other covariates.

30
Q

Example of poorly behaved data.

A

An example is given where the histogram and boxplot of pyr1 demonstrate data that is not well-behaved.