EPIB 621 Midterm Flashcards

1
Q

Define “randomness”

A

– A process is random if its individual outcome is not known in advance, but a regular distribution of outcomes would be observed if the process were repeated many times.
– Randomness reflects a lack of knowledge: if we knew “everything”, we would be able to predict every outcome.

2
Q

Explain the difference between statistical and deterministic relationships.

A

Statistical: the relationship contains randomness/error, so Y varies even for a given X. Deterministic: the relationship contains no randomness/error, so X determines Y exactly.

3
Q

Explain the meaning of a sampling distribution.

A

The distribution of a statistic (e.g., the sample mean) over all possible samples of a given size drawn from the population. It shows which values of the statistic are most likely.
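A sampling distribution can be approximated by simulation. The sketch below is only an illustration: the exponential population, the sample size of 50, and the 2,000 repetitions are all assumptions, not part of the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values from a right-skewed (exponential) distribution.
population = [random.expovariate(1.0) for _ in range(10_000)]

def sample_means(population, n, n_samples):
    """Draw n_samples random samples of size n and return the mean of each."""
    return [statistics.mean(random.sample(population, n)) for _ in range(n_samples)]

n = 50
means = sample_means(population, n, 2_000)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

# The sampling distribution of the mean is centred near the population mean,
# and its spread (the standard error) is close to sigma / sqrt(n).
print("population mean:", round(pop_mean, 3))
print("mean of sample means:", round(statistics.mean(means), 3))
print("SD of sample means:", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):", round(pop_sd / n ** 0.5, 3))
```

The list `means` is an empirical stand-in for the sampling distribution of the sample mean at n = 50.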

4
Q

Define the population mean (mu).

A

Mu is the weighted average of all possible Y values, weighted by their corresponding probabilities according to the assumed distribution.
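As a small worked example of this weighted average, the sketch below uses a fair six-sided die as the assumed distribution. The die is a made-up illustration, not an example from the course.

```python
from fractions import Fraction

# Hypothetical distribution: a fair six-sided die, P(Y = y) = 1/6 for y = 1..6.
distribution = {y: Fraction(1, 6) for y in range(1, 7)}

def population_mean(dist):
    """Weighted average of the possible values, weighted by their probabilities."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(y * p for y, p in dist.items())

mu = population_mean(distribution)
print(mu)  # 7/2
```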

5
Q

Why is standard deviation used instead of variance?

A

The standard deviation has the same scale as the data, and hence is comparable to deviations from the mean.

6
Q

Label mean, median, and mode in the diagram.

A

(add image)

7
Q

In a right-skewed distribution, mean > median.

A

T

8
Q

In a left-skewed distribution, mean > median.

A

F

9
Q

In a right-skewed distribution, mean < median.

A

F

10
Q

In a left-skewed distribution, mean < median.

A

T
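The four skewness cards can be checked on a small data set. The numbers below are invented for illustration; the left-skewed data are simply the mirror image of the right-skewed data.

```python
import statistics

# Hypothetical data: mostly small values with a long tail to the right.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
# Mirror image of the same data: a long tail to the left.
left_skewed = [-y for y in right_skewed]

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(name, "mean =", statistics.mean(data), "median =", statistics.median(data))
```

The single extreme value pulls the mean toward the tail, while the median barely moves.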

11
Q

What is the mean of a Bernoulli distribution?

A

pi

12
Q

What is the variance of a Bernoulli distribution?

A

pi(1-pi)

13
Q

What is the mean of a binomial distribution?

A

n*pi

14
Q

What is the variance of a binomial distribution?

A

n*pi(1-pi)
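The Bernoulli and binomial formulas can be verified numerically from the probability mass function. This is a sketch under assumed values (n = 12, pi = 0.3); the function names are hypothetical helpers.

```python
from math import comb, isclose

def binomial_pmf(k, n, pi):
    """P(Y = k) for Y ~ Binomial(n, pi)."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def mean_and_variance(n, pi):
    """Mean and variance computed directly from the pmf (no shortcut formulas)."""
    mean = sum(k * binomial_pmf(k, n, pi) for k in range(n + 1))
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, pi) for k in range(n + 1))
    return mean, var

n, pi = 12, 0.3  # hypothetical values chosen for illustration

# n = 1 is the Bernoulli case: mean pi, variance pi(1 - pi).
bern_mean, bern_var = mean_and_variance(1, pi)
# General binomial: mean n*pi, variance n*pi(1 - pi).
bin_mean, bin_var = mean_and_variance(n, pi)

print(bern_mean, bern_var)
print(bin_mean, bin_var)
```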

15
Q

What is the mean of a Poisson distribution?

A

lambda

16
Q

What is the variance of a Poisson distribution?

A

lambda

17
Q

What is the mean of a normal distribution?

A

mu

18
Q

What is the variance of a normal distribution?

A

sigma^2

19
Q

Under what conditions can a Poisson distribution be used to approximate a binomial distribution?

A

Large n and small pi: then Y ~ Binomial(n, pi) is approximately Poisson(n*pi).
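The quality of this approximation can be checked by comparing the two probability mass functions. In the sketch below, both settings keep the expected count at 2; the specific values of n and pi are assumptions for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, pi):
    """P(Y = k) for Y ~ Binomial(n, pi); zero when k exceeds n."""
    if k > n:
        return 0.0
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

def max_pmf_gap(n, pi, k_max=20):
    """Largest absolute difference between the two pmfs for k = 0..k_max."""
    lam = n * pi
    return max(abs(binomial_pmf(k, n, pi) - poisson_pmf(k, lam)) for k in range(k_max + 1))

# Hypothetical settings with the same expected count n*pi = 2.
gap_small_n = max_pmf_gap(n=10, pi=0.2)       # small n, large pi
gap_large_n = max_pmf_gap(n=1000, pi=0.002)   # large n, small pi

print("small n, large pi:", gap_small_n)
print("large n, small pi:", gap_large_n)
```

The gap shrinks as n grows and pi falls, which is the regime where the approximation is intended to be used.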

20
Q

The distribution is determined by the sample mean and standard deviation.

A

False. The distribution is determined from the assumed population mean and standard deviation (i.e., the distribution under H0).

21
Q

A Type I error results from falsely rejecting H0 when H0 is true.

A

T

22
Q

A Type II error results from falsely rejecting H0 when H0 is true.

A

F

23
Q

A Type I error is a false positive.

A

T

24
Q

A Type II error is a false positive.

A

F

25
Q

A Type I error is a false negative.

A

F

26
Q

A Type II error is a false negative.

A

T

27
Q

What is the meaning of alpha?

A

The probability of making a Type I error (falsely rejecting H0 when H0 is true).

28
Q

What is the meaning of beta?

A

The probability of making a Type II error (failing to reject H0 when H0 is false).

29
Q

What is the meaning of statistical power?

A

The probability of correctly rejecting H0 when H0 is false (1 minus beta): the area under the alternative distribution that falls in the rejection region.
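Alpha, beta, and power can be computed directly for a simple case. The sketch below assumes a one-sided z-test with a normal population and known sigma; the means, sigma, and sample size are hypothetical numbers.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu = mu1 > mu0.

    Assumes a normal population with known sigma (a textbook simplification).
    """
    se = sigma / n ** 0.5
    # Reject H0 when the sample mean exceeds this critical value.
    critical = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # beta = P(fail to reject | H1 true): area of the alternative below the cutoff.
    beta = NormalDist(mu1, se).cdf(critical)
    return 1 - beta

# Hypothetical numbers chosen only for illustration.
powers = {
    alpha: power_one_sided_z(mu0=100, mu1=105, sigma=15, n=36, alpha=alpha)
    for alpha in (0.01, 0.05, 0.10)
}
for alpha, power in powers.items():
    print(alpha, round(power, 3))
```

Running the loop shows how power moves as alpha changes while everything else is held fixed.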

30
Q

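Increasing alpha will increase the power of a test.

A

True. A larger alpha widens the rejection region, so beta decreases and power (1 minus beta) increases, all else being equal.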

31
Q

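Decreasing alpha will increase the power of a test.

A

False. A smaller alpha shrinks the rejection region, so beta increases and power decreases, all else being equal.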

32
Q

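A confidence interval is a random quantity.

A

True. Its endpoints are computed from the sample, so they vary from sample to sample; the population parameter itself is fixed.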

33
Q

95% of the time, the population mean will fall within the CI.

A

False. Instead, when sampling is repeated many times, 95% of CIs will cover the true value of mu.
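The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a normal population with known sigma and uses a z-interval; the population values and the number of repetitions are hypothetical.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 50.0, 10.0, 25    # hypothetical "true" population and sample size
Z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

def one_interval():
    """Draw one sample and return its 95% z-interval for the mean (sigma known)."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    half_width = Z * SIGMA / N ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width

intervals = [one_interval() for _ in range(2_000)]
coverage = sum(lo <= MU <= hi for lo, hi in intervals) / len(intervals)
print("share of intervals covering mu:", coverage)
```

Here mu never changes; only the intervals move from sample to sample, and roughly 95% of them contain mu.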

34
Q

What are the main differences between CIs and p-values?

A

CIs convey the effect size and quantify uncertainty. P-values are more flexible; for example, they can be used for one-sided tests.

35
Q

The regression line represents the expected value of the covariate.

A

False. The regression line represents the expected value of the outcome Y for a given X. X is treated as a fixed value and is not predicted from Y.

36
Q

The model of simple linear regression is valid only if X and Y are continuous.

A

False. X can be either categorical/binary or continuous (but Y must be continuous).

37
Q

In simple linear regression, Y must be normally distributed.

A

False. Only the errors (epsilon) must be normally distributed. (Y is only normally distributed for a fixed value of X, but not overall.)

41
Q

Which of the following are random vs. fixed: Y, beta0, beta1, X, epsilon (error).

A

Random, fixed, fixed, fixed, random

42
Q

What is the variance of X?

A

Var(X) = 0, since X is treated as fixed rather than random.

43
Q

Var(epsilon) is equal to Var(Y).

A

True. Since the variances of all other quantities in the equation are 0, Var(Y) = Var(beta_0 + beta_1*X + epsilon) = Var(epsilon) = sigma^2.
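This can be checked by simulating Y at several fixed values of X. The parameter values below (beta_0 = 1, beta_1 = 2, sigma = 3) are hypothetical, and the errors are assumed to be normal.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

BETA0, BETA1, SIGMA = 1.0, 2.0, 3.0  # hypothetical fixed parameters

def simulate_y(x, n_draws=20_000):
    """Simulate Y = beta_0 + beta_1*x + epsilon at a fixed x, epsilon ~ N(0, sigma^2)."""
    return [BETA0 + BETA1 * x + random.gauss(0, SIGMA) for _ in range(n_draws)]

results = {}
for x in (0, 5, 10):
    ys = simulate_y(x)
    results[x] = (statistics.mean(ys), statistics.variance(ys))
    # The mean shifts with x, but the variance stays close to sigma^2 = 9.
    print(x, round(results[x][0], 2), round(results[x][1], 2))
```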

44
Q

The variance of Y depends on the value of X.

A

False. Constant variance is one of the underlying assumptions of linear regression: Var(Y | X) = sigma^2 for every value of X.

45
Q

Which of the following is (are) correct:
(1) E[Y] = beta_0 + beta_1*X; (2) E[Y | X] = beta_0 + beta_1*X + epsilon; (3) Y = beta_0 + beta_1*X + epsilon

A

Correct, incorrect, correct

46
Q

Which of the following is (are) correct:
(1) Y = beta_0 + beta_1*X + epsilon; (2) E[Y | X] = beta_0 + beta_1*X; (3) E[Y] = beta_0 + beta_1*X + epsilon

A

Correct, correct, incorrect

47
Q

What are the assumptions required for linear regression?

A

1 – errors are independent; 2 – errors are normally distributed; 3 – Var(Y) is independent of X; 4 – relationship between X and Y is linear

48
Q

Explain maximum likelihood estimation.

A

The sample is used to find the parameter values (and hence the distribution within the assumed family) that give the highest probability, or likelihood, of producing the sample obtained.
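A minimal sketch of the idea, assuming Bernoulli data and a simple grid search rather than calculus or a numerical optimizer. The sample below is invented for illustration.

```python
from math import log

# Hypothetical sample of 10 Bernoulli outcomes (1 = event, 0 = no event).
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

def log_likelihood(pi, data):
    """Log of the probability of observing `data` if each outcome is Bernoulli(pi)."""
    return sum(log(pi) if y == 1 else log(1 - pi) for y in data)

# Grid search over candidate values of pi (0 and 1 are excluded to avoid log(0)).
grid = [i / 1000 for i in range(1, 1000)]
pi_hat = max(grid, key=lambda pi: log_likelihood(pi, sample))

print("maximum likelihood estimate:", pi_hat)
print("sample proportion:", sum(sample) / len(sample))
```

For Bernoulli data the value that maximizes the likelihood coincides with the sample proportion, which the grid search recovers.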

49
Q

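The standard error of a predicted value of a single individual is larger than the standard error of the prediction mean.

A

True. Predicting a single individual adds the individual's error variance (sigma^2) to the uncertainty in the estimated mean.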

50
Q

The variance of a predicted value of a single individual is equal to the variance of the prediction mean.

A

False. The variance for a single individual is larger, by sigma^2.

51
Q

The prediction interval is shorter than the confidence interval of the prediction mean

A

False (the opposite is true).
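The two standard errors can be compared on a small example. The sketch below uses the standard formulas for simple linear regression; the data are made up, and `se_mean` and `se_individual` are hypothetical helper names.

```python
from math import sqrt

# Hypothetical data set, small enough to check by hand.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Ordinary least squares estimates for Y = beta_0 + beta_1*X + epsilon.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard deviation, with n - 2 degrees of freedom.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

def se_mean(x0):
    """Standard error of the estimated mean of Y at X = x0."""
    return s * sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)

def se_individual(x0):
    """Standard error for predicting one new individual's Y at X = x0."""
    return s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)

print(se_mean(4.5), se_individual(4.5))
```

The extra `1 +` term is the individual's own error variance, which is why the prediction interval is always wider than the confidence interval for the mean.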

52
Q

The error follows a N(0,𝜎^2) distribution

53
Q

𝛽0 can be interpreted as the expected value of 𝑌𝑖 given the value 𝑋𝑖 = 0

54
Q

We assume 𝑌𝑖 follows a normal distribution without the need to consider 𝑋𝑖

A

False. Yi follows a normal distribution given a fixed value of X.

55
Q

What is the interpretation for regression coefficient 𝛽i of Xi?

A

𝛽i is the expected (mean) change in Yi for a 1-unit increase in Xi, holding all else constant.

56
Q

The number of parameters is equal to the number of predictor variables in multiple regression.

A

False. The number of parameters can be larger than the number of predictors (e.g., x and x^2 terms will result in two additional parameters but only a single predictor).

57
Q

A p-value for 𝛽2 greater than 0.05 means there is insufficient evidence to reject the hypothesis that the expected change in 𝑌 for a one-unit increase in 𝑋2 (holding 𝑋1 constant) is 0, at the 0.05 significance level.

58
Q

𝛽1(hat) is the same as the estimation of 𝛽1 in 𝑌 = 𝛽0 + 𝛽1𝑋1

A

False. The equation needs to include an error term, since Y is random while beta_0, beta_1, and X1 are not.

59
Q

𝛽1 is the expected value of 𝑌 holding 𝑋2 = 0

A

False. 𝛽1 is the expected change in 𝑌 for a 1-unit increase in 𝑋1, holding 𝑋2 constant (at any value, not necessarily 0).

60
Q

Which group should be chosen as the reference?

A

Generally, the group with the largest sample size, since it will have the highest precision.

61
Q

How many dummy variables need to be created?

62
Q

𝑋1 can be constructed as 𝑋1 = 1 when the primary language is English and 𝑋1 = 0 when the primary language is French.

A

False – “other” would have no coding in this case.

63
Q

𝛽0 is the baseline alcohol intake for the reference group

A

True. This is always the case, since the dummy variables X_1, X_2, etc. are all 0 for the reference group, leaving only beta_0.
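Dummy coding for a three-level variable can be sketched as follows. The choice of English as the reference group and the `dummy_code` helper are assumptions for illustration; the cards do not say which group the course used as the reference.

```python
def dummy_code(value, categories, reference):
    """Return 0/1 indicators for every category except the reference."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"X_{c}": int(value == c) for c in categories if c != reference}

# Hypothetical three-level variable, with English as the reference group.
languages = ["English", "French", "Other"]

for person in languages:
    print(person, dummy_code(person, languages, reference="English"))
```

Three categories need only two dummy variables, and the reference group is the one for which every dummy equals 0.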

64
Q

In a model of the relationship between walking distance and heart rate, modified by mood, how many variables are there?

A

3 – one independent, one dependent, one effect modifier

65
Q

When we investigate “How does the association between walking distance and heart rate change across different moods?”, which variable is the modifier?

A

Mood.

66
Q

How to report the effect of a discrete modifier?

A

Report effect of exposure for each group separately

67
Q

How to report the effect of a continuous modifier?

A

Report the effect of exposure at a fixed value of the covariate(s), expressed in terms of the coefficients (e.g., for a given age, the effect of dose is beta_1 + beta_2*age).
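As a sketch of this kind of report, the function below evaluates the effect of dose at several ages. It assumes a model with a dose-by-age product term; the coefficient names and values are placeholders, not estimates from any fitted model.

```python
def effect_of_dose(age, b_dose, b_dose_age):
    """Expected change in Y per 1-unit increase in dose, at a fixed age.

    Assumes the model Y = b0 + b_dose*dose + b_age*age + b_dose_age*dose*age + error.
    """
    return b_dose + b_dose_age * age

# Hypothetical coefficient values, not taken from any fitted model.
effects = {age: effect_of_dose(age, b_dose=2.0, b_dose_age=-0.02) for age in (30, 50, 70)}
for age, effect in effects.items():
    print(age, round(effect, 2))
```

Because the effect changes with age, it is reported at a few meaningful ages rather than as a single number.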

68
Q

The main effects must be included in a model with interaction.

A

False (however, there should be strong prior knowledge or rationale for excluding the main effects if this is the case).

69
Q

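Chitto’s heart rate increases with walking distance. The amount of increase depends on his mood. Confounding or EM?

A

EM (effect modification): mood changes the size of the association between walking distance and heart rate.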

70
Q

When Chitto is happy, he is more likely to walk farther and have a higher heart rate. Confounding or EM?

A

Confounding

71
Q

Stratification is useful when…

A

Confounding variable is categorical with only a few categories. (CANNOT diagnose confounding!)

72
Q

Which of the following should be included in a regression model: confounder, mediator, variable with collinearity

A

Confounder only

73
Q

Which of the following will increase R^2: adding a polynomial term, confounder, collinear random variable, interaction term.

A

All except a collinear random variable (it adds no new information and will usually break the regression). In general, adding any variable to a regression will not decrease R^2 and will usually increase it.