EPIB 621 Midterm Flashcards
Define “randomness”
–A process is random if the individual outcome of the process is not known in advance, but a regular distribution of the outcomes would be observed if the process were repeated many times
–Randomness is due to lack of knowledge: if we knew “everything”, we’d be able to predict everything!
Explain the difference between statistical and deterministic relationships.
Statistical: contains randomness/error. Deterministic: does not contain randomness/error.
Explain the meaning of a sampling distribution.
The distribution of the values a statistic takes over all possible samples of a given size drawn from the population.
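A sampling distribution can be approximated by simulation. The sketch below is only an illustration: the exponential population, the sample size `n`, and the number of repetitions `reps` are assumptions chosen for the example, not values from the course.

```python
import random
from statistics import mean, stdev

random.seed(0)
n, reps = 30, 5000  # hypothetical sample size and number of repeated samples

# Assumed population: exponential with rate 1 (mean 1, standard deviation 1).
# Each entry is the mean of one sample of size n.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

# The collection of sample means approximates the sampling distribution of the mean.
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_se = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1

print(mean_of_means, sd_of_means, theoretical_se)
```

The spread of the simulated means should land close to sigma / sqrt(n), which is the standard error of the mean.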
Define the population mean (mu).
Mu is the weighted average of all possible Y values, weighted by their corresponding probabilities according to the assumed distribution.
Why is standard deviation used instead of variance?
The standard deviation has the same scale as the data, and hence is comparable to deviations from the mean.
Label mean, median, and mode in the diagram.
(add image)
In a right-skewed distribution, mean > median.
T
In a left-skewed distribution, mean > median.
F
In a right-skewed distribution, mean < median.
F
In a left-skewed distribution, mean < median.
T
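The four skewness cards above can be checked on a small example. The data below are made up for illustration; the left-skewed sample is just the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: mostly small values with one long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Mirror image of the same sample, which has a long left tail instead.
left_skewed = [-x for x in right_skewed]

right_mean, right_median = mean(right_skewed), median(right_skewed)
left_mean, left_median = mean(left_skewed), median(left_skewed)

print("right-skewed: mean", right_mean, "median", right_median)
print("left-skewed:  mean", left_mean, "median", left_median)
```

The extreme value pulls the mean toward the tail, while the median barely moves.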
What is the mean of a Bernoulli distribution?
pi
What is the variance of a Bernoulli distribution?
pi(1-pi)
What is the mean of a binomial distribution?
n*pi
What is the variance of a binomial distribution?
n*pi(1-pi)
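The binomial mean and variance formulas can be verified directly from the probability mass function. This is a sketch with hypothetical values of `n` and `pi`; setting `n = 1` gives the Bernoulli case.

```python
from math import comb, isclose

n, pi = 10, 0.3  # hypothetical number of trials and success probability

# Binomial pmf: P(Y = k) for k = 0..n.
pmf = [comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)]

# Mean and variance computed as probability-weighted averages.
mean = sum(k * p for k, p in enumerate(pmf))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

print(mean, n * pi)
print(variance, n * pi * (1 - pi))
```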
What is the mean of a Poisson distribution?
lambda
What is the variance of a Poisson distribution?
lambda
What is the mean of a normal distribution?
mu
What is the variance of a normal distribution?
sigma^2
Under what conditions can a Poisson distribution be used to approximate a binomial distribution?
Large n, small pi -> Y ~ Poisson(n*pi)
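To see how close the approximation can be, the sketch below compares the two probability mass functions for one hypothetical pair of values (n = 1000, pi = 0.002). The values and helper names are illustrative assumptions.

```python
from math import comb, exp, factorial

n, pi = 1000, 0.002  # hypothetical: large n, small pi
lam = n * pi         # Poisson rate used for the approximation

def binomial_pmf(k):
    """P(Y = k) when Y ~ Binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)

def poisson_pmf(k):
    """P(Y = k) when Y ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Largest pointwise difference between the two pmfs over k = 0..10.
max_gap = max(abs(binomial_pmf(k) - poisson_pmf(k)) for k in range(11))

for k in range(5):
    print(k, binomial_pmf(k), poisson_pmf(k))
print("largest gap:", max_gap)
```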
The distribution is determined by the sample mean and standard deviation.
False. The distribution is determined from the assumed population mean and standard deviation (i.e., the distribution under H0).
A Type I error results from falsely rejecting H0 when H0 is true.
T
A Type II error results from falsely rejecting H0 when H0 is true.
F
A Type I error is a false positive.
T
A Type II error is a false positive.
F
A Type I error is a false negative.
F
A Type II error is a false negative.
T
What is the meaning of alpha?
The probability of making a Type I error (falsely rejecting H0 when H0 is true).
What is the meaning of beta?
The probability of making a Type II error (failing to reject H0 when H0 is false).
What is the meaning of statistical power?
The probability of rejecting H0 when H0 is false (1 minus beta), i.e., the area under the alternative distribution that lies in the rejection region.
Increasing alpha will increase the power of a test.
T
Decreasing alpha will increase the power of a test.
F
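The link between alpha and power can be illustrated with a one-sided z-test. This is a sketch under stated assumptions: `shift` is a hypothetical standardized distance between the null and alternative means (in standard-error units), and `power_one_sided_z` is a helper defined here for the example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, mean 0 and sd 1

def power_one_sided_z(alpha, shift):
    """Power of a one-sided z-test when the alternative mean lies
    `shift` standard errors above the null mean."""
    critical = std_normal.inv_cdf(1 - alpha)     # rejection cutoff under H0
    return 1 - std_normal.cdf(critical - shift)  # P(reject H0 | H1 true)

shift = 2.0  # hypothetical standardized effect
power_at_01 = power_one_sided_z(0.01, shift)
power_at_05 = power_one_sided_z(0.05, shift)
power_at_10 = power_one_sided_z(0.10, shift)

print(power_at_01, power_at_05, power_at_10)
```

A larger alpha moves the cutoff toward the null mean, so more of the alternative distribution falls in the rejection region.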
A confidence interval is a random quantity
T
95% of the time, the population mean will fall within the CI.
False. Instead, when sampling is repeated many times, 95% of CIs will cover the true value of mu.
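The repeated-sampling interpretation can be simulated. The sketch below assumes a normal population with known sigma and uses a z-interval. The values of `mu`, `sigma`, `n`, and `reps` are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(0)
mu, sigma = 10.0, 2.0   # hypothetical true mean and known standard deviation
n, reps = 25, 4000      # sample size and number of repeated samples

z = NormalDist().inv_cdf(0.975)     # about 1.96 for a 95% interval
half_width = z * sigma / n ** 0.5   # fixed because sigma is assumed known

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # The interval changes from sample to sample; mu stays fixed.
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / reps
print(coverage)
```

The interval is the random quantity here: roughly 95% of the simulated intervals should cover the fixed value of mu.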
What are the main differences between CIs and p-values?
CIs reflect the effect size and quantify uncertainty. P-values are more flexible; they can be used for one-sided tests.
The regression line represents the expected value of the covariate.
False. The regression line represents the expected value of Y for a given X. X is treated as a fixed value and cannot be ascertained from Y.
The model of simple linear regression is valid only if X and Y are continuous.
False. X can be either categorical/binary or continuous (but Y must be continuous).
In simple linear regression, Y must be normally distributed.
False. Only the errors (epsilon) must be normally distributed. (Y is only normally distributed for a fixed value of X, but not overall.)
Which of the following are random vs. fixed: Y, beta0, beta1, X, epsilon (error).
Random, fixed, fixed, fixed, random
What is the variance of X?
Var(X) = 0 since it is not a random quantity
Var(epsilon) is equal to Var(Y).
True. Since the variances of all other quantities in the equation are equal to 0, Var(Y) = Var(beta_0 + beta_1*X + epsilon) = Var(epsilon) = sigma^2
The variance of Y depends on the value of X.
False. Constant variance of Y across values of X (homoscedasticity) is one of the underlying assumptions of linear regression.
Which of the following is (are) correct:
(1) E[Y] = beta_0 + beta_1*X; (2) E[Y | X] = beta_0 + beta_1*X + epsilon; (3) Y = beta_0 + beta_1*X + epsilon
Correct, incorrect, correct
Which of the following is (are) correct:
(1) Y = beta_0 + beta_1*X + epsilon; (2) E[Y | X] = beta_0 + beta_1*X; (3) E[Y] = beta_0 + beta_1*X + epsilon
Correct, correct, incorrect
What are the assumptions required for linear regression?
1 – errors are independent; 2 – errors are normally distributed; 3 – Var(Y) is independent of X; 4 – relationship between X and Y is linear
Explain maximum likelihood estimation.
The sample is used to find the parameter values of the assumed distribution that give the highest probability (likelihood) of producing the sample obtained.
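As a small illustration of maximum likelihood, the sketch below searches a grid of candidate values of pi for a Bernoulli sample. The sample and the grid resolution are made up for the example; a grid search is used only because it is easy to read, not because it is how estimates are normally computed.

```python
from math import log

# Hypothetical Bernoulli sample: 7 successes out of 10 observations.
sample = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

def log_likelihood(pi):
    """Log-likelihood of the sample if each observation is Bernoulli(pi)."""
    return sum(log(pi) if y == 1 else log(1 - pi) for y in sample)

# Candidate values of pi strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
pi_hat = max(grid, key=log_likelihood)

print(pi_hat, sum(sample) / len(sample))
```

The value that maximizes the likelihood coincides with the sample proportion, which is the known closed-form maximum likelihood estimate for a Bernoulli parameter.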
The standard error of a predicted value of a single individual is larger than the standard error of the prediction mean.
T
The variance of a predicted value of a single individual is equal to the variance of the prediction mean.
F
The prediction interval is shorter than the confidence interval of the prediction mean
False (the opposite is true).
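The difference between the two standard errors can be seen by computing both for a simple linear regression. The data and the prediction point `x_new` below are hypothetical. The formulas are the standard ones for simple linear regression with residual variance `s2`.

```python
from math import sqrt

# Hypothetical data for a simple linear regression of y on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residual variance estimate with n - 2 degrees of freedom.
s2 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_new = 4.5  # hypothetical value of X at which to predict
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx

se_mean = sqrt(s2 * leverage)              # SE of the estimated mean of Y at x_new
se_individual = sqrt(s2 * (1 + leverage))  # SE of a single new observation at x_new

print(se_mean, se_individual)
```

The individual prediction carries the extra `s2` term for the new observation's own error, which is why the prediction interval is wider than the confidence interval for the mean.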
The error follows a N(0,𝜎^2) distribution
T
𝛽0 can be interpreted as the expected value of 𝑌𝑖 given the value 𝑋𝑖 = 0
T
We assume 𝑌𝑖 follows a normal distribution without the need to consider 𝑋𝑖
False. Yi follows a normal distribution given a fixed value of X.
What is the interpretation for regression coefficient 𝛽i of Xi?
𝛽i is the expected (mean) change in Yi for a 1-unit increase in Xi, holding all else constant.
The number of parameters is equal to the number of predictor variables in multiple regression.
False. The number of parameters can be larger than the number of predictors (e.g., x and x^2 terms will result in two parameters but only a single predictor).
A p-value for 𝛽2 greater than 0.05 means there is insufficient evidence, at the 0.05 significance level, to reject the hypothesis that the expected change in 𝑌 for a one-unit increase in 𝑋2 (holding 𝑋1 constant) is 0.
T
𝛽1(hat) is the same as the estimation of 𝛽1 in 𝑌 = 𝛽0 + 𝛽1𝑋1
False. Needs to include error term in equation since Y is random and beta_0/beta_1/X1 are not.
𝛽1 is the expected value of 𝑌 holding 𝑋2 = 0
False. 𝛽1 is the expected change in Y for a one-unit increase in X1, holding X2 constant (at any value, not specifically 0).
Which group should be chosen as the reference?
Generally, the group with the largest sample size, since it will have the highest precision.
How many dummy variables need to be created?
n – 1, where n is the number of categories (not the sample size)
𝑋1 can be constructed as 𝑋1 = 1 when primary language is English and 𝑋1 = 0 when the primary language is French
False – “other” would have no coding in this case.
𝛽0 is the baseline alcohol intake for the reference group
True. This is always the case since the dummy variables X_1, X_2, etc. are all 0 for the reference group, leaving only beta_0.
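The dummy-coding cards above can be sketched for the three language groups they mention (English, French, other). Choosing English as the reference is an assumption made only for this example, and `encode` is a hypothetical helper.

```python
categories = ["English", "French", "Other"]
reference = "English"  # hypothetical choice of reference group

# One dummy variable per non-reference category: (number of categories) - 1.
dummy_levels = [c for c in categories if c != reference]

def encode(language):
    """Return the dummy variables for one observation."""
    return [1 if language == level else 0 for level in dummy_levels]

coding = {c: encode(c) for c in categories}
for category, dummies in coding.items():
    print(category, dummies)
```

Every dummy is 0 for the reference group, so its expected outcome reduces to the intercept, and each remaining group gets its own indicator.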
In a model of the relationship between walking distance and heart rate, modified by mood, how many variables are there?
3 – one independent, one dependent, one effect modifier
When we investigate “How would the walking distance associated with heart rate change across different moods?” Which variable is the modifier?
Mood
How to report the effect of a discrete modifier?
Report effect of exposure for each group separately
How to report the effect of a continuous modifier?
Report effect of exposure for a fixed value of covariate(s) in terms of beta_1 (e.g., for a given age, the effect of dose is beta_1 + beta_2*age, where beta_2 denotes the coefficient of the dose*age interaction term)
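Reporting the effect at fixed values of a continuous modifier amounts to evaluating a linear function of the modifier. In the sketch below the coefficient values are invented, and `beta_2` is assumed to be the coefficient of the dose*age interaction term.

```python
# Hypothetical fitted coefficients (not from any real model).
beta_1 = 2.0   # coefficient of dose
beta_2 = 0.05  # assumed here to be the coefficient of the dose*age interaction

def dose_effect(age):
    """Expected change in Y per one-unit increase in dose at a given age."""
    return beta_1 + beta_2 * age

# Report the exposure effect at a few fixed values of the modifier.
effects = {age: dose_effect(age) for age in (30, 50, 70)}
for age, effect in effects.items():
    print("age", age, "effect of dose", effect)
```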
The main effects must be included in a model with interaction.
False (however, there should be strong prior knowledge/rationale for excluding the main effects if this is the case).
Chitto’s heart rate increases with walking distance. The amount of increase depends on his mood. Confounding or EM?
EM
When Chitto is happy, he is more likely to walk farther and have a higher heart rate. Confounding or EM?
Confounding
Stratification is useful when…
Confounding variable is categorical with only a few categories. (CANNOT diagnose confounding!)
Which of the following should be included in a regression model: confounder, mediator, variable with collinearity
Confounder only
Which of the following will increase R^2: adding a polynomial term, confounder, collinear random variable, interaction term.
All except a collinear random variable (it will usually break the regression). In general, adding ANY variable to a regression will increase the R^2 (more precisely, R^2 never decreases).
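The claim that R^2 does not go down when a variable is added can be illustrated with simulated data. This sketch adds a pure-noise predictor to a simple regression. Rather than fitting the two-predictor model directly, it uses the Frisch-Waugh-Lovell result to compute the new residual sum of squares from residuals. All data and names are hypothetical.

```python
import random

random.seed(1)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # predictor unrelated to y
y = [1.0 + 2.0 * a + random.gauss(0, 1) for a in x1]

def residuals(target, predictor):
    """Residuals from a least-squares fit of target on an intercept and predictor."""
    t_bar = sum(target) / len(target)
    p_bar = sum(predictor) / len(predictor)
    sxx = sum((p - p_bar) ** 2 for p in predictor)
    slope = sum((p - p_bar) * (t - t_bar) for p, t in zip(predictor, target)) / sxx
    return [t - t_bar - slope * (p - p_bar) for p, t in zip(predictor, target)]

y_bar = sum(y) / n
total_ss = sum((v - y_bar) ** 2 for v in y)

# Model 1: y on x1.
res_y = residuals(y, x1)
rss_one = sum(r**2 for r in res_y)

# Model 2: y on x1 and noise. By Frisch-Waugh-Lovell, the extra reduction in
# residual SS comes from regressing res_y on the part of noise not explained by x1.
res_noise = residuals(noise, x1)
cross = sum(a * b for a, b in zip(res_y, res_noise))
rss_two = rss_one - cross**2 / sum(b**2 for b in res_noise)

r2_one = 1 - rss_one / total_ss
r2_two = 1 - rss_two / total_ss
print(r2_one, r2_two)
```

The reduction term is a squared quantity divided by a positive one, so the residual sum of squares can only shrink or stay the same, even for a predictor with no real relationship to Y.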