EPIB 621 Midterm Flashcards by Kate Harvey

Define “randomness”

– A process is random if the individual outcome is not known in advance, but a regular distribution of outcomes would be observed if the process were repeated many times.
– Randomness is due to a lack of knowledge: if we knew “everything”, we would be able to predict everything!
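
A minimal Python sketch of this definition. It assumes a simulated fair six-sided die as the random process. No single roll can be predicted, but the relative frequency of each face settles near 1/6 as the number of rolls grows. The seed and roll counts are arbitrary illustration choices.

```python
import random
from collections import Counter

def relative_frequencies(n_rolls, seed=621):
    """Roll a simulated fair die n_rolls times; return the proportion of each face."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# A handful of rolls gives irregular proportions; many rolls settle near 1/6 (about 0.167).
for n in (12, 100_000):
    freqs = relative_frequencies(n)
    print(n, {face: round(p, 3) for face, p in freqs.items()})
```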

Explain the difference between statistical and deterministic relationships.

Statistical: contains randomness/error. Deterministic: does not contain randomness/error.

Explain the meaning of a sampling distribution.

The distribution of a statistic (e.g., the sample mean) over all possible samples of the same size drawn from the population. It describes how the statistic varies from sample to sample.
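
A minimal simulation sketch of a sampling distribution. It assumes an Exponential population with mean 2 and standard deviation 2, samples of size 25, and 20,000 repeated samples; all of these are arbitrary illustration values. The sample means centre on the population mean, and their spread is close to the standard error sigma/sqrt(n) = 2/5 = 0.4.

```python
import random
import statistics

def sampling_distribution_of_mean(n, n_samples, seed=1):
    """Draw n_samples samples of size n from an Exponential population
    (rate 0.5, so mean = sd = 2) and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(0.5) for _ in range(n))
            for _ in range(n_samples)]

means = sampling_distribution_of_mean(n=25, n_samples=20_000)
# Centre of the sampling distribution (about 2) and its spread (about 0.4).
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```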

Define the population mean (mu).

Mu is the weighted average of all possible Y values, weighted by their corresponding probabilities according to the assumed distribution.
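
For a discrete Y, this weighted average is mu = sum over y of y * P(Y = y). The short sketch below applies the formula to two hypothetical distributions given as {value: probability} dictionaries. It uses exact fractions so the results are not affected by rounding.

```python
from fractions import Fraction

def population_mean(pmf):
    """mu = sum of y * P(Y = y) over all possible values y of a discrete Y,
    where pmf maps each value y to its probability."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(y * p for y, p in pmf.items())

die = {y: Fraction(1, 6) for y in range(1, 7)}   # fair six-sided die
print(population_mean(die))  # 7/2

# Unequal weights: P(Y = 0) = 1/4, P(Y = 10) = 3/4.
print(population_mean({0: Fraction(1, 4), 10: Fraction(3, 4)}))  # 15/2
```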

Why is standard deviation used instead of variance?

The standard deviation has the same scale as the data, and hence is comparable to deviations from the mean.

Label mean, median, and mode in the diagram.

(add image)

In a right-skewed distribution, mean > median.

True.

In a left-skewed distribution, mean > median.

False.

In a right-skewed distribution, mean < median.

False.

In a left-skewed distribution, mean < median.

True.
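
A quick simulation illustrating the four skewness cards above. Exponential draws serve as the right-skewed example, and their negatives as the left-skewed example; both are assumed choices. The long tail pulls the mean toward it. This is the usual textbook pattern rather than a theorem that holds for every distribution.

```python
import random
import statistics

rng = random.Random(42)
# Exponential draws are right-skewed; negating them mirrors the shape to left-skewed.
right_skewed = [rng.expovariate(1.0) for _ in range(50_000)]
left_skewed = [-y for y in right_skewed]

for label, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(label, "mean =", round(statistics.fmean(data), 3),
          "median =", round(statistics.median(data), 3))
```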

What is the mean of a Bernoulli distribution?

pi

What is the variance of a Bernoulli distribution?

pi(1-pi)

What is the mean of a binomial distribution?

n*pi

What is the variance of a binomial distribution?

n*pi(1-pi)

What is the mean of a Poisson distribution?

lambda

What is the variance of a Poisson distribution?

lambda
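
The mean and variance formulas above can be checked numerically by summing over each probability mass function. Mean is the sum of y*P(y); variance is the sum of (y - mean)^2 * P(y). The sketch below does this for a Bernoulli (treated as Binomial with n = 1), a binomial and a Poisson. The parameter values are arbitrary. The Poisson sum is truncated at y = 100, which assumes the remaining tail is negligible for small lambda.

```python
import math

def moments(pmf_pairs):
    """Return (mean, variance) from a list of (y, P(Y = y)) pairs."""
    mean = sum(y * p for y, p in pmf_pairs)
    var = sum((y - mean) ** 2 * p for y, p in pmf_pairs)
    return mean, var

def binomial_pmf(n, pi):
    return [(y, math.comb(n, y) * pi**y * (1 - pi) ** (n - y)) for y in range(n + 1)]

def poisson_pmf(lam, y_max=100):
    p = math.exp(-lam)  # P(Y = 0)
    pairs = []
    for y in range(y_max + 1):
        pairs.append((y, p))
        p *= lam / (y + 1)  # P(Y = y + 1) = P(Y = y) * lambda / (y + 1)
    return pairs

pi, n, lam = 0.3, 10, 4.0
print(moments(binomial_pmf(1, pi)))  # Bernoulli: approximately (pi, pi(1-pi)) = (0.3, 0.21)
print(moments(binomial_pmf(n, pi)))  # approximately (n*pi, n*pi(1-pi)) = (3, 2.1)
print(moments(poisson_pmf(lam)))     # approximately (lambda, lambda) = (4, 4)
```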

What is the mean of a normal distribution?

mu

What is the variance of a normal distribution?

sigma^2

Under what conditions can a Poisson distribution be used to approximate a binomial distribution?

Large n and small pi: Y ~ Poisson(n*pi), approximately.
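
A small numerical check of this approximation. It compares Binomial(1000, 0.003) with Poisson(3), which satisfies the condition, against Binomial(10, 0.3) with the same mean, which does not. The specific n and pi values are arbitrary illustration choices.

```python
import math

def binom_prob(y, n, pi):
    """P(Y = y) for Y ~ Binomial(n, pi)."""
    return math.comb(n, y) * pi**y * (1 - pi) ** (n - y)

def poisson_prob(y, lam):
    """P(Y = y) for Y ~ Poisson(lam)."""
    return math.exp(-lam) * lam**y / math.factorial(y)

# Large n, small pi: the largest pointwise gap between the two pmfs is tiny.
n, pi = 1000, 0.003
max_gap = max(abs(binom_prob(y, n, pi) - poisson_prob(y, n * pi)) for y in range(21))

# Small n, large pi (same mean n*pi = 3): the approximation is noticeably worse.
gap_small_n = max(abs(binom_prob(y, 10, 0.3) - poisson_prob(y, 3.0)) for y in range(11))

print(f"n=1000, pi=0.003: {max_gap:.5f}   n=10, pi=0.3: {gap_small_n:.5f}")
```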

The distribution is determined by the sample mean and standard deviation.

False. The distribution is determined from the assumed population mean and standard deviation (i.e., the distribution under H0).

A Type I error results from falsely rejecting H0 when H0 is true.

True.

A Type II error results from falsely rejecting H0 when H0 is true.

False. A Type II error results from failing to reject H0 when H0 is false.

A Type I error is a false positive.

True.

A Type II error is a false positive.

False. A Type II error is a false negative.

A Type I error is a false negative.

False. A Type I error is a false positive.

A Type II error is a false negative.

True.

What is the meaning of alpha?

The probability of making a Type I error (falsely rejecting H0 when H0 is true).
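
A minimal simulation sketch of what alpha means. It repeatedly runs a two-sided z-test of H0: mu = 0 on data generated with H0 actually true, and counts how often H0 is wrongly rejected. The known-sigma z-test, sample size, number of trials and seed are all assumptions chosen for illustration. The rejection rate should come out close to alpha.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, n_trials=10_000, seed=7):
    """Simulate two-sided z-tests of H0: mu = 0 when H0 is actually true
    (data drawn from N(0, 1), sigma = 1 known) and return the proportion of
    trials in which H0 is (wrongly) rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(n_trials):
        sample_mean = fmean(rng.gauss(0, 1) for _ in range(n))
        z = sample_mean / (1 / math.sqrt(n))  # standard error = sigma / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

rate = type_i_error_rate()
print(rate)  # close to alpha = 0.05
```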

What is the meaning of beta?

The probability of making a Type II error (failing to reject H0 when H0 is false).

What is the meaning of statistical power?

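The probability of correctly rejecting H0 when the alternative is true (1 minus beta). Graphically, it is the area under the alternative distribution that falls in the rejection region, beyond the critical value set by alpha.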

Increasing alpha will increase the power of a test.

True (all else being equal).

Decreasing alpha will increase the power of a test.

False. A smaller alpha makes H0 harder to reject, so power decreases (all else being equal).
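
The alpha–power relationship can be computed directly for a simple case. The sketch below assumes a one-sided z-test of H0: mu = mu0 against H1: mu = mu0 + delta, with sigma known. Under those assumptions, power = 1 - Phi(z_(1-alpha) - delta*sqrt(n)/sigma). The delta, sigma and n values are arbitrary. Holding them fixed, power rises as alpha rises. When delta = 0, power equals alpha.

```python
from statistics import NormalDist

def z_test_power(alpha, delta, sigma, n):
    """Power (1 - beta) of a one-sided z-test of H0: mu = mu0 against
    H1: mu = mu0 + delta (delta >= 0), with known sigma and sample size n."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff under H0
    shift = delta * n ** 0.5 / sigma         # mean of the z statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)

for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(z_test_power(alpha, delta=0.5, sigma=1.0, n=20), 3))
```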

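A confidence interval is a random quantity.

True. Its endpoints are computed from the sample, so they change from sample to sample, while mu is fixed.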

95% of the time, the population mean will fall within the CI.

False. Instead, when sampling is repeated many times, 95% of CIs will cover the true value of mu.
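
A minimal simulation of this repeated-sampling interpretation. It assumes normal data with known sigma and uses the z-interval xbar ± 1.96*sigma/sqrt(n). The true mu, sigma, n and number of repetitions are arbitrary. Each sample produces a different interval, and about 95% of them cover the fixed mu.

```python
import random
from statistics import NormalDist, fmean

def ci_coverage(mu=10.0, sigma=2.0, n=25, n_samples=10_000, seed=3):
    """Proportion of 95% z-intervals (sigma known) that cover the true mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sigma / n ** 0.5
    covered = 0
    for _ in range(n_samples):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # The interval moves from sample to sample; mu stays fixed.
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / n_samples

coverage = ci_coverage()
print(coverage)  # close to 0.95
```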

What are the main differences between CIs and p-values?

CIs reflect the effect size and quantify uncertainty. P-values are more flexible: they can be used for one-sided tests.

The regression line represents the expected value of the covariate.

False. The regression line represents the expected value of the outcome Y for a given X. X is treated as a fixed value and cannot be ascertained from Y.

The model of simple linear regression is valid only if X and Y are continuous.

False. X can be either categorical/binary or continuous (but Y must be continuous).

In simple linear regression, Y must be normally distributed.

False. Only the errors (epsilon) must be normally distributed. (Y is only normally distributed for a fixed value of X, but not overall.)

Which of the following are random vs. fixed: Y, beta0, beta1, X, epsilon (error).

Random, fixed, fixed, fixed, random

What is the variance of X?

Var(X) = 0, since X is treated as fixed (not a random quantity) in the regression model.

Var(epsilon) is equal to Var(Y).

True (for a fixed value of X). Since beta_0, beta_1 and X are fixed quantities with variance 0, Var(Y | X) = Var(beta_0 + beta_1*X + epsilon) = Var(epsilon) = sigma^2.

The variance of Y depends on the value of X.

False. Constant variance of Y across all values of X (homoscedasticity) is one of the underlying assumptions of linear regression.

Which of the following is (are) correct: (1) E[Y] = beta_0 + beta_1*X; (2) E[Y | X] = beta_0 + beta_1*X + epsilon; (3) Y = beta_0 + beta_1*X + epsilon

Correct, incorrect, correct

Which of the following is (are) correct: (1) Y = beta_0 + beta_1*X + epsilon; (2) E[Y | X] = beta_0 + beta_1*X; (3) E[Y] = beta_0 + beta_1*X + epsilon

Correct, correct, incorrect

What are the assumptions required for linear regression?

1 – errors are independent; 2 – errors are normally distributed; 3 – Var(Y | X) is the same for every value of X (constant variance); 4 – relationship between X and Y is linear
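
A minimal Python sketch (standard library only) that simulates data satisfying these four assumptions, then fits the line by least squares: beta1_hat = Sxy/Sxx, beta0_hat = ybar - beta1_hat*xbar. The true coefficients, error SD, sample size and X range are arbitrary illustration values. The X values are drawn once and then treated as fixed. With normal errors, these least-squares estimates coincide with the maximum likelihood estimates.

```python
import random
from statistics import fmean

def simulate_and_fit(beta0=1.0, beta1=2.0, sigma=0.5, n=500, seed=11):
    """Simulate Y = beta0 + beta1*X + epsilon with independent N(0, sigma^2)
    errors (constant variance, linear mean) and return the least-squares
    estimates (beta0_hat, beta1_hat)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]                  # X: generated once, then held fixed
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]   # Y: random only through epsilon
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

b0_hat, b1_hat = simulate_and_fit()
print(round(b0_hat, 3), round(b1_hat, 3))  # close to the true values 1.0 and 2.0
```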

Explain maximum likelihood estimation.

The parameter values are chosen to maximize the likelihood. Among the candidate distributions, the one with the highest probability (or density) of giving rise to the sample actually obtained is selected.
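
A minimal sketch of maximum likelihood estimation by brute-force grid search. It assumes a Poisson model and a small set of hypothetical counts made up for illustration. The grid point with the highest log-likelihood matches the sample mean (2.75 here), which is the known closed-form Poisson MLE.

```python
import math

def poisson_log_likelihood(lam, data):
    """Log-likelihood of i.i.d. Poisson(lam) observations."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in data)

data = [2, 4, 3, 0, 5, 3, 1, 4]            # hypothetical counts, illustration only
grid = [k / 100 for k in range(1, 1001)]   # candidate lambda values 0.01 .. 10.00
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, data))
print(lam_hat, sum(data) / len(data))      # grid maximiser and sample mean agree
```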

The standard error of a predicted value of a single individual is larger than the standard error of the prediction mean.

True.

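The variance of a predicted value of a single individual is equal to the variance of the prediction mean.

False. The variance for a single individual is larger, because it adds the individual's own error variance (sigma^2) to the variance of the estimated mean.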

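The prediction interval is shorter than the confidence interval of the prediction mean.

False (the opposite is true). The prediction interval is wider because it also accounts for the variability of an individual observation around the mean.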
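
The three cards above can be seen in the standard simple-linear-regression formulas. The SE of the fitted mean at x0 is s*sqrt(1/n + (x0 - xbar)^2/Sxx). The SE for a new individual adds 1 under the square root, for that individual's own error. The sketch below computes both on a small set of made-up data points.

```python
import math

def prediction_standard_errors(xs, ys, x0):
    """Return (SE of the fitted mean at x0, SE for one new individual at x0)
    for a simple linear regression fitted by least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual standard deviation s, with n - 2 degrees of freedom.
    s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    h = 1 / n + (x0 - x_bar) ** 2 / s_xx
    se_mean = s * math.sqrt(h)            # uncertainty in E[Y | X = x0] only
    se_individual = s * math.sqrt(1 + h)  # adds the individual's own error variance
    return se_mean, se_individual

xs = [1, 2, 3, 4, 5, 6, 7, 8]                      # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
se_mean, se_ind = prediction_standard_errors(xs, ys, x0=4.5)
print(round(se_mean, 4), round(se_ind, 4))
```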

The error follows a N(0, 𝜎^2) distribution.

True.

𝛽0 can be interpreted as the expected value of 𝑌𝑖 given the value 𝑋𝑖 = 0.

True.

We assume 𝑌𝑖 follows a normal distribution without the need to consider 𝑋𝑖

False. Yi follows a normal distribution *given* a fixed value of X.

What is the interpretation for regression coefficient 𝛽i of Xi?

𝛽i is the expected (mean) change in Yi for a 1-unit increase in Xi, holding all else constant.

The number of parameters is equal to the number of predictor variables in multiple regression.

False. The number of parameters can be larger than the number of predictors (e.g., x and x^2 terms add two parameters but involve only a single predictor).

The p-value for 𝛽2 > 0.05 means there is no evidence to reject the hypothesis that the expected increase in 𝑌 is 0 for a one-unit increase in 𝑋2 (holding 𝑋1 constant), at the 0.05 significance level.

True.

𝛽1(hat) is the same as the estimation of 𝛽1 in 𝑌 = 𝛽0 + 𝛽1𝑋1

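False. The equation needs to include an error term, since Y is random while beta_0, beta_1 and X1 are not. In addition, the simple model omits X2, so its estimate of 𝛽1 is not adjusted for X2 and generally differs from 𝛽1(hat).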

𝛽1 is the expected value of 𝑌 holding 𝑋2 = 0

False. 𝛽1 is the expected change in 𝑌 for a one-unit increase in 𝑋1, holding 𝑋2 constant (not at 0).

Which group should be chosen as the reference?

Generally, the group with the largest sample size, since it will have the highest precision.

How many dummy variables need to be created?

k - 1, where k is the number of categories (the reference category gets no dummy variable).

𝑋1 can be constructed as 𝑋1 = 1 when primary language is English and 𝑋1 = 0 when the primary language is French

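False. Respondents whose primary language is "other" would have no coding in this case; a variable with k categories (here English, French and other) needs k - 1 dummy variables.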
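
A minimal sketch of dummy (indicator) coding for the language example. The function name dummy_code and the five responses are hypothetical. With k = 3 levels and English as the reference, it creates k - 1 = 2 indicators. Every respondent, including "other", then has a valid code, and the reference group is 0 on both.

```python
def dummy_code(values, reference):
    """Return k - 1 indicator columns for a categorical variable with k levels,
    omitting the reference level (which is coded 0 on every indicator)."""
    levels = sorted(set(values) - {reference})
    return {level: [int(v == level) for v in values] for level in levels}

language = ["English", "French", "Other", "French", "English"]   # hypothetical responses
print(dummy_code(language, reference="English"))
# {'French': [0, 1, 0, 1, 0], 'Other': [0, 0, 1, 0, 0]}
```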

𝛽0 is the baseline alcohol intake for the reference group

True (in a model containing only the dummy variables). For the reference group, all dummy variables X1, X2, etc. equal 0, so the expected value reduces to beta_0.

In a model of the relationship between walking distance and heart rate, modified by mood, how many variables are there?

3 – one independent, one dependent, one effect modifier

When we investigate "How does the association between walking distance and heart rate change across different moods?", which variable is the modifier?

Mood

How to report the effect of a discrete modifier?

Report effect of exposure for each group separately

How to report the effect of a continuous modifier?

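Report the effect of the exposure at fixed value(s) of the modifier, expressed in terms of the coefficients (e.g., for a given age, the effect of dose is beta_1 + beta_2*age, where beta_2 is the coefficient of the dose-by-age interaction term).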
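
A short worked check of the continuous-modifier formula. It assumes a hypothetical model E[Y] = b0 + b1*dose + b2*dose*age + b3*age, where b2 is the interaction coefficient, with made-up coefficient values. Raising dose by one unit at a fixed age changes E[Y] by b1 + b2*age, so the reported effect depends on the age chosen.

```python
def expected_y(dose, age, b0, b1, b2, b3):
    # Hypothetical model with a dose x age interaction (b2 is the interaction coefficient).
    return b0 + b1 * dose + b2 * dose * age + b3 * age

def effect_of_dose(age, b1, b2):
    # E[Y | dose + 1, age] - E[Y | dose, age] simplifies to b1 + b2 * age.
    return b1 + b2 * age

coefs = dict(b0=60.0, b1=2.0, b2=0.1, b3=-0.2)   # hypothetical coefficient values
for age in (30, 50, 70):
    direct = expected_y(3, age, **coefs) - expected_y(2, age, **coefs)
    print(age, round(direct, 6), round(effect_of_dose(age, coefs["b1"], coefs["b2"]), 6))
```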

The main effects must be included in a model with interaction.

False (however, there should be strong prior knowledge or rationale for excluding the main effects if this is the case).

Chitto's heart rate increases with walking distance. The amount of increase depends on his mood. Confounding or EM?

Effect modification (EM)

When Chitto is happy, he is more likely to walk farther and have a higher heart rate. Confounding or EM?

Confounding

Stratification is useful when...

Confounding variable is categorical with only a few categories. (CANNOT diagnose confounding!)

Which of the following should be included in a regression model: confounder, mediator, variable with collinearity

Confounder only

Which of the following will increase R^2: adding a polynomial term, confounder, collinear random variable, interaction term.

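All except a collinear variable (perfect collinearity will usually break the regression, and a redundant variable adds no explanatory power). In general, adding any variable to a regression never decreases R^2: it either increases or stays the same.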
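
A minimal sketch showing that R^2 does not decrease as terms are added to nested models. The models are X alone, then X plus X^2, then X plus X^2 plus a pure-noise predictor. The data are simulated with arbitrary settings. To stay within the standard library, the fit uses a small hand-written normal-equations solver (Gaussian elimination with partial pivoting); in practice a library least-squares routine would be the usual choice.

```python
import random

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given predictor
    columns, solved through the normal equations (X'X) b = X'y."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yk for row, yk in zip(X, y))] for i in range(p)]
    for c in range(p):  # forward elimination with partial pivoting
        best = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[best] = A[best], A[c]
        for r in range(c + 1, p):
            factor = A[r][c] / A[c][c]
            A[r] = [a_r - factor * a_c for a_r, a_c in zip(A[r], A[c])]
    coef = [0.0] * p
    for c in reversed(range(p)):  # back substitution
        coef[c] = (A[c][p] - sum(A[c][j] * coef[j] for j in range(c + 1, p))) / A[c][c]
    fitted = [sum(bj * xj for bj, xj in zip(coef, row)) for row in X]
    y_bar = sum(y) / n
    ss_res = sum((yk - fk) ** 2 for yk, fk in zip(y, fitted))
    ss_tot = sum((yk - y_bar) ** 2 for yk in y)
    return 1 - ss_res / ss_tot

rng = random.Random(5)
x = [rng.uniform(0, 5) for _ in range(200)]
y = [1 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
x_sq = [xi ** 2 for xi in x]                    # polynomial term
noise = [rng.gauss(0, 1) for _ in range(200)]   # predictor unrelated to y
r2 = [r_squared(cols, y) for cols in ([x], [x, x_sq], [x, x_sq, noise])]
print([round(v, 4) for v in r2])                # non-decreasing from left to right
```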
