Exam Revision (Tentaplugg) Flashcards
What are properties of good estimators?
- It is unbiased (valid): the expected value of the estimator equals the population parameter.
- It is efficient (reliable): the estimator has as small a variance as possible. The variance generally decreases as the sample size increases.
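Both properties can be checked by simulation. The sketch below is only an illustration: the sample mean is used as the estimator, and the population values (mu = 10, sigma = 2), the seed and the function name are arbitrary assumptions that do not come from the flashcards.

```python
import random
import statistics

def sample_mean_spread(n, reps=2000, mu=10.0, sigma=2.0, seed=0):
    """Return (average, variance) of the sample mean over many simulated samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.fmean(means), statistics.pvariance(means)

avg_small, var_small = sample_mean_spread(n=10)
avg_large, var_large = sample_mean_spread(n=100)

# Unbiased: both averages land close to mu = 10.
# Efficient: the variance of the estimator shrinks when n grows.
print(avg_small, var_small)
print(avg_large, var_large)
```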
Which starting points for CI and hypothesis testing are approximate because of the CLT (single population)?
Case 3 and Case 4
What are the assumptions for Case 1 (single pop.)?
- Xi follows a normal distribution
- Sigma2 (pop. variance) is known.
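Under these two assumptions the usual textbook interval for the mean is x-bar +- z(alpha/2) * sigma / sqrt(n). The flashcards do not state the formula, so treat the sketch below as the standard form rather than the formula sheet's exact notation; the data and the function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha=0.05):
    """CI for the mean of a normal population with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    half_width = z * sigma / sqrt(n)
    centre = fmean(sample)
    return centre - half_width, centre + half_width

# Hypothetical measurements, sigma assumed known to be 0.5.
low, high = z_interval([9.8, 10.4, 10.1, 9.6, 10.6], sigma=0.5)
print(low, high)
```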
What are the assumptions for Case 2 (single pop.)?
- Xi follows a normal distribution
- Sigma2 (pop. variance) is UNknown
What are the assumptions for Case 3 (single pop.)?
- Xi follows an unknown distribution
- Sigma2 is unknown
- n >= 30
What are the assumptions for Case 4 (single pop.)?
- n >= 40
What are the assumptions for Case 5 (single pop.)?
- Xi follows a normal distribution
- mu and Sigma2 are unknown
Which starting points for CI and hypothesis testing are approximate (double populations)?
- Case 2 (round the degrees of freedom v down)
- Case 3 and Case 5, from the CLT
What are the assumptions for Case 1 (double pop.)?
- Independent samples
- Xi, Yj follow normal distributions
- muX, muY, Sigma2X, Sigma2Y are unknown, but Sigma2X = Sigma2Y -> pooled variance S2p
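The pooled variance weights the two sample variances by their degrees of freedom. A minimal sketch, assuming the standard definition S2p = ((nx-1)*S2x + (ny-1)*S2y) / (nx+ny-2); the data are invented.

```python
from statistics import variance

def pooled_variance(x, y):
    """Pooled sample variance for two independent samples with equal population variance."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)

x = [5.1, 4.9, 5.6, 5.0]
y = [4.4, 4.8, 4.1, 4.6, 4.6]
s2p = pooled_variance(x, y)
print(s2p)  # always lies between the two sample variances
```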
What are the assumptions for Case 2 (double pop.)?
- Independent samples
- Xi, Yj follow normal distributions
- muX, muY, Sigma2X, Sigma2Y are unknown, Sigma2X & Sigma2Y are UNEQUAL
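With unequal variances the t distribution is only approximate, and its degrees of freedom v are rounded down. The sketch below assumes the formula sheet uses the usual Welch-Satterthwaite expression for v; the data are invented.

```python
from math import floor
from statistics import variance

def welch_df(x, y):
    """Welch-Satterthwaite degrees of freedom, rounded down to an integer."""
    a = variance(x) / len(x)
    b = variance(y) / len(y)
    v = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return floor(v)

x = [12.1, 11.4, 13.0, 12.6, 11.9]
y = [10.2, 14.8, 9.1, 13.7, 12.0, 15.3]
df = welch_df(x, y)
print(df)  # between min(nx, ny) - 1 and nx + ny - 2
```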
What are the assumptions for Case 3 (double pop.)?
- Independent samples
- Xi, Yj follow unknown distributions
- mus and sigma2s are unknown (may or may not be equal)
- nx, ny >= 30
What are the assumptions for Case 4 (double pop.)?
- DEPENDENT samples (matched pair)
- Xi, Yi follow normal distributions -> the differences Di = Xi - Yi follow a normal distribution
- mus and sigma2s are unknown
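A matched-pair analysis reduces to a one-sample problem on the differences Di. The sketch below assumes the usual test statistic t = (D-bar - muD0) / (sD / sqrt(n)) with n - 1 degrees of freedom; the before/after values are invented.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_statistic(x, y, mu_d0=0.0):
    """t statistic and degrees of freedom for matched pairs."""
    d = [xi - yi for xi, yi in zip(x, y)]  # pairwise differences Di
    n = len(d)
    t_obs = (fmean(d) - mu_d0) / (stdev(d) / sqrt(n))
    return t_obs, n - 1

before = [72, 68, 75, 80, 66]
after = [70, 65, 74, 76, 66]
t_obs, df = paired_t_statistic(before, after)
print(t_obs, df)  # compare t_obs with the tabulated t value for df degrees of freedom
```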
What are the assumptions for Case 5 (double pop.)?
- Independent samples
- nx, ny >=40
What are the assumptions for Case 6 (double pop.)?
- Independent samples
- Xi, Yj follow normal distributions
- mus and sigma2s are unknown
What is special about the CI for sigma2x/sigma2y (Case 6)?
The CI in the formula sheet is for sigma2Y/sigma2X, so switch the order of the x- and y-terms.
F(a; b; 0.95) = 1 / F(b; a; 0.05)
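The reciprocal rule for F quantiles follows from the fact that the reciprocal of an F variable is again F distributed with the degrees of freedom swapped. A short derivation sketch, with c standing for an arbitrary positive constant:

```latex
F \sim F(a, b) \;\Longrightarrow\; \frac{1}{F} \sim F(b, a),
\qquad
P\left(F \le c\right) = P\left(\frac{1}{F} \ge \frac{1}{c}\right),
\qquad\text{hence}\qquad
F_{a;b;0.95} = \frac{1}{F_{b;a;0.05}} .
```

The identity holds whether the third subscript denotes the lower-tail or the upper-tail probability, as long as the same convention is used on both sides.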
What are the four essential concepts of hypothesis testing?
- A null hypothesis H0, and an alternative hypothesis H1
- A significance level of the test, alpha
- A test statistic
- A decision rule
What are the four possible scenarios of hypothesis testing?
- P(Accept H0 | H0 is true) = 1-alpha
- P(Accept H0 | H1 is true) = beta (type 2 error/false negative)
- P(Reject H0 | H0 is true) = alpha (type 1 error/false positive)
- P(Reject H0 | H1 is true) = 1-beta
What is the power of a test (hypothesis testing)?
The ability to correctly reject H0 when it is false.
P(Reject H0 | H1 is true) = 1-beta
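For a concrete case the power can be computed directly. The sketch below assumes an upper-tailed z test of H0: mu = mu0 against H1: mu > mu0 with known sigma (single-population Case 1); the function name and the numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """P(Reject H0 | true mean is mu1) for the upper-tailed z test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value of the test
    shift = (mu1 - mu0) / (sigma / sqrt(n))   # standardised distance from H0
    return 1 - std_normal.cdf(z_alpha - shift)

print(z_test_power(100, 100, 15, 25))  # at mu1 = mu0 the power equals alpha
print(z_test_power(100, 105, 15, 25))
print(z_test_power(100, 110, 15, 25))  # power grows as mu1 moves away from mu0
```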
What is the p-value (hypothesis testing)?
The smallest significance level alpha, at which H0 can be rejected.
General rule: Reject H0 if p-value < given alpha
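The decision rule can be sketched for a two-sided z test with known sigma. The choice of test, the data and the helper names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def two_sided_p_value(sample, mu0, sigma):
    """p-value of the two-sided z test of H0: mu = mu0 with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the general rule: reject H0 if the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = two_sided_p_value([10.9, 11.4, 10.7, 11.2, 11.3], mu0=10.5, sigma=0.5)
print(p, decide(p))
```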
How can we test for correlations?
With the assumption that we have a random sample from a joint normal distribution, we can use Pearson’s test for correlations.
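A sketch of the test of H0: rho = 0, assuming the common form of the statistic t = r * sqrt(n-2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The data are invented, and the critical value must be looked up in a t table.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_t_statistic(x, y):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    n = len(x)
    r = pearson_r(x, y)
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9]
t_obs, df = correlation_t_statistic(x, y)
# 2.776 is the tabulated t(0.025) value for 4 degrees of freedom.
print(t_obs, df, abs(t_obs) > 2.776)
```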
What are the assumptions for the SLR model?
- Linearity
- The Xi values are fixed/known
- E(error term i) = 0. V(error term i) = Sigma2(error term)
- E(error term i * error term j) = 0 for i not equal to j (the error terms are uncorrelated)
- Error term i follows a normal distribution
What is the residual?
e(i) = y(i) - y-hat(i).
The deviation between the observed value and the estimated (fitted) value.
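A minimal sketch of how the residuals come out of a least-squares fit, assuming the usual SLR estimates b1 = Sxy / Sxx and b0 = y-bar - b1 * x-bar; the data are invented.

```python
from statistics import fmean

def fit_slr(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = fit_slr(x, y)

# e(i) = y(i) - y-hat(i)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(residuals)  # least-squares residuals sum to (numerically) zero
```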
What is S2(error term)?
An estimator of the variance of the error term, Sigma2(error term). In SLR it is computed as SSE / (n - 2).
What parts does ANOVA consist of?
- SST: sum of squares total. variation in dependent variable.
- SSR: sum of squares regression. model variation
- SSE: sum of squares error. residual variation
- R2-value: the share of the variation in the dependent variable that is explained by the model, R2 = SSR/SST. R2 > 0.5
FOR MLR:
- Adjusted R2-value. Protects against overfitting.
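The decomposition SST = SSR + SSE can be verified numerically. The sketch below fits an SLR model to invented data; the adjusted R2 uses the common definition 1 - (1 - R2)(n - 1)/(n - k - 1) with k independent variables, which is assumed to match the course's version.

```python
from statistics import fmean

def anova_slr(x, y):
    """Sums of squares and R2 for a simple linear regression fit."""
    mx, my = fmean(x), fmean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)                  # total variation
    ssr = sum((fi - my) ** 2 for fi in fitted)             # explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # residual variation
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst}

def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

table = anova_slr([1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8])
print(table)
print(adjusted_r2(table["R2"], n=5, k=1))
```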
What is the optimal predictor for both individual and aggregate/average predictions (SLR), and how do we find CIs?
y-hat(n+1) = b0 + b1*x(n+1)
Individual: y-hat(n+1) +- t(alpha/2)(n-2) * s(e) * s(y-hat, n+1)
Aggregate: y-hat(n+1) +- t(alpha/2)(n-2) * s(e) * s(E(……))
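The two intervals differ only in the standard-error factor. The flashcard leaves those factors abbreviated, so the sketch below substitutes the common textbook forms, sqrt(1 + 1/n + (x(n+1) - x-bar)^2 / Sxx) for an individual prediction and sqrt(1/n + (x(n+1) - x-bar)^2 / Sxx) for the average; these are an assumption and may be written differently on the formula sheet. The data are invented and t_crit must come from a t table.

```python
from math import sqrt
from statistics import fmean

def slr_prediction_intervals(x, y, x_new, t_crit):
    """Point prediction plus individual and average (mean) intervals at x_new."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))                  # estimated error standard deviation
    y_hat = b0 + b1 * x_new
    base = 1 / n + (x_new - mx) ** 2 / sxx     # assumed textbook term
    half_ind = t_crit * s_e * sqrt(1 + base)   # individual prediction
    half_avg = t_crit * s_e * sqrt(base)       # average prediction
    return y_hat, (y_hat - half_ind, y_hat + half_ind), (y_hat - half_avg, y_hat + half_avg)

# 3.182 is the tabulated t(0.025) value for n - 2 = 3 degrees of freedom.
y_hat, individual, average = slr_prediction_intervals(
    [1, 2, 3, 4, 5], [2.0, 4.1, 5.9, 8.2, 9.8], x_new=6, t_crit=3.182
)
print(y_hat, individual, average)
```

The individual interval is always the wider one, because it also has to cover the error term of the new observation.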
What are the assumptions of the MLR model?
- Linearity
- The x1i, … , xki values are fixed/known
- E(error term i) = 0. V(error term i) = Sigma2(error term)
- E(error term i * error term j) = 0 for i not equal to j (the error terms are uncorrelated)
- The independent variables (x) are not perfectly related (no multicollinearity problem)
FOR CI and HT:
- Error term i follows a normal distribution
What is multicollinearity and how do we detect it?
Independent variables are highly (in the extreme case perfectly) linearly related to each other.
Signs of problem:
- R2 value is close to 1 but none/few of the variables are significant
- Spurious/wrong signs of coefficients
Detect:
- Sample correlation between independent variables higher than e.g. 0.8?
- Variance Inflation Factor, VIF > 10?
- Tolerance factor: 1/VIF < 0.1?
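The three checks are easy to compute when there are only two independent variables, because then VIF = 1 / (1 - r^2), where r is their sample correlation. With more variables, r^2 is replaced by the R2 from regressing one independent variable on all the others, which this sketch does not cover. The data are invented.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def collinearity_checks(x1, x2):
    """Correlation, VIF and tolerance for a model with two independent variables."""
    r = pearson_r(x1, x2)
    vif = 1 / (1 - r ** 2)
    return {"r": r, "VIF": vif, "tolerance": 1 / vif}

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]   # nearly 2 * x1, so strongly collinear
checks = collinearity_checks(x1, x2)
print(checks)
print(abs(checks["r"]) > 0.8, checks["VIF"] > 10, checks["tolerance"] < 0.1)
```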