ANOVA and Regression Flashcards

Question

What does the trace function do?

Answer 1

It is the sum of the diagonal elements of a square matrix

Answer 2

E[q]=u'Au+tr(AV)
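The identity can be checked exactly on a small example. The sketch below assumes the card's notation means q = y'Ay, with u the mean vector and V the covariance matrix of y. The support points and the matrix A are made-up illustration values, and the helper names are hypothetical.

```python
# Exact check of E[y'Ay] = u'Au + tr(AV) on a small discrete distribution.
# Assumption: y takes each of the four points below with probability 1/4.

def quad_form(A, x, y):
    """Return x'Ay for a square matrix A given as a list of rows."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

points = [(1.0, 2.0), (3.0, 0.0), (-1.0, 4.0), (2.0, 2.0)]  # made-up values of y
A = [[2.0, 1.0], [1.0, 3.0]]                                # made-up symmetric A

m = len(points)
# Population mean vector and covariance matrix of y.
u = [sum(p[i] for p in points) / m for i in range(2)]
V = [[sum((p[i] - u[i]) * (p[j] - u[j]) for p in points) / m
      for j in range(2)] for i in range(2)]

lhs = sum(quad_form(A, p, p) for p in points) / m   # E[y'Ay] computed directly
rhs = quad_form(A, u, u) + trace(matmul(A, V))      # u'Au + tr(AV)
print(lhs, rhs)
```

Both sides agree because E[yy'] = V + uu', which holds for any distribution with finite second moments, not only the normal.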

Answer 3

Show BVA=0

Answer 4

Show A1VA2=0

Answer 5

Chi-squared with rank(A) degrees of freedom, provided AV is idempotent

Answer 6

Centered regression (using x_i-xbar) helps to reduce the ill effects caused by high correlations among the columns (covariates) of X. This is collinearity: det(X'X) is close to 0, so (X'X)^-1 is numerically unstable, and under an exact linear dependence det(X'X)=0 and (X'X)^-1 does not exist. "Collinearity" means a "near-linear" relationship (a high correlation coefficient) among covariates.

Answer 7

1) the numerator is normally distributed 2) the denominator is the square root of a chi-square variable divided by its degrees of freedom 3) the numerator is independent of the denominator

Answer 8

Divide the alpha level by m, where m is the number of confidence intervals for which simultaneous coverage is desired
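A minimal sketch of this adjustment, with made-up values for alpha and m. The function name is hypothetical.

```python
def bonferroni_alpha(alpha, m):
    """Per-interval alpha so that m intervals have joint coverage of at least 1 - alpha."""
    if m < 1:
        raise ValueError("m must be a positive number of intervals")
    return alpha / m

family_alpha = 0.05   # desired overall (family-wise) alpha, illustration only
m = 4                 # number of simultaneous confidence intervals, illustration only

per_interval_alpha = bonferroni_alpha(family_alpha, m)
per_interval_confidence = 1 - per_interval_alpha
print(per_interval_alpha, per_interval_confidence)
```

Each of the four intervals is then built at the 98.75% level, so that together they cover at the 95% level or better.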

Answer 9

The assumption that a linear model is actually a good fit for the data. It can be checked by inspecting scatter plots for a linear relationship between each predictor and the response, as well as by inspecting residual plots for patterns

Answer 10

The assumption that the residuals have no structure (no pattern), that is, that the errors are independent. The runs test is a nonparametric method that detects structure in the residuals by counting the number of runs (consecutive sequences of points) above or below the mean or median residual. The Durbin-Watson test is applicable if the data can be arranged in time order. It uses tabulated critical values and tests whether the autocorrelation of the errors is 0.
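Both quantities are simple to compute by hand. The sketch below uses the standard Durbin-Watson statistic, d = sum (e_t - e_{t-1})^2 / sum e_t^2, and a plain count of runs around a center value. The residuals are made-up, and comparing against tabulated critical values is left out.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals arranged in time order."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def count_runs(residuals, center=0.0):
    """Number of runs of consecutive residuals above or below `center`."""
    signs = [e > center for e in residuals if e != center]  # drop ties
    if not signs:
        return 0
    changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return 1 + changes

residuals = [1.0, 2.0, 1.0, -1.0, -2.0, -1.0, 1.0, 2.0]  # made-up, time ordered

dw = durbin_watson(residuals)
runs = count_runs(residuals)
print(dw, runs)
```

A value of d near 2 is consistent with no first-order autocorrelation, while values well below 2 suggest positive autocorrelation. Few, long runs point the same way.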

Answer 11

The assumption of constant variance of the errors. This can be checked with a scatter plot or a residual plot, or tested formally using the BF (Brown-Forsythe) test, also called the modified Levene test, for groups, and the BP (Breusch-Pagan) test for general nonconstant variance

Answer 12

Normality of the errors can be checked by plotting the residuals using a box plot, histogram, or normal probability plot. It can also be tested formally using the Shapiro-Wilk test, Kolmogorov-Smirnov test, or Anderson-Darling test. Note, however, that normal probability plots provide no information if the assumptions of linearity or homoskedasticity have been violated

Answer 13

One that simultaneously has a large absolute residual and high leverage

Answer 14

Leverage measures the potential effect of a point on the fitted regression, based on how far its x values lie from those of the other observations. The leverage of the ith point is the diagonal element h_ii of the hat matrix

Answer 15

Cook's distance measures influence. It depends on two factors: leverage and the size of the residual. Three situations can cause high influence: high residual with moderate leverage, high leverage with moderate residual, or both high. Cook's distance is large if D_i > F_{alpha, p, n-p}
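A minimal sketch of how leverage and residual size combine, assuming simple linear regression (p = 2), where the leverage has the closed form h_ii = 1/n + (x_i - xbar)^2 / Sxx. The data are made-up and the function name is hypothetical. The formula used is D_i = e_i^2 h_ii / (p MSE (1 - h_ii)^2).

```python
def leverage_and_cooks(x, y):
    """Leverages, residuals, and Cook's distances for simple linear regression."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Diagonal of the hat matrix in closed form.
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e ** 2 * hi / (p * mse * (1 - hi) ** 2)
             for e, hi in zip(resid, h)]
    return h, resid, cooks

x = [1.0, 2.0, 3.0, 4.0, 10.0]   # made-up; the last x value is far from the rest
y = [1.1, 1.9, 3.2, 3.9, 6.0]

h, resid, cooks = leverage_and_cooks(x, y)
for i, (hi, ei, di) in enumerate(zip(h, resid, cooks), start=1):
    print(i, round(hi, 3), round(ei, 3), round(di, 3))
```

The leverages sum to p, and the point with x = 10 carries most of that total, so even a moderate residual there can produce a large D_i.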

Answer 16

Nonlinearity but homogeneity of variance: change the model. Linearity but heterogeneity of variance: use WLS or a transformation. Heterogeneity of variance and nonlinearity: apply a transformation. Right-skewed data with heterogeneity and nonlinearity: log transformation. Count data: square root transformation. Proportions: arcsine square root transformation.

Answer 17

Bootstrap. Procedure: 1) Draw a sample of size n from the dataset with replacement 2) Compute the statistic of interest on that resample (usually the mean, or a regression quantity computed via x(x'x)^-1x') 3) Repeat N times and order the N results 4) Depending on alpha, find the corresponding percentiles of the ordered results and compute the interval
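The four steps can be sketched as a percentile bootstrap interval. This is an illustration under assumptions: the statistic is the sample mean, the data are made-up, and the function name, the seed, and the choice of N = 2000 resamples are all hypothetical.

```python
import random

def percentile_bootstrap_ci(data, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` at level 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Steps 1-3: resample with replacement, compute the statistic, repeat, order.
    stats = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Step 4: read off the alpha/2 and 1 - alpha/2 percentiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return stats[lo_idx], stats[hi_idx]

def mean(values):
    return sum(values) / len(values)

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up sample

ci_low, ci_high = percentile_bootstrap_ci(data, mean)
print(ci_low, mean(data), ci_high)
```

For a regression coefficient the same loop applies, resampling (x, y) pairs and refitting the model on each resample.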

Answer 18

If a higher-order term is included that is not in the true relationship, the estimators remain unbiased but the prediction variance is higher. If a higher-order term that should be there is not included, the estimators are no longer unbiased

Answer 19

Gauss-Markov Theorem

Answer 20

SSR(A,B)-SSR(B) = SSE(B)-SSE(A,B)

Answer 21

H_RH_F=H_FH_R=H_R
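A small numerical check of this identity, assuming H_R and H_F are the hat matrices of a reduced model nested in a full model. To stay in closed form, the reduced model here is intercept-only and the full model is simple linear regression. The x values are made-up and the helper names are hypothetical.

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def hat_intercept_only(n):
    """Hat matrix of the intercept-only model: every entry is 1/n."""
    return [[1.0 / n] * n for _ in range(n)]

def hat_simple_linear(x):
    """Hat matrix of y = b0 + b1*x: h_ij = 1/n + (x_i - xbar)(x_j - xbar)/Sxx."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [[1.0 / n + (xi - xbar) * (xj - xbar) / sxx for xj in x] for xi in x]

def max_abs_diff(A, B):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

x = [1.0, 2.0, 4.0, 7.0]  # made-up predictor values

H_R = hat_intercept_only(len(x))
H_F = hat_simple_linear(x)

diff_RF = max_abs_diff(matmul(H_R, H_F), H_R)  # H_R H_F compared with H_R
diff_FR = max_abs_diff(matmul(H_F, H_R), H_R)  # H_F H_R compared with H_R
print(diff_RF, diff_FR)
```

Both differences are zero up to rounding, because the column space of the reduced design is contained in that of the full design.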

Answer 22

They are plots of the two sets of residuals e_i(Y|X_m) and e_i(X_k|X_m), where X_m denotes the predictors already in the model. If there is a clear linear relationship in the added variable plot, one should add X_k to the model

Answer 23

1. Forward selection (start with the null model and add the best variable, one at a time) 2. Backward selection (start with the full model and remove the worst variable, one at a time) 3. Stepwise selection (start with the null model and add or remove variables to optimize the chosen criterion) Selection criteria: Adj R², Mallows' Cp, AIC/BIC

Answer 24

Collinearity is a "near-linear" relationship (a high correlation coefficient) among covariates. It increases the variance of the estimators.

Answer 25

1) Not an estimate of any population quantity unless the data are multivariate normal 2) Can be dramatically changed by how the x's are selected 3) Does not capture nonlinear relationships, only linear ones 4) Non-decreasing in the number of predictors. Adding an extra predictor will not cause R² to decrease

Answer 26

Go with the smallest

Answer 27

All joint probabilities equal the product of their marginal probabilities (p_ij = p_i. p_.j for all i, j)
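Under this condition the expected count in cell (i, j) is n times the product of the estimated marginals, which is (row total)(column total)/n. The sketch below computes those expected counts and the Pearson chi-square statistic for a made-up 2x2 table. The function names are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Pearson X^2 = sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

table = [[10, 20],
         [30, 40]]  # made-up observed counts

expected = expected_counts(table)
x2 = pearson_chi_square(table)
df = (len(table) - 1) * (len(table[0]) - 1)
print(expected, x2, df)
```

The statistic is compared with a chi-square distribution on (I-1)(J-1) degrees of freedom.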

Answer 28

Difference in proportions, relative risk, odds ratio
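For a 2x2 table these measures can be computed directly from the two group proportions. The counts below are made-up, and the function name is hypothetical. The sketch reports the odds in each group and their ratio.

```python
def two_by_two_measures(success_1, n_1, success_2, n_2):
    """Difference in proportions, relative risk, group odds, and odds ratio."""
    p1 = success_1 / n_1
    p2 = success_2 / n_2
    odds_1 = p1 / (1 - p1)
    odds_2 = p2 / (1 - p2)
    return {
        "difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_1": odds_1,
        "odds_2": odds_2,
        "odds_ratio": odds_1 / odds_2,
    }

# Made-up data: 20 of 100 successes in group 1, 10 of 100 in group 2.
measures = two_by_two_measures(20, 100, 10, 100)
print(measures)
```

With these counts the difference is 0.10, the relative risk is 2, and the odds ratio is 2.25, which shows that the relative risk and the odds ratio need not agree unless both proportions are small.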

Answer 29

Occurs when the data are incorrectly grouped together without the relevant factor. It calls for a higher dimensional table to truly address the problem

Answer 30

GLMs extend ordinary regression models to encompass nonnormal response distributions and to model functions of the mean

Answer 31

H₀: There is only one regression line (B₂=0). H₁: There are two regression lines with different intercepts (B₂≠0). The test statistic follows a t distribution with n+m-3 degrees of freedom.

Answer 32

A rectangular table having I rows for X categories and J columns for Y categories. The cells contain frequency counts of outcomes for a sample. The IxJ table is also called a cross classification table.

(116 cards)