Test Flashcards

1
Q

Pearson coefficient of correlation

A

It measures the strength and direction of the linear relationship between two variables.
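The definition can be sketched directly in Python; `pearson_r` and the sample data below are made up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by
    the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

# Perfectly linear data -> r very close to 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```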

2
Q

Multicollinearity

A

A condition that occurs when two or more independent variables are highly correlated with one another.

3
Q

R^2

A

Measures the proportion of variation in the dependent variable that is explained by the set of all independent variables in the model.
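A minimal Python sketch of that definition, using R^2 = 1 - SSE/SST (the fitted values are made up):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: the share of variation in y
    explained by the model's fitted values."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))  # close to 0.98
```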

4
Q

Nested model

A

Two models are said to be nested if one contains all the variables of the other model plus at least one extra variable

5
Q

Mallows' Cp

A

- Popular model selection criterion

- Mallows' Cp is related to adjusted R² but imposes a penalty for increasing the number of independent variables

- It is called a parsimonious decision criterion
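One common form of the criterion is Cp = SSE_p / MSE_full - (n - 2p), where p counts the candidate model's parameters; a small Python sketch with hypothetical numbers:

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp = SSE_p / MSE_full - (n - 2p).
    p = number of parameters in the candidate model (incl. intercept);
    candidates with Cp close to p are preferred."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical 3-parameter candidate fitted to n = 20 observations
print(mallows_cp(sse_p=12.0, mse_full=1.0, n=20, p=3))  # -2.0
```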
6
Q

PRESS statistic

A

PRESS is based on the leave-one-out or jackknife technique, in which one fits the model without the ith observation x_i and uses this fitted model to predict the response when x = x_i. The PRESS residuals are defined as e_i = y_i - ŷ_i. The process is repeated for all n observations.

- The lower the value of PRESS, the better the predictive model
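The leave-one-out procedure can be sketched in Python for simple linear regression (function names and data are made up):

```python
def slr_fit(x, y):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    return my - b1 * mx, b1

def press(x, y):
    """PRESS: refit without the ith point, predict it,
    and sum the squared deletion residuals."""
    total = 0.0
    for i in range(len(x)):
        xs, ys = x[:i] + x[i+1:], y[:i] + y[i+1:]
        b0, b1 = slr_fit(xs, ys)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

print(press([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfectly linear -> 0.0
```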
7
Q

Predicted R²

A

It indicates how well a regression model predicts responses for new observations. This statistic helps you determine whether the model fits the original data but is less capable of providing valid predictions for new observations.

8
Q

VIF (variance inflation factor)

A

A test for multicollinearity. It quantifies the degree to which the variance of an estimated regression coefficient is increased due to collinearity among the predictor variables.

- If a VIF is bigger than 5, the model probably has a problem with multicollinearity

- If all VIFs are less than 1/(1 - R²), then multicollinearity is not strong enough to affect the coefficient estimates
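For the special two-predictor case, R_j² is just the squared correlation between the two predictors, so VIF can be sketched in plain Python (the data is made up):

```python
import math

def vif_two_predictors(x1, x2):
    """With exactly two predictors, VIF = 1 / (1 - r^2),
    where r is the correlation between them."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in x1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in x2))
    r = cov / (s1 * s2)
    return 1 / (1 - r ** 2)

# Nearly collinear predictors -> VIF far above 5 (problematic)
print(vif_two_predictors([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2]))
```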

9
Q

Heteroscedasticity

A

Occurs when regression results produce error terms that vary to significantly different degrees across settings of the independent variables.

- The variance might get larger as the independent variables get larger

10
Q

How to stabilize heteroscedasticity?

A

Transform the response variable:
- ln(y)
- square root of y

11
Q

Test for Heteroscedasticity

A

Divide the sample observations into two subgroups based on the values of ŷ (or, equivalently in this example, the values of x).

We next calculate the variance of the observations in subgroups 1 and 2 and perform a test of hypothesis for the ratio of the variances:

F = larger variance / smaller variance

We look in the F table; if the test statistic is greater than the table value, we reject equal variances.

Df = (number of observations in each subgroup) - 1, for the larger- and smaller-variance groups respectively
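The subgroup variance-ratio test statistic can be sketched in Python (the residual subgroups here are made up):

```python
def variance_ratio_f(group1, group2):
    """F = larger sample variance / smaller sample variance,
    with df = n - 1 for each group."""
    def sample_var(g):
        m = sum(g) / len(g)
        return sum((v - m) ** 2 for v in g) / (len(g) - 1)
    v1, v2 = sample_var(group1), sample_var(group2)
    return max(v1, v2) / min(v1, v2)

# Residuals split by low vs high fitted values (made-up numbers)
low = [0.5, -0.4, 0.3, -0.6]
high = [2.1, -1.8, 2.5, -2.2]
print(variance_ratio_f(low, high))  # compare against the F table value
```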

12
Q

Anderson-Darling

A

A test for normality.

H0: the distribution is normal
H1: the distribution is not normal

If the AD test p-value > 0.05, there is no reason to conclude that the distribution is not normal

13
Q

Standardized residuals

A

The standardized residual, denoted z_i, for the ith observation is the residual for that observation, e_i, divided by the standard error of the estimate, s.

If an observation has a standardized residual greater than 3 in absolute value, it is considered an outlier
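A small Python sketch of the z_i = e_i / s rule (the residuals and s are made up):

```python
def flag_outliers(residuals, s, cutoff=3.0):
    """z_i = e_i / s; flag observations whose |z_i| exceeds the cutoff."""
    return [i for i, e in enumerate(residuals) if abs(e / s) > cutoff]

# s is the standard error of the estimate from the fitted model
print(flag_outliers([0.5, -1.2, 9.8, 0.7, -0.9], s=1.0))  # [2]
```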

14
Q

Cook's D

A

Cook's D is an overall measure of the impact of the ith observation on the n fitted values. Observations with large D values may be outliers. Because D is calculated using leverage values and standardized residuals, it considers whether an observation is unusual with respect to both x and y values.

If the calculated percentile is:

- between 0 and 0.3: conclude not influential

- between 0.3 and 0.5: conclude mildly influential

- greater than 0.5: conclude influential

15
Q

Durbin-Watson

A

It is used with time series data to detect serial correlation of the residuals.

- Highly positively correlated: d ≈ 0

- Uncorrelated: d ≈ 2

- Highly negatively correlated: d ≈ 4

Lower-tail test:
H0: no residual correlation
H1: positive correlation
- d has to be smaller than d_L (the lower critical value) to show evidence of positive correlation

Upper-tail test:
H0: no residual correlation
H1: negative correlation
- Rejection region: (4 - d) < d_L shows evidence of negative correlation
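The statistic itself is d = Σ(e_t - e_{t-1})² / Σe_t²; a Python sketch on toy residuals:

```python
def durbin_watson(e):
    """d = sum of squared successive differences of the residuals,
    divided by the sum of squared residuals. d near 2 -> uncorrelated."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

print(durbin_watson([1, 1, 1, 1]))    # 0.0 -> strong positive correlation
print(durbin_watson([1, -1, 1, -1]))  # 3.0, toward 4 -> negative correlation
```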

16
Q

What does deseasonalized data contain?

A

T × C × I

So, in the multiplicative model Y = T × S × C × I, we have to divide Y by S
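A Python sketch of the division, assuming a multiplicative model Y = T × S × C × I (the sales figures and seasonal indices are made up):

```python
def deseasonalize(y, seasonal_index):
    """Multiplicative model Y = T*S*C*I: dividing by S leaves T*C*I."""
    return [yi / si for yi, si in zip(y, seasonal_index)]

# Quarterly values with made-up seasonal indices (averaging 1.0 over the year)
y = [150.0, 75.0, 50.0, 125.0]
s = [1.5, 0.75, 0.5, 1.25]
print(deseasonalize(y, s))  # [100.0, 100.0, 100.0, 100.0]
```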

17
Q

PACF

A

The partial correlation between two variables is the amount of correlation between those variables which is not explained by their mutual correlations with a given set of other variables.

18
Q

AIC and BIC

A

Criteria that include a parsimony factor; select the model having the minimum AIC and BIC.

Note: r is the total number of parameters, including the constant term.

They can be used as techniques for variable selection in regression analysis.

19
Q

Ljung-Box Q statistic

A

A test for overall model adequacy; it is a sort of lack-of-fit test.

It belongs to a class of tests known as portmanteau tests.

Instead of studying the correlation coefficients r_k one at a time, the idea is to consider a whole set of r_k values, for example r_1 through r_12, all at one time.

It tests whether the entire set is significantly different from the zero set.

If the p-value is smaller than 0.05, the model is considered inadequate.
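The statistic is Q = n(n + 2) Σ r_k² / (n - k) over lags k = 1..m; a Python sketch with made-up residual autocorrelations:

```python
def ljung_box_q(autocorrs, n):
    """Q = n(n+2) * sum(r_k^2 / (n - k)) for k = 1..m.
    Large Q (small chi-squared p-value) -> model inadequate."""
    return n * (n + 2) * sum(r ** 2 / (n - k)
                             for k, r in enumerate(autocorrs, start=1))

# Made-up residual autocorrelations r_1..r_3 from a series of length 50
print(ljung_box_q([0.1, -0.05, 0.02], n=50))
```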

20
Q

Bonferroni correction

A

An adjustment made to alpha values when several statistical tests are being performed simultaneously.

We divide alpha by the number of comparisons made.

This correction is made to reduce the chances of obtaining false-positive results (Type I errors). The probability of identifying at least one significant result due to chance increases as more hypotheses are tested.
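The adjustment itself is one line of Python:

```python
def bonferroni_alpha(alpha, m):
    """Per-comparison significance level when m tests are run
    at an overall significance level of alpha."""
    return alpha / m

# 5 simultaneous tests at an overall alpha of 0.05
print(bonferroni_alpha(0.05, 5))  # 0.01
```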

21
Q

Detecting unequal variances

A

Hartley's test: test statistic = maximum variance / minimum variance; reject H0 for large values of the test statistic.

Bartlett's test: follows a chi-squared distribution with p - 1 df.

Modified Levene's test: similar to a one-way ANOVA, based on the absolute deviations of the observations in each sample from their medians.

If the populations are normal, use Bartlett's test.

If the populations are not normal, use Levene's test.

22
Q

What is autocorrelation?

A

Correlation of a series with its own previous values
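A Python sketch of lag-k autocorrelation on a toy series:

```python
def autocorr(series, lag=1):
    """Correlation of a series with itself shifted by `lag` periods."""
    n = len(series)
    m = sum(series) / n
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    den = sum((v - m) ** 2 for v in series)
    return num / den

# A steadily rising series is positively autocorrelated at lag 1
print(autocorr([1, 2, 3, 4, 5, 6]))  # 0.5
```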