Seminar 1 and 2 Flashcards

1
Q

What are the types of data sets?

A
  • Cross-sectional data: contains observations for multiple subjects at one point in time
  • Time-series data: contains observations for one subject at different points in time
  • Longitudinal data (panel data): combines the previous two, containing observations for multiple subjects at different points in time
2
Q

What are the types of variables?

A

There are 2 types of variables:

  1. Quantitative
    - discrete (countable values): 1, 2, 3 etc.
    - continuous: can take any real value within an interval: 0-14, 14-31 etc.
  2. Qualitative
    - nominal: nationality, gender etc.
    - ordinal: values have a specific order (job rank, for example)
3
Q

When working with files in EViews:

How should you structure the workfile based on the data type?

A

If you work with cross sections: Unstructured/Undated

If you work with time series: Dated (regular frequency)

If you work with panel data: undated (EViews also offers a dedicated Balanced Panel structure)

4
Q

What is the coefficient of variation used for?

A

The coefficient of variation is used to determine whether a distribution is homogeneous or heterogeneous, i.e. whether its mean is representative.

CV = (st dev / mean) × 100%

If CV < 30%: homogeneous (the mean is representative)

If CV > 30%: heterogeneous (the mean is not representative)
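A minimal sketch of the 30% rule, assuming the sample standard deviation (n - 1 in the denominator) and a non-zero mean; the function names and the sample values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * st dev / mean (mean must be non-zero)."""
    mean = statistics.mean(values)
    st_dev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return 100 * st_dev / mean


def classify(values, threshold=30.0):
    """Apply the 30% rule: below the threshold the distribution is homogeneous."""
    cv = coefficient_of_variation(values)
    if cv < threshold:
        return cv, "homogeneous: the mean is representative"
    return cv, "heterogeneous: the mean is not representative"


# Hypothetical samples, for illustration only.
print(classify([2900, 3100, 3000, 3200, 2800]))  # CV is about 5.3%
print(classify([1, 10, 100]))                    # CV is well above 30%
```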

5
Q

What is the median?

A

The median splits the distribution into two equal parts (it is the 50th percentile)
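To find the median by hand, sort the values and take the middle one; with an even number of values, average the two middle ones. A short sketch (the function name is illustrative):

```python
def median(values):
    """50th percentile: the middle value of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even number of values: average the two middle ones.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```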

6
Q

What does skewness represent?

A

Skewness measures the asymmetry of a distribution.

There are 3 types of skewness:

  1. Positive skew (>0)
    * tail goes to the right
    * mode is to the left
    * mean is to the right (typically mode < median < mean)
  2. Symmetrical (0)
  3. Negative skew (<0)
    * tail goes to the left
    * mode is to the right
    * mean is to the left (typically mean < median < mode)
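The sign of the skewness can be checked with the moment-based formula S = m3 / m2^(3/2), where m2 and m3 are the second and third central moments (as far as I know, this is the formula behind the Skewness value EViews reports). A sketch with hypothetical data:

```python
def skewness(values):
    """Moment-based skewness m3 / m2**1.5, using population (1/n) central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical samples, for illustration only.
right_tail = [1, 2, 2, 3, 10]         # long tail to the right
symmetric = [1, 2, 3]
left_tail = [-x for x in right_tail]  # mirror image: long tail to the left

print(skewness(right_tail))  # positive
print(skewness(symmetric))   # 0.0
print(skewness(left_tail))   # negative
```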
7
Q

What is the mode?

A

The mode shows the most frequent value in the distribution
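A quick way to compare the mode with the median and the mean, using Python's standard statistics module on a hypothetical right-skewed sample; here the mode is the smallest of the three, as expected for a positive skew.

```python
import statistics

data = [1, 2, 2, 3, 4, 10]  # hypothetical right-skewed sample

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # 50th percentile
mean = statistics.mean(data)

print(mode, median, mean)
```

If several values tie for most frequent, statistics.multimode returns all of them.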

8
Q

What does kurtosis represent?

A

Kurtosis measures the peakedness/flatness of the distribution.

There are 3 types of kurtosis:

  1. Platykurtic (low peakedness) (<3)
  2. Normal (mesokurtic) (=3)
  3. Leptokurtic (high peakedness) (>3)
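The thresholds above use plain (not "excess") kurtosis, K = m4 / m2^2, for which the normal distribution scores 3. A sketch, assuming population (1/n) moments and hypothetical data; the small tolerance only guards against rounding.

```python
def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (not excess kurtosis): a normal distribution gives 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2


def kurtosis_type(k, tol=1e-9):
    """Classify a kurtosis value against the normal benchmark of 3."""
    if k < 3 - tol:
        return "platykurtic"
    if k > 3 + tol:
        return "leptokurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]              # hypothetical, evenly spread values
peaked = [0] * 8 + [10, -10]        # hypothetical, most mass at the centre plus heavy tails

print(kurtosis(flat), kurtosis_type(kurtosis(flat)))
print(kurtosis(peaked), kurtosis_type(kurtosis(peaked)))
```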
9
Q

What are the characteristics of the normal distribution (the bell curve)?

A

The bell curve has a skewness of 0 (symmetrical) and a kurtosis of 3; its mean, median and mode are equal.

10
Q

How do you create a logarithmic variable in EViews?

A

series (name of new variable) = log(old variable)

(log() is the natural logarithm, so the old variable must take only positive values)

11
Q

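How does the logarithmic function affect a distribution?

A
  • A logarithmic transformation smooths the distribution by compressing large values
  • It makes the distribution look closer to the Gauss-Laplace (normal) curve, reducing positive skewness
  • It reduces the influence of outliers (they are pulled closer to the rest of the data, not removed)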
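A small illustration of the effect on skewness, assuming hypothetical, strongly right-skewed data (powers of 2). After taking natural logs the values become evenly spaced, so the skewness drops to about 0, while the largest value is still in the sample, only compressed.

```python
import math


def skewness(values):
    """Moment-based skewness m3 / m2**1.5 with population (1/n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


raw = [1, 2, 4, 8, 16, 32, 64, 128]  # hypothetical right-skewed series
logged = [math.log(x) for x in raw]  # natural log, like log() in EViews

print(skewness(raw), skewness(logged))
print(max(raw), max(logged))  # the extreme value is kept, just compressed
```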
12
Q

What is the range of a distribution?

A

The range is defined as:

maximum - minimum

13
Q

What are the estimation strategies when running a regression model?

A
  1. The specific-to-general approach
  2. The general-to-specific approach
  3. Keep it as general as possible
14
Q

How do we estimate a regression model when using the specific-to-general approach?

A

By using the omitted variables test, we add the variables that are statistically significant.

NULL: the added variable is not significant
ALTERNATIVE: the added variable is significant

15
Q

How do we estimate a regression model when using the general-to-specific approach?

A

By using the redundant variables test, we exclude (drop) redundant variables from the model.

NULL: the variable is redundant
ALTERNATIVE: the variable is not redundant (it is significant)

16
Q

What are the selection criteria to discriminate between models?

A
  1. Maximization criteria
    - maximize R^2 (never decreases when variables are added)
    - maximize adjusted R^2 (adds a penalty for extra variables, to account for the R^2 problem)
    - maximize the F-statistic (overall significance of the model)
  2. Minimization criteria
    - minimize AIC (Akaike; penalizes extra variables, but only mildly)
    - minimize SIC (Schwarz; applies a heavier penalty than AIC, so it favors smaller models)
    - minimize HQIC (Hannan-Quinn)

A very high R^2 suggests the model has some critical issues. As a rule of thumb, a good model has an adjusted R^2 between 0.3 and 0.6/0.65.
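A sketch of how the penalties work, assuming the per-observation forms that, to my knowledge, EViews reports: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k), AIC = -2l/n + 2k/n, SIC = -2l/n + k ln(n)/n and HQIC = -2l/n + 2k ln(ln n)/n, where l is the log-likelihood, n the number of observations and k the number of coefficients (intercept included). All input numbers are hypothetical.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R^2; k counts all estimated coefficients, including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


def information_criteria(log_lik, n, k):
    """AIC, SIC and HQIC scaled by n (assumed EViews-style); smaller is better."""
    base = -2 * log_lik / n
    return {
        "AIC": base + 2 * k / n,
        "SIC": base + k * math.log(n) / n,
        "HQIC": base + 2 * k * math.log(math.log(n)) / n,
    }


# Adding a nearly useless variable raises R^2 slightly but lowers adjusted R^2.
print(adjusted_r2(0.500, n=30, k=3))
print(adjusted_r2(0.505, n=30, k=4))

# Hypothetical log-likelihood: for n = 30 the SIC penalty is the largest.
print(information_criteria(log_lik=-50.0, n=30, k=3))
```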

17
Q

When do we accept/reject the NULL hypothesis?

A
  • If p > 5%, we fail to reject (ACCEPT) the NULL
  • If p < 5%, we REJECT the NULL
18
Q

What is F-statistic?

A

The F-statistic shows the overall significance of the regression

NULL: all slope coefficients (betas) are 0
ALTERNATIVE: at least one beta is different from 0 (the model is significant)
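For a model with an intercept, the overall F-statistic can be computed from R^2 as F = (R^2 / (k - 1)) / ((1 - R^2) / (n - k)), with k coefficients including the intercept and n observations; its p-value comes from the F distribution with (k - 1, n - k) degrees of freedom. A sketch with hypothetical values:

```python
def f_statistic(r2, n, k):
    """Overall F-test statistic; k counts all coefficients, including the intercept."""
    explained = r2 / (k - 1)        # explained variation per slope coefficient
    unexplained = (1 - r2) / (n - k)  # unexplained variation per residual degree of freedom
    return explained / unexplained


print(f_statistic(r2=0.5, n=30, k=3))  # about 13.5
```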

19
Q

What are residuals?

A

Residuals are the differences between the actual data (the data in our database) and the values predicted by the model (estimated through OLS)

Residuals=actual-predicted

20
Q

What are the kinds of residuals?

A
  • Positive residuals (actual > predicted): the OLS regression underpredicts the dependent variable
  • Negative residuals (actual < predicted): the OLS regression overpredicts the dependent variable
  • Residuals = 0: the prediction is perfect (unlikely in practice)
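A sketch that computes residuals as actual minus predicted and labels each one by its sign; the fitted values are hypothetical, and the small tolerance treats values within rounding error of 0 as a perfect prediction.

```python
def residuals(actual, predicted):
    """Residual = actual - predicted, observation by observation."""
    return [a - p for a, p in zip(actual, predicted)]


def describe(residual, tol=1e-12):
    """Interpret the sign of one residual."""
    if residual > tol:
        return "underprediction"
    if residual < -tol:
        return "overprediction"
    return "perfect prediction"


actual = [3, 5, 4, 8]
predicted = [2.9, 4.3, 5.7, 7.1]  # hypothetical fitted values

res = residuals(actual, predicted)
print([describe(r) for r in res])
```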
21
Q

What is OLS?

A

OLS (Ordinary Least Squares) is an estimation method that fits a linear trend by minimising the sum of squared residuals, i.e. the squared distances between actual and predicted values.

The result is an OLS regression.
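For a single regressor, minimising the sum of squared residuals has a closed-form solution: beta = Sxy / Sxx and alpha = mean(y) - beta × mean(x). A sketch on hypothetical data, which also checks that moving beta away from the OLS value increases the sum of squared residuals:

```python
def ols_simple(x, y):
    """OLS estimates (alpha, beta) for y = alpha + beta * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def ssr(x, y, alpha, beta):
    """Sum of squared residuals for a candidate line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4]  # hypothetical data
y = [3, 5, 4, 8]

alpha, beta = ols_simple(x, y)
fitted = [alpha + beta * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
print(alpha, beta)  # about 1.5 and 1.4
print(ssr(x, y, alpha, beta), ssr(x, y, alpha, beta + 0.1))
```

With an intercept in the model, the OLS residuals always sum to (numerically) zero.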

22
Q

What are the assumptions of the OLS model?

A
  1. The model is linear
  2. Observations must be independent from each other (random data from the population)
  3. Residuals must be independent (no autocorrelation)
  4. There should be no perfect or near-perfect multicollinearity
  5. Homoskedasticity needs to be present in the model
  6. Error terms should be approximately normally distributed

When all 6 assumptions are met, the OLS estimators are considered BLUE (Best Linear Unbiased Estimators)

23
Q

How can we verify the 1st assumption:

The model should be linear

A
  • We can check the appearance of the scatter plot
  • We can run the Ramsey RESET test

NULL: the model is linear
ALTERNATIVE: the model is not linear (we need to change the functional form of the model)

The functional form can be changed, for example, by taking logs or by adding squared terms.

24
Q

How can we verify the 2nd assumption:

The observations must be independent from each other

A
  • We compare the individual sample with the common sample
25
Q

How can we verify the 3rd assumption:

Residuals should be independent

A
  • Make a scatterplot of residuals vs fitted values

If the trend line through the points is horizontal, there is no relationship between the residuals and the fitted values

26
Q

How can we verify the 4th assumption?

Perfect or near multicollinearity should not exist

A

Before estimating the model: use the correlation matrix
After estimating the model: use the variance inflation factors (VIF) test

If any VIF is higher than 5, we have multicollinearity
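The VIF of a regressor is 1 / (1 - R_j^2), where R_j^2 comes from regressing that regressor on the other ones. In the special case of exactly two regressors, R_j^2 equals their squared correlation, which allows a short sketch (the data and function names are hypothetical; with more regressors an auxiliary multiple regression is needed).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(x1, x2):
    """With two regressors the auxiliary R^2 is r^2, so VIF = 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4]
unrelated = [1, -1, -1, 1]       # uncorrelated with x1
almost_copy = [1.1, 1.9, 3.2, 3.9]  # nearly collinear with x1

print(vif_two_regressors(x1, unrelated))    # 1.0: no multicollinearity
print(vif_two_regressors(x1, almost_copy))  # far above 5
```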

27
Q

How can we verify the 5th assumption?

Homoskedasticity needs to be present in the model

A
  • Make a scatter plot of residuals against fitted values (if the pattern looks like a cone, the variance is not constant)
  • Run the Breusch-Pagan-Godfrey (BP) or White heteroskedasticity test

NULL: residuals are homoskedastic
ALTERNATIVE: residuals are heteroskedastic
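A sketch of the idea behind the BP test in its LM (n·R^2) form: regress the squared residuals on the regressors and compare n·R^2 with a chi-square distribution. It assumes a single regressor, so the auxiliary R^2 is the squared correlation between x and the squared residuals; the residuals are hypothetical, chosen so that their spread grows with x.

```python
import math


def breusch_pagan_lm(x, residuals):
    """LM = n * R^2 from regressing squared residuals on one regressor (plus intercept)."""
    n = len(x)
    e2 = [e ** 2 for e in residuals]
    mx = sum(x) / n
    me = sum(e2) / n
    sxy = sum((a - mx) * (b - me) for a, b in zip(x, e2))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - me) ** 2 for b in e2)
    r2 = sxy ** 2 / (sxx * syy)               # R^2 of the auxiliary regression
    lm = n * r2
    p_value = math.erfc(math.sqrt(lm / 2))    # upper tail of chi-square with 1 df
    return lm, p_value


x = [1, 2, 3, 4, 5, 6, 7, 8]
resid = [0.1, -0.2, 0.3, -0.5, 0.8, -1.2, 1.6, -2.0]  # spread grows with x (a "cone")

lm, p = breusch_pagan_lm(x, resid)
print(lm, p)  # p < 0.05, so we reject homoskedasticity
```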

28
Q

What is homoskedasticity?

A

Homoskedasticity refers to the situation in which the variance of the residuals is constant

(homoskedasticity may be accepted in the case of large datasets, with more than 30 observations)

29
Q

What is heteroskedasticity?

A

Heteroskedasticity refers to the situation in which the variance of the residuals is not constant.

This makes the standard errors, and therefore the p-values, of the model unreliable.

30
Q

What is the correction for heteroskedasticity?

A

If residuals are heteroskedastic, we can apply the HAC (Newey-West) correction, which adjusts the standard errors while leaving the estimated coefficients unchanged.

31
Q

How can we verify the 6th assumption?

Error terms should be approximately normally distributed.

A
  • Make a histogram of the residuals
  • Run the Jarque-Bera test

NULL: residuals follow the normal distribution
ALTERNATIVE: residuals do not follow the normal distribution
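The Jarque-Bera statistic combines the skewness S and kurtosis K of the residuals: JB = n/6 × (S^2 + (K - 3)^2 / 4). Under the NULL it follows a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is simply exp(-JB/2). A sketch on hypothetical residuals (the test is asymptotic, so tiny samples are for illustration only):

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) using moment-based skewness and kurtosis."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    s = m3 / m2 ** 1.5  # skewness (0 for a normal distribution)
    k = m4 / m2 ** 2    # kurtosis (3 for a normal distribution)
    jb = n / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # chi-square(2) upper tail
    return jb, p_value


jb, p = jarque_bera([-1, 1, -1, 1])  # hypothetical residuals
print(jb, p)
```

If p < 5%, we reject the NULL of normally distributed residuals.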

32
Q

What are the particularities of a BLUE estimator?

A
  • When we have a BLUE estimator, the estimated alpha and beta are, on average, equal to the population alpha and beta, and they have the smallest variance among linear unbiased estimators
  • If the first 4 assumptions are met, the estimators are unbiased
  • If all 6 assumptions are met, we can rely on the F-statistic