L29 - Methodological Issues Flashcards

1
Q

What does Leamer believe people do to models to improve them?

A

This is called Leamer's typology of specification searches:

  1. Hypothesis testing search
  2. Interpretative search
  3. Simplification search
  4. Proxy search
  5. Data selection search
  6. Postdata model construction

1 to 3 can be thought of as ‘general to specific’ because they start with an unrestricted model and test restricted versions.

4 to 6 are ‘specific to general’ because they involve modifying the original model by introducing new or alternative variables.

2
Q

What is Hypothesis testing search?

A
3
Q

What is Data selection search?

A
  • splits the data to see whether the model behaves differently across subsamples
    • i.e. is the relationship robust across different samples and data sets?
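A minimal sketch of data selection search, assuming simulated data and a simple regression of y on x: the sample is split in half and the slope is estimated on each half. The helper `ols_slope`, the sample size and the true slopes are hypothetical choices for illustration, not taken from the lecture.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 400
half = n // 2
x = rng.normal(size=n)

# Stable relationship: the true slope is 2 in both halves of the sample.
y_stable = 1.0 + 2.0 * x + rng.normal(size=n)

# Unstable relationship: the true slope falls from 2 to 0 halfway through.
true_slope = np.where(np.arange(n) < half, 2.0, 0.0)
y_unstable = 1.0 + true_slope * x + rng.normal(size=n)

results = {}
for name, y in [("stable", y_stable), ("unstable", y_unstable)]:
    # Re-estimate the same model on each subsample and compare.
    results[name] = (ols_slope(x[:half], y[:half]), ols_slope(x[half:], y[half:]))
    b_first, b_second = results[name]
    print(f"{name}: first half {b_first:.2f}, second half {b_second:.2f}")
```

In the stable case the two subsample slopes should be close; in the unstable case they differ sharply, which is the kind of non-robustness this search is meant to expose.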
4
Q

What is Proxy variable search?

A
  • testing for better proxies
    • e.g. reported income might not be accurate, as some people may lie about their actual income
  • attempts to see whether the relationship is robust when different proxies are used
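A hedged sketch of a proxy search on simulated data. It assumes the true variable (e.g. true income) is unobserved and that two hypothetical proxies measure it with classical measurement error of different sizes. Under that assumption the OLS slope on a proxy is biased towards zero by the factor var(x*) / (var(x*) + var(u)), so with a true slope of 2 the noisy proxy should give roughly 1.0 and the better proxy roughly 1.92.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 5000
true_income = rng.normal(size=n)            # unobserved true variable
y = 2.0 * true_income + rng.normal(size=n)  # true slope is 2

# Two hypothetical proxies for true income.
reported_income = true_income + rng.normal(scale=1.0, size=n)  # heavy misreporting
better_proxy = true_income + rng.normal(scale=0.2, size=n)     # small measurement error

slopes = {
    "reported income": ols_slope(reported_income, y),
    "better proxy": ols_slope(better_proxy, y),
}
for name, b in slopes.items():
    print(f"{name}: slope {b:.2f}")
```

If the estimated relationship changes a lot depending on which proxy is used, it is not robust to the choice of proxy.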
5
Q

What is Post data model construction?

A
6
Q

What is Interpretative Search?

A
  • uses theory to test the model
7
Q

What is Simplification Search?

A
  • Testing restrictions on a model to reduce the number of parameters, which improves the efficiency (precision) of the remaining estimates
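A minimal sketch of why simplification can improve efficiency, assuming simulated data in which a regressor x2 is truly irrelevant but strongly correlated with x1. Imposing the exclusion restriction (coefficient on x2 equal to zero) should shrink the standard error on x1. The data-generating process and the helper `ols_with_se` are hypothetical; the formal t or F test of the restriction is not shown.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # correlated with x1
y = 1.0 + 2.0 * x1 + rng.normal(size=n)                    # x2 is truly irrelevant

const = np.ones(n)
# Unrestricted model: y on a constant, x1 and x2.
_, se_unres = ols_with_se(np.column_stack([const, x1, x2]), y)
# Restricted model: impose that the coefficient on x2 is zero.
_, se_res = ols_with_se(np.column_stack([const, x1]), y)

print(f"SE on x1, unrestricted: {se_unres[1]:.3f}")
print(f"SE on x1, restricted:   {se_res[1]:.3f}")
```

The gain comes from removing the collinear parameter; if the restriction were false, the restricted estimate would instead be biased.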
8
Q

What are the implications of the hypothesis testing, interpretative and simplification search?

A
9
Q

What are the three guilty secrets of econometrics?

A
  1. Economic significance and statistical significance are not the same thing.
  2. Data mining means that reported levels of significance are often not correct.
  3. Many apparently significant relationships are really spurious regressions.
10
Q

How are Economic significance and Statistical significance different?

A

A result is statistically significant if we can reject the null hypothesis at a given level of significance.

A result is economically significant if it has an important influence on economic behaviour.

These two definitions of significance are not the same thing.

(In the second case of the example, the elasticity is estimated less precisely, as it has a higher standard error.)
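A hypothetical worked example of the distinction, using made-up numbers (not the lecture's) and the large-sample approximation SE ≈ σ_e / (σ_x √n) for the slope in a simple regression. A tiny elasticity estimated on a huge sample can be statistically significant, while a large elasticity estimated on a small, noisy sample may not be.

```python
import math

def approx_t_ratio(beta, n, sd_x, sd_e):
    """Approximate t-ratio for a slope in a simple regression,
    using SE(beta_hat) ~= sd_e / (sd_x * sqrt(n))."""
    se = sd_e / (sd_x * math.sqrt(n))
    return beta / se

# Case A: economically trivial elasticity, huge sample.
t_a = approx_t_ratio(beta=0.005, n=1_000_000, sd_x=1.0, sd_e=1.0)
# Case B: economically large elasticity, small noisy sample.
t_b = approx_t_ratio(beta=0.5, n=20, sd_x=1.0, sd_e=2.0)

print(f"Case A: t = {t_a:.2f}")  # t = 5.00
print(f"Case B: t = {t_b:.2f}")  # t = 1.12
```

Against a 5% critical value of about 1.96, case A is statistically significant but economically negligible, and case B is economically important but not statistically significant (with n = 20 the t critical value is larger still, which only strengthens that conclusion).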

11
Q

What is Data mining?

A
  • Data mining refers to the process of searching a data set for correlations between variables.
  • The problem is that we cannot then use the same data set to test the significance of relationships we identify.
  • For example, suppose we regress a variable of interest on 100 different explanatory variables. Even if ALL of these are unrelated to the variable of interest, we would expect to find about 5 relationships that are significant at the 5% level (100 × 0.05 = 5).
  • If we wish to test a relationship detected through data mining, then we need to do so on a new data set.

e.g. choosing the best results from all 10 regressions run, even though 8 of them show an insignificant relationship
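A small simulation of the 100-regressor example, assuming pure-noise data: y is regressed separately on each of 100 unrelated variables, and the number of slopes with |t| > 1.96 is counted. It uses the exact identity t = r·√((n − 2)/(1 − r²)) for the slope t-ratio in a simple regression, where r is the sample correlation. Sample size and number of replications are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_vars, n_reps = 100, 100, 200
crit = 1.96  # approximate two-sided 5% critical value

counts = []
for _ in range(n_reps):
    y = rng.normal(size=n_obs)
    X = rng.normal(size=(n_obs, n_vars))  # 100 regressors, all unrelated to y
    # Sample correlation of y with each regressor.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / np.sqrt((Xc**2).sum(axis=0) * (yc @ yc))
    # Slope t-ratio in each simple regression of y on one regressor.
    t = r * np.sqrt((n_obs - 2) / (1 - r**2))
    counts.append(int(np.sum(np.abs(t) > crit)))

print(f"Average number of 'significant' regressors: {np.mean(counts):.2f}")
```

The average comes out close to 5 even though no regressor is genuinely related to y.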

12
Q

What is cherry-picking?

A
  • one potential problem with ‘big data’ is that researchers can comb through the data, running hundreds of regressions until they get the significant result they want
    • this is okay if you then go on to test whether the relationships are robust, but by itself it is likely to lead to a poor model
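A hedged sketch of cherry-picking followed by a re-test on new data, assuming pure-noise data throughout: in each replication the regressor with the largest |t| out of 200 candidates is picked, then only that variable is re-tested on a fresh sample. The helper `t_ratios` and all sizes are hypothetical.

```python
import numpy as np

def t_ratios(y, X):
    """Slope t-ratios from simple regressions of y on each column of X."""
    n = len(y)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    r = (Xc.T @ yc) / np.sqrt((Xc**2).sum(axis=0) * (yc @ yc))
    return r * np.sqrt((n - 2) / (1 - r**2))

rng = np.random.default_rng(4)
n_obs, n_vars, n_reps = 100, 200, 200
crit = 1.96

picked_sig, retest_sig = 0, 0
for _ in range(n_reps):
    # Search sample: nothing is truly related to y.
    y = rng.normal(size=n_obs)
    X = rng.normal(size=(n_obs, n_vars))
    t = t_ratios(y, X)
    best = np.argmax(np.abs(t))  # cherry-pick the strongest result
    picked_sig += int(abs(t[best]) > crit)
    # Fresh data set: re-test only the cherry-picked variable.
    y_new = rng.normal(size=n_obs)
    x_new = rng.normal(size=(n_obs, 1))
    retest_sig += int(abs(t_ratios(y_new, x_new)[0]) > crit)

print(f"Share significant in the search sample: {picked_sig / n_reps:.2f}")
print(f"Share significant on new data:          {retest_sig / n_reps:.2f}")
```

The cherry-picked variable looks significant almost every time in the search sample, but only about 5% of the time on new data.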
13
Q

What are spurious regressions?

A

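A spurious regression shows an apparently significant relationship between variables that are not actually related.

e.g. you can arrive at a spurious regression by cherry-picking the data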

14
Q

What is the most common reason for spurious regressions?

A
  • the presence of unit roots in the data series (a series that contains a random walk element)
    • as the data are not stationary, many standard statistical results do not apply to these series
  • even when two series are independent random walks, there can be a high level of correlation between them
    • you usually find this out from having a highly significant variable, a high R-squared and a very low DW (Durbin-Watson) statistic
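A minimal sketch of the warning signs, assuming simulated data: y is regressed on x when both are independent random walks, and again when both are the underlying independent white-noise shocks. For each regression it records the slope t-ratio, R-squared and the Durbin-Watson statistic (sum of squared residual differences over the residual sum of squares). Sample size and replications are hypothetical.

```python
import numpy as np

def regression_stats(x, y):
    """OLS of y on a constant and x: slope t-ratio, R-squared, Durbin-Watson."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    rss = resid @ resid
    se_slope = np.sqrt(rss / (n - 2) * np.linalg.inv(X.T @ X)[1, 1])
    r2 = 1 - rss / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / rss
    return coef[1] / se_slope, r2, dw

rng = np.random.default_rng(5)
n, n_reps = 200, 500
walk_stats, noise_stats = [], []
for _ in range(n_reps):
    e1, e2 = rng.normal(size=n), rng.normal(size=n)
    # Independent random walks (unit root processes) built from the shocks.
    walk_stats.append(regression_stats(np.cumsum(e1), np.cumsum(e2)))
    # The shocks themselves: independent stationary white noise.
    noise_stats.append(regression_stats(e1, e2))

walk_stats = np.array(walk_stats)    # columns: t-ratio, R-squared, DW
noise_stats = np.array(noise_stats)
for name, s in [("random walks", walk_stats), ("white noise", noise_stats)]:
    print(f"{name}: mean |t| {np.mean(np.abs(s[:, 0])):.2f}, "
          f"mean R2 {np.mean(s[:, 1]):.2f}, mean DW {np.mean(s[:, 2]):.2f}")
```

The random-walk regressions show large |t|, much higher R-squared and a DW statistic near zero; the white-noise regressions show DW close to 2.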
15
Q

How does the Monte Carlo Simulation highlight the issue of spurious regression?

A
  • 67% of the time there was a significant t-ratio when regressing one independent random walk on another
    • this is not saying you should never regress one unit root process on another; it can be useful,
    • but you need to be aware that you shouldn't take the statistics at face value
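A hedged Monte Carlo sketch in the same spirit, assuming independent Gaussian random walks: it counts how often the slope t-ratio exceeds 1.96 for two sample sizes. The lecture's sample size and number of replications are not given, so these hypothetical settings are not expected to reproduce the 67% figure exactly; the point is that the rejection rate is far above the nominal 5% and tends to rise with the sample size.

```python
import numpy as np

def slope_t_ratio(x, y):
    """t-ratio on the slope in an OLS regression of y on a constant and x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se

def rejection_rate(n, n_reps, rng, crit=1.96):
    """Share of regressions of one independent random walk on another
    in which the slope looks 'significant' at roughly the 5% level."""
    hits = 0
    for _ in range(n_reps):
        x = np.cumsum(rng.normal(size=n))
        y = np.cumsum(rng.normal(size=n))
        hits += int(abs(slope_t_ratio(x, y)) > crit)
    return hits / n_reps

rng = np.random.default_rng(6)
rates = {n: rejection_rate(n, 1000, rng) for n in (50, 400)}
for n, rate in rates.items():
    print(f"n = {n}: significant t-ratio in {rate:.0%} of regressions")
```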
16
Q

What happens to a regression of a random walk if it has a drift problem?

A
17
Q

What does modelling require other than statistical techniques?

A
  • In relation to spurious regression
  • The appearance of serial correlation is produced by the presence of the structural break.
  • Before the structural break, the residuals are mainly negative; after it, they are mostly positive, so we would see correlation in the errors
  • ‘Correcting’ for serial correlation may appear to deal with the problem but is disguising the real problem.
    • If we don’t allow for the possibility of a structural break, we may end up correcting for the wrong problem (serial correlation)
  • A better solution is to deal with the structural break by including a dummy variable in the regression.
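A minimal sketch of the structural-break point, assuming hypothetical simulated data in which the intercept jumps by 5 halfway through the sample and the true errors are white noise. Without a break dummy the residuals are mostly negative before the break and positive after it, giving a low Durbin-Watson statistic that looks like serial correlation; adding the dummy removes the problem.

```python
import numpy as np

def ols_resid(X, y):
    """Residuals from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def durbin_watson(resid):
    """Durbin-Watson statistic: values near 2 suggest no first-order serial correlation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
n = 200
half = n // 2
x = rng.normal(size=n)
after_break = (np.arange(n) >= half).astype(float)  # 0 before, 1 after the break
# Hypothetical data: intercept shifts up by 5 at the break; errors are white noise.
y = 1.0 + 5.0 * after_break + 2.0 * x + rng.normal(size=n)

const = np.ones(n)
resid_no_dummy = ols_resid(np.column_stack([const, x]), y)
resid_dummy = ols_resid(np.column_stack([const, x, after_break]), y)

print(f"Mean residual before/after break (no dummy): "
      f"{resid_no_dummy[:half].mean():.2f} / {resid_no_dummy[half:].mean():.2f}")
print(f"DW without dummy: {durbin_watson(resid_no_dummy):.2f}")
print(f"DW with dummy:    {durbin_watson(resid_dummy):.2f}")
```

Since the errors are white noise by construction, the apparent serial correlation in the first regression comes entirely from the omitted break.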