L29 - Methodological Issues Flashcards
What does Leamer believe people do to models to improve them?
Leamer calls this the typology of specification searches:
- Hypothesis testing search
- Interpretative search
- Simplification search
- Proxy search
- Data selection search
- Postdata model construction
1 to 3 can be thought of as ‘general to specific’ because they start with an unrestricted model and test restricted versions.
4 to 6 are ‘specific to general’ because they involve modifying the original model by introducing new or alternative variables.
What is Hypothesis testing search?
- starts from the unrestricted model and tests restrictions on it in order to choose between competing specifications
What is Data selection search?
- splits the data to see if the model behaves differently across subsamples
- asks whether the relationship is robust across different samples and data sets (see the sketch below)
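A minimal sketch of this idea in Python (all data simulated; variable names are illustrative only): estimate the same model on two halves of the sample and compare the slope estimates.

```python
# Data selection search sketch: fit the same model on two subsamples
# and compare the estimates. All data here are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)  # true slope is 0.5 throughout

for name, idx in [("first half", slice(0, n // 2)),
                  ("second half", slice(n // 2, n))]:
    res = sm.OLS(y[idx], sm.add_constant(x[idx])).fit()
    print(f"{name}: slope = {res.params[1]:.3f} (s.e. = {res.bse[1]:.3f})")
# If the slopes differ sharply, the relationship is not robust
# to the choice of sample.
```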
What is Proxy variable search?
- tests whether better proxies are available
- e.g. reported income might not be accurate, as some people may lie about their actual income
- attempts to see if the relationship is robust when different proxies are used
What is Post data model construction?
- modifies the original model after looking at the data, e.g. by introducing new or alternative variables suggested by the data
What is Interpretative Search?
- uses theory to test the model
What is Simplification Search?
- Testing restrictions on a model in order to reduce the number of parameters, which improves the efficiency of the remaining estimates (see the sketch below)
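A minimal sketch of a simplification search using statsmodels (simulated data; statsmodels names the regressors x1, x2, x3 by default): test the joint restriction that two coefficients are zero, and drop the variables if the F-test does not reject.

```python
# Simplification search sketch: test restrictions that reduce the number
# of parameters. Simulated data; x2 and x3 are truly irrelevant here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
# Joint F-test of the restriction beta_2 = beta_3 = 0.
print(res.f_test("x2 = 0, x3 = 0"))
# If the restriction is not rejected, re-estimating without x2 and x3
# improves the efficiency of the remaining estimates.
```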
What are the implications of the hypothesis testing, interpretative and simplification searches?
What are the three guilty secrets of econometrics?
- Economic significance and statistical significance are not the same thing.
- Data mining means that reported levels of significance are often not correct.
- Many apparently significant relationships are really spurious regressions.
How are Economic significance and Statistical significance different?
A test is statistically significant if we can reject the null hypothesis at a given level of significance.
A result is economically significant if it has an important influence on economic behaviour.
The two above definitions of significance are not the same thing.
(for example, an estimated elasticity can be economically large but statistically insignificant if it has a high standard error; the sketch below illustrates the opposite case, where a tiny effect is statistically significant)
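A minimal sketch of the distinction (simulated data; the tiny true effect and the sample size are illustrative assumptions): with a large enough sample, an economically trivial effect can still be highly statistically significant.

```python
# Statistical vs economic significance sketch: a tiny effect becomes
# statistically significant once the sample is large. Simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
y = 0.02 * x + rng.normal(size=n)  # true effect is economically trivial

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"slope = {res.params[1]:.4f}, t-ratio = {res.tvalues[1]:.1f}")
# The t-ratio is large (statistically significant) even though a 0.02
# effect may be far too small to matter for economic behaviour.
```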
What is Data mining?
- Data mining refers to the process of searching a data set for correlations between variables.
- The problem is that we cannot then use the same data set to test the significance of relationships we identify.
- For example, suppose we regress a variable of interest on 100 different explanatory variables. Even if ALL of these are unrelated to the variable of interest, we would expect to find around 5 ‘significant’ relationships at the 5% level (see the sketch below).
- If we wish to test a relationship detected through data mining then we need to do so on a new data set.
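A minimal sketch of the 100-regressor example (all data are simulated pure noise): at the 5% level, roughly 5 of the 100 unrelated regressors should appear ‘significant’ by chance alone.

```python
# Data mining sketch: regress pure noise on 100 unrelated regressors,
# one at a time, and count how many look "significant" at the 5% level.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
y = rng.normal(size=n)            # variable of interest: pure noise
X = rng.normal(size=(n, 100))     # 100 regressors, all unrelated to y

significant = sum(
    sm.OLS(y, sm.add_constant(X[:, j])).fit().pvalues[1] < 0.05
    for j in range(100)
)
print(f"'significant' regressors: {significant} of 100")  # expect ~5
```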
What is cherry-picking?
- one potential problem with ‘big data’ is that researchers comb through the data, running hundreds of regressions until they get the significant result they want
- e.g. reporting the best result from 10 regressions even though 8 of them show an insignificant relationship
- this is okay if you go on to test whether the relationship is robust, but by itself it is likely to lead to a poor model (see the sketch below)
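A minimal sketch of why cherry-picking misleads (simulated noise; the sample size and trial count are arbitrary choices): if you report only the best of 10 regressions, the chance of at least one ‘significant’ slope at the 5% level is roughly 1 - 0.95^10 ≈ 40%, not 5%.

```python
# Cherry-picking sketch: run 10 regressions on pure noise and keep only
# the best p-value. Count how often that "best" result looks significant.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, trials, hits = 100, 500, 0
for _ in range(trials):
    y = rng.normal(size=n)
    X = rng.normal(size=(n, 10))  # 10 candidate regressors, all noise
    best_p = min(sm.OLS(y, sm.add_constant(X[:, j])).fit().pvalues[1]
                 for j in range(10))
    hits += best_p < 0.05          # cherry-pick the best result
print(f"'significant' best result in {hits / trials:.0%} of trials")
```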
What are spurious regressions?
- regressions that appear to show a significant relationship between variables that are in fact unrelated
- e.g. you can arrive at a spurious regression by cherry-picking the data
What is the most common reason for spurious regressions?
- the presence of unit roots in the data series (a series that contains a random walk element)
- as the data are not stationary, many standard statistical results do not apply to such series
- even though two series may be independent random walks, there can be a high level of correlation between them
- you usually spot this from a highly significant coefficient, a high R-squared and a very low Durbin-Watson (DW) statistic (see the sketch below)
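A minimal sketch of a spurious regression (simulated data): regress one independent random walk on another and look for the symptoms above - a large t-ratio, a high R-squared and a very low DW statistic.

```python
# Spurious regression sketch: two independent random walks, yet the OLS
# output shows a "significant" slope, high R-squared and very low DW.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 500
y = np.cumsum(rng.normal(size=n))  # random walk 1
x = np.cumsum(rng.normal(size=n))  # random walk 2, independent of y

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"t-ratio   = {res.tvalues[1]:.2f}")
print(f"R-squared = {res.rsquared:.3f}")
print(f"DW        = {durbin_watson(res.resid):.3f}")
```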
How does the Monte Carlo Simulation highlight the issue of spurious regression?
- regressing independent random walks on each other produced a significant t-ratio 67% of the time
- this is not to say you should never regress one unit-root process on another - it can be useful,
- but you need to be aware that you should not take the statistics at face value (see the sketch below)
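A minimal sketch of such a Monte Carlo exercise (simulated data; the sample length and trial count are arbitrary choices, so the exact frequency will vary around the 67% figure quoted above): repeatedly regress independent random walks on each other and record how often the t-ratio exceeds the nominal 5% critical value.

```python
# Monte Carlo sketch: how often does regressing one independent random
# walk on another give a nominally "significant" t-ratio at the 5% level?
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, trials, hits = 100, 1000, 0
for _ in range(trials):
    y = np.cumsum(rng.normal(size=n))
    x = np.cumsum(rng.normal(size=n))
    res = sm.OLS(y, sm.add_constant(x)).fit()
    hits += abs(res.tvalues[1]) > 1.96  # nominal 5% critical value
print(f"'significant' t-ratio in {hits / trials:.0%} of trials")
```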