Trust Economics Week 9 Flashcards
This topic explores how much we should trust empirical results published in journals.
FACTS: replicability vs non-replicability
Replicability exists in the natural sciences, which is why we have laws of physics, etc.
Non-replicability is common in economics because it is like studying animal behaviour: subjects are heterogeneous in how they respond to stimuli.
What is the standard economic method of testing?
Hypothesis testing, i.e. start with a null hypothesis.
E.g. the null: the drug doesn't cure cancer (no relationship with cancer).
Rejecting the null means concluding the drug does cure cancer (there is a relationship).
P-value
How unlikely the pattern in your data would be if the null were true. (E.g. the null is that the drug does not cure cancer; how surprised am I to see data suggesting it does?) (The level of confidence with which we can reject the null; smaller = better.)
(A small p-value = greater evidence that the null hypothesis is false, i.e. that the drug does cure cancer.)
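A minimal sketch of what this means in practice (my own illustration with made-up drug-trial numbers, not from the cards): simulate a world where the null is true and count how often chance alone produces a difference at least as large as the one observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed trial: 40/100 cured with the drug vs 30/100 with placebo.
observed_diff = 0.40 - 0.30

# Simulate many trials in which the null is true: both groups share the same
# underlying cure rate (0.35 here), so any difference is pure chance.
n_sims, n_per_arm, null_rate = 100_000, 100, 0.35
drug = rng.binomial(n_per_arm, null_rate, n_sims) / n_per_arm
placebo = rng.binomial(n_per_arm, null_rate, n_sims) / n_per_arm

# The p-value is (roughly) the share of null-world trials that look at least
# as extreme as the data we actually saw.
p_value = np.mean(drug - placebo >= observed_diff)
print(f"Approximate one-sided p-value: {p_value:.3f}")
# A small p-value = the observed pattern would be surprising if the drug did nothing.
```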
2 problems with research practices that harm the credibility/trustworthiness of results
Publication bias
P-hacking
Publication bias
A study is more likely to be published if it has a smaller p-value.
I.e. smaller p-values = the studies that find a significant effect and reject the null.
P-hacking
Researchers make methodological choices in conducting their studies that tend to deliver lower p-values, i.e. significant effects. (Again, no one wants boring news with no relationship.)
Remember: p-hacked results can still be accurate, but they are biased.
What is publication bias an example of?
A selection effect
Selection effect and example
What we see in newspapers/journals is not everything, but a filtered subset. The selection is not usually random, so it needs to be considered!
e.g. the WW2 RAF study of returning bombers and the pattern of their bullet holes. TEACHES YOU TO ACKNOWLEDGE HOW THE SELECTION HAPPENED, AND HENCE THE REASON FOR THE RESULTS
Why does publication bias arise?
People want to see that X is related to Y, not that X isn't related to Y!
E.g. people want to see "Coke makes you bald", not "Coke doesn't make you bald".
How does the nature of statistics work, as opposed to the nature of science? And examples
Run enough studies enough times and you can create a purely spurious result.
Jelly bean colour example, Coke example.
Unlike in science, where applying heat to water always turns it into steam.
Coke example explained
Men are asked about their consumption of 15 different drinks, and then whether they are bald or not.
Most drinks had nothing to do with hair loss, represented by blue dots.
The journalist only writes up the red dot (the one significant result).
No causation, but a positive correlation between Coke drinkers and being bald. (Maybe because older men drink more fizzy drinks!)
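A rough simulation of the Coke card (my own sketch with invented data): test 15 drinks against baldness when none of them truly matters, and about 1 test in 20 still comes out "significant" at the 5% level by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_men, n_drinks, alpha = 500, 15, 0.05
bald = rng.binomial(1, 0.3, n_men)                     # baldness, unrelated to any drink
consumption = rng.poisson(3, size=(n_men, n_drinks))   # drinks per week, pure noise

significant = []
for d in range(n_drinks):
    # Correlation test between drink d and baldness; no true effect exists.
    r, p = stats.pearsonr(consumption[:, d], bald)
    if p < alpha:
        significant.append(d)

print(f"Drinks 'significant' by chance alone: {significant}")
# With 15 independent tests at the 5% level we expect ~0.75 false positives per
# survey; run enough surveys and a "Coke causes baldness" red dot is guaranteed.
```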
Examples of p-hacking decisions the researcher can make (3)
What data to collect
What sample to use
How to define variables
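A hedged sketch of how such choices inflate false positives (my own simulation, not from the lecture): the "treatment" has no effect at all, but a researcher who tries several sample definitions and keeps the most significant one rejects the null far more often than 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def pval(outcome, treated, mask):
    """p-value of a treatment/control comparison of means on a subsample."""
    return stats.ttest_ind(outcome[mask & (treated == 1)],
                           outcome[mask & (treated == 0)]).pvalue

n, n_datasets = 400, 2000
honest_hits = hacked_hits = 0

for _ in range(n_datasets):
    # A dataset with NO true effect: the outcome ignores the treatment entirely.
    treated = rng.binomial(1, 0.5, n)
    outcome = rng.normal(0, 1, n)
    age = rng.integers(18, 70, n)
    female = rng.binomial(1, 0.5, n)

    # Honest researcher: one pre-specified test on the full sample.
    full = np.ones(n, dtype=bool)
    honest_hits += pval(outcome, treated, full) < 0.05

    # P-hacker: also tries several sample definitions and reports the best one.
    cuts = [full, age < 40, age >= 40, female == 1, female == 0]
    hacked_hits += min(pval(outcome, treated, m) for m in cuts) < 0.05

print(f"False-positive rate, one pre-specified test:   {honest_hits / n_datasets:.1%}")
print(f"False-positive rate, best of 5 sample choices: {hacked_hits / n_datasets:.1%}")
```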
Example of p-hacking in hotel reviews
Hotels only request feedback for an online review when they know their customers enjoyed their stay, hence why reviews are generally good on TripAdvisor.
Results can be probabilistic rather than deterministic - example
E.g. giving an "employee of the month" award might not motivate every worker, but it can change effort across a large sample/portion of workers.
We need to consider the scale of these problems.
Stylised example without publication bias or p-hacking
Consider 1,000 hypotheses, of which 50 are true. We don't know which ones, so an experiment is done on all 1,000.
The experiment finds a significant result when the hypothesis is actually true with probability 0.8 (the "power" of the test).
The experiment finds a significant result when the hypothesis is actually false with probability 0.05 (the arbitrary p-value cut-off).
On seeing a positive result, the chance of the underlying hypothesis being true is:
(0.8 × 50) / {(0.8 × 50) + (0.05 × 950)} = 0.46
Meaning of this
If we see a paper with a result significant at the 5% level, the probability of it actually being true is 0.46.
LESS THAN HALF!
This example shows the extent of the problem. REMEMBER: THIS EXAMPLE HASN'T EVEN INCLUDED PUBLICATION BIAS OR P-HACKING, SO IN THE REAL WORLD THE EFFECT IS EVEN WORSE.
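The same arithmetic as the card above, written out as a tiny script so the 0.46 is explicit (1,000 hypotheses, 50 true, power 0.8, 5% false-positive rate).

```python
# Stylised example: how likely is a "significant" finding to be true?
n_hypotheses = 1000
n_true = 50
n_false = n_hypotheses - n_true
power = 0.80   # P(significant result | hypothesis actually true)
alpha = 0.05   # P(significant result | hypothesis actually false)

true_positives = power * n_true      # 40 real effects detected
false_positives = alpha * n_false    # 47.5 spurious "effects"

p_true_given_significant = true_positives / (true_positives + false_positives)
print(f"P(hypothesis true | significant result) = {p_true_given_significant:.2f}")  # ≈ 0.46
```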
So what should we value more? (2)
Theory alongside the results, to support the findings.
High-powered studies; results are more compelling with a bigger sample.
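A small sketch of the "high power" point (my own numbers, purely illustrative): for a modest true effect of 0.2 standard deviations, the chance a study detects it at the 5% level rises sharply with sample size, which is why bigger studies are more compelling.

```python
from scipy.stats import norm

# Approximate power of a two-sample test for a true effect of 0.2 SD at the 5% level.
effect, alpha = 0.2, 0.05
z_crit = norm.ppf(1 - alpha / 2)

for n_per_arm in (50, 200, 800):
    se = (2 / n_per_arm) ** 0.5                 # SE of the difference in means (sigma = 1)
    power = 1 - norm.cdf(z_crit - effect / se)
    print(f"n per arm = {n_per_arm:4d}: power ≈ {power:.2f}")
# Roughly 0.17, 0.52 and 0.98: small studies usually miss real effects,
# so their occasional "hits" are more likely to be noise.
```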
Marathon analogy
We would expect a small number of people finishing at the fastest times, a lot in the middle range, and few at the slowest - roughly a normal distribution.
In reality there are multiple spikes: people push to finish in 2:59 rather than 3:01, etc., so more runners finish just under these round-number times, causing the spikes.
Brodeur et al.
Collected 13,440 p-values to see how the distribution of their z-statistics compares with what we would expect if results were unmanipulated.
Spikes appear around the critical values (like in the marathon example).
This shows a major filtration process, meaning we don't see certain results! (The ones where we fail to reject, i.e. the boring ones.)
So the first way to collect evidence is to collect p-values and look at their distribution (marathon analogy and Brodeur).
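A hedged sketch of why this diagnostic works (my own simulation, not Brodeur et al.'s data or method): generate z-statistics from a mix of null and real effects, publish significant results much more often than insignificant ones, and the published distribution jumps at the critical value 1.96, just like marathon finishing times jump at 3:00.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated z-statistics: most studies test null effects (z ~ N(0,1)),
# a minority test real effects (z ~ N(2.5,1)).  Purely illustrative numbers.
n_studies = 50_000
is_real = rng.random(n_studies) < 0.10
z = np.abs(np.where(is_real,
                    rng.normal(2.5, 1, n_studies),
                    rng.normal(0.0, 1, n_studies)))

# Publication filter: significant results are almost always published,
# insignificant ones rarely are.
significant = z > 1.96
published = rng.random(n_studies) < np.where(significant, 0.9, 0.1)
z_pub = z[published]

just_below = np.sum((z_pub > 1.76) & (z_pub <= 1.96))
just_above = np.sum((z_pub > 1.96) & (z_pub <= 2.16))
print(f"Published z-statistics in (1.76, 1.96]: {just_below}")
print(f"Published z-statistics in (1.96, 2.16]: {just_above}")
# The jump at 1.96 reveals the filtration: the "boring" results are missing.
```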
Second way to collect more evidence, and example
Try to replicate studies.
If we collect a bunch of studies with p < 0.05 and take them at face value, more than 95% should replicate!
Nosek: attempted to replicate 100 research findings; only 39 could be reproduced. (39 IS NOT 95!!!)
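A back-of-the-envelope check (my own connection between the cards, not necessarily made in the lecture): if only about 46% of significant findings are true (the stylised example above) and a replication has the same 0.8 power and 0.05 false-positive rate, the expected replication rate is nowhere near 95%.

```python
# Combining the stylised example with the replication idea.
p_true_given_sig = 0.46   # chance a published significant finding is actually true
power = 0.80              # chance a replication detects a real effect
alpha = 0.05              # chance a replication "detects" a non-existent effect

expected_replication = p_true_given_sig * power + (1 - p_true_given_sig) * alpha
print(f"Expected replication rate: {expected_replication:.3f}")
# Roughly 0.4 - much closer to Nosek's 39/100 than to the naive 95%.
```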
So, 2 ways to gather evidence and assess accuracy
Collect p-values and see their distribution
Replicate studies
Solutions
Change publication practices to allow null results to be more easily published.
Encourage replications via data sharing, journals publishing replications, university recognition etc.
Make replications easier, e.g. via data-sharing requirements and open data: the "open science" movement.
Require “pre-registration” or “pre-analysis plans” to address p-hacking