How to read numbers Flashcards
Bayes' theorem
The key mantra underlying Bayes' theorem is that new evidence should not completely determine your beliefs in a vacuum; it should update your prior beliefs. An example Daniel Kahneman used: you have a description of a person as a meek and tidy soul. He is shy and withdrawn. He has a need for order and structure and a passion for detail. Is he more likely a librarian or a farmer?
Well, you should first establish a likely sample. For every 10 librarians there are probably 200 farmers. Now let's say the description is more likely to fit a librarian than a farmer. So of the 10 librarians, perhaps 40%, or 4, fit that description. Of the farmers, let's say 10% fit the description, so you get 20 farmers. That means 24 people in total fit the description, and only 4 of them are librarians, so the probability that the person is a librarian is 4/24, or 1/6, or about 17%. Picking the farmer was the right choice. Bayes' theorem is all about conditional probability.
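The librarian/farmer arithmetic above can be sketched in a few lines; the base rates and the "fits the description" percentages are the hypothetical numbers from the example, not real data:

```python
# Posterior probability that the person is a librarian, using the
# hypothetical numbers from the example above.
librarians, farmers = 10, 200            # prior: base rates in the sample
p_desc_given_librarian = 0.40            # 40% of librarians fit the description
p_desc_given_farmer = 0.10               # 10% of farmers fit the description

fitting_librarians = librarians * p_desc_given_librarian   # 4 people
fitting_farmers = farmers * p_desc_given_farmer            # 20 people

# Bayes' theorem as a count: of everyone fitting the description,
# what fraction are librarians?
posterior = fitting_librarians / (fitting_librarians + fitting_farmers)
print(round(posterior, 3))  # 0.167 — about 1 in 6
```

The same ratio falls out of the standard formula P(librarian | description) = P(description | librarian) · P(librarian) / P(description); counting people just makes the conditioning concrete.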
Simpson's paradox
Simpson’s Paradox is a statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations.
For instance, two variables may be positively associated in a population, but be independent or even negatively associated in all subpopulations. Cases exhibiting the paradox are unproblematic from the perspective of mathematics and probability theory, but nevertheless strike many people as surprising. Additionally, the paradox has implications for a range of areas that rely on probabilities, including decision theory, causal inference, and evolutionary biology.
A good example:
Wisconsin has repeatedly had higher overall 8th-grade standardised test scores than Texas, so you might think Wisconsin is doing a better job.
However, when broken down by race – which, via entrenched socioeconomic differences, is a major factor in standardised test scores – Texas students performed better than Wisconsin students on all fronts. Black Texas students did better than black Wisconsin students, and likewise for Hispanic and white students. The difference in the overall ranking is because Wisconsin has proportionally fewer black and Hispanic students and proportionally more white students (who tend to do better on the tests). So the takeaway shouldn't be that Wisconsin has better education, but rather more socioeconomically advantaged people.
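The reversal can be reproduced with made-up numbers (all pass rates and student counts below are invented for illustration, not the real Wisconsin/Texas figures): one state wins inside every subgroup, yet loses overall because the subgroup proportions differ.

```python
# Hypothetical (pass rate, number of students) per subgroup.
# "texas" beats "wisconsin" within every subgroup...
texas     = {"group_a": (0.60, 400), "group_b": (0.80, 600)}
wisconsin = {"group_a": (0.55, 100), "group_b": (0.75, 900)}

def overall(state):
    """Population-weighted overall pass rate."""
    total = sum(n for _, n in state.values())
    return sum(rate * n for rate, n in state.values()) / total

# Texas is ahead in each subgroup:
assert all(texas[g][0] > wisconsin[g][0] for g in texas)

# ...yet Wisconsin's overall average is higher, because 90% of its
# students are in the higher-scoring group_b.
print(round(overall(texas), 2), round(overall(wisconsin), 2))  # 0.72 0.73
```

The paradox is purely a weighting effect: the overall rate is a weighted average of subgroup rates, and the weights differ between the two populations.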
Understanding the underlying causal context in statistics can have huge public implications.
Example in book:
For instance, black people in the USA are more likely to smoke than white people; but when you control for education, you see that in every educational subgroup, black people are less likely to smoke. It’s just that a lower proportion of black people are in the higher educational subgroups, which tend to smoke less.
The trouble is that, in Simpson’s-paradox situations, you can use the same data to tell diametrically opposed stories, depending on what political point you want to make. The honest thing to do is to explain that the paradox is present.
Simpson’s paradox is one example of a wider problem, known as ‘the ecological fallacy’, that you get when you try to learn about individuals or subgroups by looking at the average of a group.
What is the R rate?
R is the reproductive number of something. It can apply to anything that spreads or reproduces – internet memes, humans, yawns, new technologies. In infectious-disease epidemiology, it’s how many people, on average, will be infected by a single person with the disease. If a disease has an R of five, on average each infected person will infect five other people.
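Under a deliberately simplified model (every case infects exactly R others per generation, ignoring immunity and overlap), case counts grow geometrically, which is why small differences in R matter so much:

```python
# Simplified generational growth: one case becomes R cases, then R**2,
# and so on. A toy model, not a real epidemiological simulation.
def cases_in_generation(r, generation, initial=1):
    return initial * r ** generation

for gen in range(4):
    print(gen, cases_in_generation(5, gen))  # 1, 5, 25, 125
```

With R = 5, three generations already give 125 cases from one; with R below 1 the same formula shrinks towards zero, which is why pushing R under 1 ends an epidemic.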
In unevenly distributed situations, do we prefer the mean or the median?
In unevenly distributed situations like this, statisticians often prefer to use the median. If we do that, we line up our people from left to right again, and the person in the middle is still the person earning £4. In a real population of millions of people, this will tell you more about what the population is like than the mean will, especially if the mean is distorted by a few ultra-high-earners at the upper end of the income distribution.
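The distortion is easy to see with a toy income list (figures invented): adding one ultra-high earner drags the mean far upwards while the median barely notices.

```python
from statistics import mean, median

# Weekly incomes in pounds for a small, made-up group.
incomes = [3, 3, 4, 4, 4, 5, 5]
skewed = incomes + [1000]        # one ultra-high earner joins

print(mean(incomes), median(incomes))  # 4.0 4
print(mean(skewed), median(skewed))    # 128.5 4.0
```

The median only cares about the middle of the queue, so a single extreme value at the top end can move it by at most one position; the mean absorbs the full size of the outlier.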
Variance
We can describe how much the data varies with a measure called the variance: roughly, the average of the squared distances of each value from the mean.
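A minimal sketch of that definition (population variance, with invented numbers):

```python
# Variance: the average squared distance of each value from the mean.
def variance(values):
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

print(variance([4, 4, 4, 4]))  # 0.0 — identical values, no spread
print(variance([2, 3, 5, 6]))  # 2.5 — same mean (4), but spread out
```

Both lists have a mean of 4, so the mean alone can't tell them apart; the variance captures the difference in spread.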
Sampling Bias
Well, there are other ways in which studies can be wrong. The most obvious is that, often, the sample you’ve taken isn’t representative of the population you’ve taken it from.
Say you set up your people-measuring stall outside a basketball players' convention. Suddenly you might find that you were seeing a lot of seven-footers strolling past. The average height of your sample jumps up, but the average height of the population is unchanged. This is called sampling bias.
Biased samples are pernicious in a way that small samples aren’t. At least with small but random samples, the more data you get, the closer you’ll get to the true answer. But with biased samples, getting more data doesn’t help and instead can make you more confident in your wrong answer.
The trouble is that Twitter is not representative of the population. The 17 per cent of the UK population who use Twitter tend (according to a 2017 study) to be younger, more female and more middle-class than the country as a whole. Younger people, women and the middle classes are more likely to vote Labour than the country as a whole.
It wouldn't help to have asked more people on Twitter. You'd still have the same problem, since you're still polling a non-representative sample. If you polled a million people on Twitter, you would still be polling the population of Twitter, not of the country – you would simply get ever more precise results around the wrong answer.
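The "more data doesn't fix bias" point can be simulated directly. Below, heights are drawn from an invented population; a biased sample that can only see tall people stays wrong no matter its size, while a random sample homes in on the truth.

```python
import random

random.seed(0)

# A made-up population: 100,000 heights in cm, roughly normal.
population = [random.gauss(170, 10) for _ in range(100_000)]

def sample_mean(sample):
    return sum(sample) / len(sample)

true_mean = sample_mean(population)

# Random samples: bigger sample, closer to the truth.
small = random.sample(population, 100)
large = random.sample(population, 10_000)

# Biased sample: only people over 190cm can be recruited
# (the "stall outside the basketball convention").
tall_only = [h for h in population if h > 190]
biased = random.sample(tall_only, min(1000, len(tall_only)))

print(round(true_mean, 1), round(sample_mean(large), 1),
      round(sample_mean(biased), 1))
```

The biased estimate sits well above 190cm regardless of how many tall people you recruit; the large random sample lands within a fraction of a centimetre of the true mean.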
What's another way a sample can be biased?
There are other ways in which samples can be biased; the most obvious is leading questions. For instance, if you ask people whether we should give medicine to 600 people, their answers will depend on whether you say ‘200 people will be saved’ or ‘400 people will die’, even though those statements are logically identical.
How do we know a result is not just a fluke? How do we know that a group did better because of some real difference, not just random variation?
To find out, we could use a statistical technique called significance testing (or hypothesis testing).
First, we imagine the results we’d expect to see if the book had no effect whatsoever. This is called the ‘null hypothesis’. The other possibility – that the book does have some positive effect – is called your ‘alternative hypothesis’.
What you need to do now is see how likely those results (or more extreme results) would be if the null hypothesis was true – that is, in our example, if reading the book has no effect, and any variation is just randomness. That’s significance testing.
In theory, even the most dramatic results could be a total fluke. But the bigger the difference, the more unlikely that fluke is. Scientists measure the chances of coincidence with something called the probability value, or 'p-value'.
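One concrete way to estimate a p-value is a permutation test: shuffle the group labels many times and count how often the null-hypothesis world produces a difference at least as big as the one observed. All scores below are invented.

```python
import random

random.seed(1)

# Made-up test scores: did the "book" group really do better?
book_group = [72, 75, 68, 80, 77, 74]   # read the book
control    = [70, 66, 71, 69, 73, 67]   # did not

observed = sum(book_group) / 6 - sum(control) / 6  # 5.0 points

combined = book_group + control
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(combined)            # null world: labels are arbitrary
    fake_diff = sum(combined[:6]) / 6 - sum(combined[6:]) / 6
    if fake_diff >= observed:
        extreme += 1

p_value = extreme / trials
print(round(observed, 1), p_value)      # small p-value → rare under the null
```

If shuffled labels almost never reproduce a 5-point gap, the gap is unlikely to be pure chance; that "how often would the null produce this?" fraction is exactly what the p-value measures.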
'Statistical significance' is a measure of how likely you are to see something by fluke, not of how important it is.
A third, crucial, point is that it doesn’t mean that if you get a finding of p=0.05 there’s only a one in twenty chance that your hypothesis is false. That misunderstanding is common and is a big part of why science goes wrong.
But that blog post raised red flags with scientists. Behaviour like this is known as ‘p-hacking’, massaging the data to get your p-value to a publishable below-0.05 figure. Methodologically savvy researchers started to go through all Wansink’s old work, and a source leaked his emails to Stephanie M. Lee, an investigative science journalist at BuzzFeed News.
Academics desperate to get p<0.05, so they can get their paper published, will rerun a trial or reanalyse the data. You might have heard of the 'replication crisis', in which lots of important findings in psychology and other disciplines have turned out not to exist when other scientists tried to replicate them.
There’s no easy way for readers to spot this in news stories. But it’s worth being aware that just because something is ‘statistically significant’, it doesn’t mean that it’s actually significant, or even that it’s true.
It found a statistically significant result – p<0.01, which you’ll remember from Chapter 5 means that if there was no real effect at all, and you ran the study 100 times, you’d expect to see a result as extreme as this less than once.
If a finding is statistically significant, that just means that there’s a good chance that it’s real. What is the other thing you need to consider?
The other thing you need to consider is the effect size. Conveniently, unlike ‘statistical significance’, ‘effect size’ means exactly what it says on the tin: the size of the effect.
Correlation issue examples
The proportion of deaths linked to obesity worldwide by year correlates with the amount of carbon dioxide emitted by year. So does carbon dioxide make people fat? Probably not. Instead, what's likely been going on is that the world is getting richer, and as people get richer, they have more money to spend on both high-calorie foods and on carbon-emitting goods like cars and electricity.
If you take that into account, the link between carbon emissions and obesity will probably disappear. The third variable, GDP, accounts for most of the link between one and the other.
Another classic example is ice cream and drownings. On days when ice cream sales go up, so do drownings. But obviously ice cream doesn’t make people drown. Instead, ice cream sales go up on hot days, because ice cream is nice on a hot day; and so is swimming, which unfortunately leads to some people drowning.
But finding out whether this effect is real, or whether it’s really caused by some other variable – some ‘confounder’ – is difficult.
But there’s a general rule: when you see a news story saying that X is linked to Y, don’t necessarily assume that means that X causes Y, or even vice versa. There could be some hidden thing, Z, that causes both.
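The ice-cream-and-drownings pattern can be simulated: below, a confounder Z (temperature) drives both X (ice cream sales) and Y (drownings), which never influence each other, yet X and Y come out strongly correlated. All parameters are invented.

```python
import random

random.seed(2)

def corr(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

# Z: daily temperature. X and Y each depend on Z plus their own noise;
# neither depends on the other.
temps = [random.gauss(20, 8) for _ in range(5000)]
sales = [10 * t + random.gauss(0, 20) for t in temps]       # ice cream
drownings = [0.5 * t + random.gauss(0, 3) for t in temps]   # drownings

print(round(corr(sales, drownings), 2))  # strongly positive, no causal link
```

Deleting the causal arrows between X and Y entirely still leaves a strong X–Y correlation, because both inherit Z's ups and downs; that's the hidden-Z pattern the rule above warns about.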
Regression line
You could draw it by eye, and you’d probably do a pretty good job. But there is a mathematically more precise way of doing it called the ‘least squares method’.
Imagine you draw a straight line on the graph. It’ll touch some dots, but most of them will be above or below the line. The vertical distance of each dot from the line is called the ‘error’, or the ‘residual’. Take the value for each residual, square it (that is, multiply it by itself – that removes the problem that some values will be negative, because a number multiplied by itself is always positive),
then add them all together. That figure is the ‘sum of squared residuals’. Whatever line you can draw that has the lowest possible sum of squared residuals is known as the ‘line of best fit’.
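The procedure above has a closed-form answer: the slope and intercept below minimise the sum of squared residuals. The data points are invented (roughly y = 2x plus noise).

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity the line of best fit minimises."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up points near y = 2x
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # close to 2 and 0
```

Any other line you draw through these points will have a larger sum of squared residuals than the fitted one, which is exactly what "line of best fit" means here.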