Statistics Flashcards
Which of the following problematic practices are likely to constitute lying with statistics? Mark all correct options.
a) Using accurate data and valid statistical methods to draw a false conclusion.
b) Using inaccurate data to draw a true conclusion.
c) Making a claim for which there is absolutely no statistical basis.
d) Formulating the conclusion of a statistical study using a term such as “average”, without specifying, for instance, whether the median or mean is intended.
Answer: a), d)
Which of the following claims about the choice and justification of statistical methods are true? Mark all correct options.
a) The choice of statistical methods does not matter if you know which conclusion you need to arrive at.
b) The available statistical methods will all yield the same result if applied to the same data.
c) The use of a certain statistical method is a conventional matter, and the choice of this method can be left to a statistical program.
d) The justification of statistical methods is an integral part of science.
Answer: d)
General Feedback
The choice of statistical method is an example of a scientific method choice, and it should be clear from the overall theme of the course that method choice is an important matter. Reflecting on the reasons given in the lecture for why the choice of statistical method matters, you should be able to weed out the false options.
Imagine that you are a researcher in psychology who wants to test statistically whether exposure to so-called “intelligence-related words” (for example “smart”, “clever”, “genius” and so forth) improves one’s performance on standardized tests. To determine this, you measure people’s performance on such standardized tests before and after they have been exposed to intelligence-related words. You note that 75 % of those who were exposed to intelligence-related words improved their score the second time they took the test. Is it true or false that one can conclude that this result is statistically significant solely on the basis of this information?
a) True
b) False
Answer: b)
General Feedback
Whether or not this result is statistically significant depends on the background conditions of the study. The question we must answer is: What is the probability that the result would obtain, given that exposure to intelligence-related words does not improve one’s performance on standardized tests? The result is statistically significant only if this probability is sufficiently low (in particular, lower than the conventionally set significance level).
To be able to determine this probability, however, certain background conditions of the test must be determined first. “75 %” might sound impressive, but what if we have reason to expect people to perform better the second time they take the test, simply because they’ve taken it before? What if the test group consists of only four people?
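The point about background conditions can be made concrete with a small sketch. It assumes a simple binomial model that is not part of the lecture: each participant independently improves with some fixed chance probability under the null hypothesis. The function name `binomial_tail` and the chance rates 0.5 and 0.7 are illustrative assumptions only.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k of n people improve."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: without any effect, a person improves with probability 0.5.
p_small = binomial_tail(4, 3, 0.5)     # 3 of 4 improved (75 %)
p_large = binomial_tail(40, 30, 0.5)   # 30 of 40 improved (75 %)

# Assumption: a practice effect alone makes 70 % of people improve.
p_practice = binomial_tail(40, 30, 0.7)

print(f"n = 4,  chance rate 0.5: {p_small:.4f}")
print(f"n = 40, chance rate 0.5: {p_large:.4f}")
print(f"n = 40, chance rate 0.7: {p_practice:.4f}")
```

Under these assumptions, the same “75 %” falls below a 0.05 significance level only for the larger group with the 0.5 chance rate, which is why the percentage alone settles nothing.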
In the example from the lecture, the data set of the European Central Bank allowed for two opposing conclusions: namely, that Germany was the worst off in terms of household income on average, and that it was, on average, somewhere in the middle. How is such a seeming contradiction possible? Mark the correct option.
a) It is in fact not possible. People simply did not understand the data.
b) It is possible because ‘average’ is ambiguous, and two different senses were used.
c) It is possible because there is no truth of the matter.
Answer: b)
General Feedback
Think about how good methodology relates to lying with statistics. How can we ensure that our results are correct and not misleading?
Which of the following are examples of good statistical methodology? Mark all correct options.
a) Carefully choosing the statistical method that is the most suitable one to get a significant result, given the type of study you are performing.
b) Carefully choosing the statistical method that is the most suitable one to get a true result, taking into consideration the type of study you are performing.
c) Using any of the methods that are implemented in some high-quality statistics software.
Answer: b)
General Feedback
Think about how good methodology relates to lying with statistics. How can we ensure that our results are correct and not misleading?
Which of the following situations might occur depending on how we interpret ‘average’ in the sentence “this student’s test score is below the average score”? Mark all correct options.
a) If there are some low-scoring outliers (and no high-scoring outliers), then the student’s test score might be above the median score but still be below the mean score.
b) If there are some low-scoring outliers (and no high-scoring outliers), then the student’s test score might be above the mean score but still be below the median score.
c) If there are some high-scoring outliers (and no low-scoring outliers), then the student’s test score might be below the mean score but still be above the median score.
d) If there are some high-scoring outliers (and no low-scoring outliers), then the student’s test score might be below the median score but still be above the mean score.
Answer: b), c)
General Feedback
The general problem is that the sentence can mean different things depending on how we interpret ‘average’: does it stand for ‘mean’ or ‘median’? Low-scoring outliers will push the mean score lower while high-scoring outliers will push it higher. Consider how this would affect the relation between the mean and the median! (Also, don’t forget the example with incomes given in the lecture.)
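A minimal sketch of how outliers separate the mean from the median. The score lists and the student scores below are made-up illustrations, not data from the lecture, and the helper `compare` is a hypothetical name.

```python
from statistics import mean, median

# Hypothetical class scores: one low-scoring outlier (20), none high.
low_outlier_scores = [20, 70, 72, 74, 76, 78, 80]
# Hypothetical class scores: one high-scoring outlier (100), none low.
high_outlier_scores = [60, 62, 64, 66, 68, 70, 100]

def compare(scores, student_score):
    """Report whether a score is below the mean and below the median."""
    return {
        "below_mean": student_score < mean(scores),
        "below_median": student_score < median(scores),
    }

# Low outlier: mean is about 67.1, median is 74.
low_case = compare(low_outlier_scores, 70)
# High outlier: mean is 70, median is 66.
high_case = compare(high_outlier_scores, 68)

print(low_case)   # above the mean, below the median
print(high_case)  # below the mean, above the median
```

With the low outlier the mean drops below the median, so a score can sit between them; with the high outlier the order is reversed.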
What are some common sources of misunderstanding connected to the choice of statistical format? Mark all correct options.
a) A relative risk increase of e.g. 200% may be perceived as alarmingly large if one doesn’t understand that the initial risk was very low.
b) A relative risk reduction may be overestimated when compared to the same risk reduction in absolute numbers.
c) It may be hard to understand how absolute risk reduction is connected to the “number needed for treatment” format.
Answer: a), b), c)
General Feedback
The three formats (absolute, relative, and number needed for treatment) are discussed in the lecture in connection with two examples. The first example is about risk reduction and the second is about risk increase.
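The relation between the three formats can be sketched with a few lines of arithmetic. The risk figures below are invented for illustration and are not the numbers from the lecture; `risk_formats` is a hypothetical helper name.

```python
from fractions import Fraction

def risk_formats(control_risk: Fraction, treated_risk: Fraction):
    """Express one risk reduction in three formats."""
    absolute = control_risk - treated_risk   # absolute risk reduction
    relative = absolute / control_risk       # relative risk reduction
    number_needed = 1 / absolute             # number needed for treatment
    return absolute, relative, number_needed

# Assumed example: risk falls from 2 in 1000 to 1 in 1000 with treatment.
absolute, relative, number_needed = risk_formats(Fraction(2, 1000), Fraction(1, 1000))
print(f"absolute reduction: {float(absolute):.3%}")
print(f"relative reduction: {float(relative):.0%}")
print(f"number needed for treatment: {number_needed}")

# Assumed example of a risk increase: from 1 in 10000 to 3 in 10000.
old_risk, new_risk = Fraction(1, 10000), Fraction(3, 10000)
relative_increase = (new_risk - old_risk) / old_risk
absolute_increase = new_risk - old_risk
print(f"relative increase: {float(relative_increase):.0%}")
print(f"absolute increase: {float(absolute_increase):.2%}")
```

In the first assumed example, a relative reduction of 50 % corresponds to an absolute reduction of one tenth of a percentage point, so 1000 people must be treated for one of them to benefit. In the second, a 200 % relative increase amounts to two additional cases per 10000.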
Under what circumstances do we have reason to evaluate a hypothesis statistically? Mark all correct options.
a) For some hypotheses, the only relevant implications are stochastic. The truth of these implications can be determined precisely only with the help of statistical tools.
b) In order to correctly determine how much a set of observational data changes one’s confidence in a hypothesis, statistical methods are useful.
c) In order to account for measurement errors and similar disturbing influences, one might want to determine the acceptable error of a test. Quantifying this error requires statistical methods.
d) When the hypothesis is stated in terms of statistical notions, such as ‘mean’ and ‘median’.
e) We should always do so. Otherwise, the testing procedure is unscientific.
Answer: a), b), c)
- Stochastic implications of H
- Quantifying error
- Quantifying confidence
Which of the following claims about p-values are true (in the context of Fisher’s version of a significance test)? Mark all correct options.
a) P-values are compared with a significance level.
b) Given a hypothesis H (with a test statistic T) and a data set D, the p-value is the probability of H given D. That is, it is the probability that one’s hypothesis is true, given the outcome of one’s test.
c) P-values are set by convention.
d) Given a hypothesis H (with a test statistic T) and a data set D, the p-value is the probability of D (or any more extreme outcome) given H. That is, it is the probability of getting the outcome one actually got, or a more extreme one, given that one’s hypothesis is true.
Answer: a), d)
General Feedback
- Don’t confuse p-value with significance level!
- There is a huge difference between the probability of the hypothesis H given the data set D (i.e. P(H|D)) and the probability of D given H (i.e. P(D|H)). On the first reading, the statistical setup requires that the probability ranges over hypotheses, which isn’t possible in Fisher’s setup. On the second reading, the statistical setup requires that the probability ranges over outcomes instead. This means that even though P(H|D) would (arguably) be more interesting to know, it cannot be determined within Fisher’s setup.
Imagine a researcher who is investigating whether changing the materials used to conduct electricity in central processing units (CPUs) positively affects their efficiency by a factor of 1.12. The researcher formulates the hypothesis “the effect of changing the materials is less than 1.12”. When failing to reject this hypothesis, the researcher decides (against good research practice) to engage in p-value abuse. Which of the following alternatives constitute such abuse? Mark all correct options.
a) Expanding the sample in order to find a data set that (in isolation) has a significantly low p-value in relation to the stated hypothesis, and then choosing to report only that data set.
b) Rejecting the stated hypothesis without reporting the p-value of the data.
c) Using the same data sample and, with the only reason being to generate a statistically significant result, modifying the value in the hypothesis to 1.15, which gives a significant p-value.
d) Purposely choosing a significance level that is higher than the p-value calculated for the test data.
Answer: a), c), d)
General Feedback
It helps to carefully examine the meaning of “p-value”. The p-value is the probability of observing an outcome at least as extreme as the observed outcome, given that the hypothesis is true. In mathematical language, it is P(D|H), where “P” stands for the probability, “D” stands for the set of outcomes that includes the obtained outcome and all outcomes that are less probable than the obtained outcome, and “H” stands for the (assumed to be true) hypothesis.
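Why data-dependent sampling of the kind described in option a) counts as abuse can be sketched with a simulation. This is a simplified stand-in, not the CPU scenario itself: it assumes a true hypothesis of a fair coin, a one-sided binomial p-value, and a researcher who checks the p-value after every batch of ten observations. All names and parameter values are illustrative assumptions.

```python
import random
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def p_value(n: int, k: int) -> float:
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def simulate(runs=1000, batches=10, batch_size=10, alpha=0.05, seed=0):
    """Compare testing once at the planned sample size with testing after
    every batch and counting any look below alpha as a 'finding'."""
    rng = random.Random(seed)
    fixed_hits = 0     # significant at the planned final sample size
    peeking_hits = 0   # significant at one or more of the looks
    for _ in range(runs):
        successes = 0
        significant_at_some_look = False
        for batch in range(1, batches + 1):
            # The hypothesis is true here: every observation is a fair coin flip.
            successes += sum(rng.random() < 0.5 for _ in range(batch_size))
            p = p_value(batch * batch_size, successes)
            if p < alpha:
                significant_at_some_look = True
        fixed_hits += p < alpha   # p now belongs to the final look
        peeking_hits += significant_at_some_look
    return fixed_hits / runs, peeking_hits / runs

fixed_rate, peeking_rate = simulate()
print(f"false rejections, single planned test: {fixed_rate:.3f}")
print(f"false rejections, reporting any look:  {peeking_rate:.3f}")
```

Because the hypothesis is true in every simulated run, each rejection is a mistake. Reporting whichever look happens to fall below the significance level can only produce at least as many such mistakes as the single planned test.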
Neyman-Pearson’s method of hypothesis testing allows researchers to choose between two competing hypotheses. What is true about this method? Mark all correct options.
a) Rejecting one hypothesis entails the acceptance of another.
b) A type I error for a given test excludes the possibility of a type II error for that test.
c) Usually, one of the hypotheses is taken to be the main hypothesis. The other can be just about any given hypothesis, provided that it functions as an auxiliary hypothesis in some possible empirical test of the first hypothesis.
d) The hypotheses must be mutually exclusive and jointly exhaustive.
Answer: a), b), d)
Suppose you have set up a Neyman-Pearson test with the hypothesis Hi: “the new advertising campaign doesn’t increase the sales of the advertised products”. What is true about the power of the test? Mark all correct options.
a) If your test is high-powered and the campaign does in fact increase the sales, then there is a high probability that Hi will be rejected by the test.
b) Collecting data from more people in the area targeted by the campaign will make your test less likely to accept Hi in situations when Hi is in fact false.
c) Collecting data from more people in the area targeted by the campaign is one way to increase the power of the test.
d) If your test is high-powered and the campaign does in fact increase the sales, then there is a high probability that Hi will be accepted by the test.
e) Collecting data from more people in the area targeted by the campaign will make your test less likely to reject Hi in situations when Hi is in fact true.
Answer: a), b), c), e)
General Feedback
What would be the alternative hypothesis Ha in the described scenario? What would be a type I error and what would be a type II error? These three terms are all related to the power of a test and can help you to identify the correct options!
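The link between sample size and power can be sketched numerically. The sketch assumes a strongly simplified version of the scenario that is not in the lecture: each surveyed person either buys the product or not, the purchase rate without any campaign effect is 0.5, and the campaign in fact raises it to 0.6. The function names and all rates are illustrative assumptions.

```python
from math import comb

def tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_value(n: int, p_null: float, alpha: float) -> int:
    """Smallest number of buyers k with P(X >= k | Hi) <= alpha."""
    for k in range(n + 2):
        if tail(n, k, p_null) <= alpha:
            return k

def power(n: int, p_null: float, p_true: float, alpha: float = 0.05) -> float:
    """Probability of rejecting Hi when the true purchase rate is p_true."""
    return tail(n, critical_value(n, p_null, alpha), p_true)

power_small = power(20, 0.5, 0.6)    # 20 people surveyed
power_large = power(200, 0.5, 0.6)   # 200 people surveyed
print(f"power with n = 20:  {power_small:.2f}")
print(f"power with n = 200: {power_large:.2f}")
```

Under these assumptions, the larger sample makes it far more probable that the test rejects Hi when the campaign does increase sales, which is the same as making a type II error less probable.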
In a sentence, what is the focus of Bayesian statistics? Mark the correct option.
a) Calculating the probability of the hypothesis being true, given the evidence.
b) Calculating the probability of observing the evidence, given that the hypothesis is true.
Answer: a)
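The difference between the two options can be sketched with Bayes’ theorem. The prior and the two likelihoods below are invented numbers chosen only for illustration, and `posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H|E) from P(H), P(E|H) and P(E|not H), via Bayes' theorem."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence

# Assumed values: P(H) = 0.1, P(E|H) = 0.8, P(E|not H) = 0.2.
p_e_given_h = Fraction(4, 5)
p_h_given_e = posterior(Fraction(1, 10), p_e_given_h, Fraction(1, 5))

print(f"P(E|H) = {float(p_e_given_h):.2f}")
print(f"P(H|E) = {float(p_h_given_e):.2f}")
```

With these assumed numbers, the evidence is quite probable given the hypothesis, yet the hypothesis remains fairly improbable given the evidence, because the prior probability of H was low.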
What is true about the differences between Fisher’s significance testing, Neyman-Pearson hypothesis testing and Bayesian hypothesis testing? Mark all correct options.
a) Bayesian hypothesis testing can mirror the idea that humans have prior beliefs about the truth and falsity of hypotheses. This cannot be done with Fisher and Neyman-Pearson.
b) As opposed to a Neyman-Pearson test, the hypotheses in a Bayesian test need not be mutually exclusive.
c) While Fisher tests are threatened by many kinds of p-value abuse, Neyman-Pearson tests avoid most of them.
d) Neyman-Pearson testing is the only one out of the three where the test can reject a hypothesis.
e) Fisher tests and Neyman-Pearson tests include two hypotheses that are tested simultaneously, but Bayesian tests don’t.
Answer: a), c)
General Feedback
In the lecture, both Neyman-Pearson and Bayesian testing are presented as ways to avoid some shortcomings of the previous kind of testing. While this implies some differences, other parts remain the same between two or all three kinds of statistical testing.