Probability & Statistics Flashcards
In plain words, what are descriptive statistics and what do you lose by using them?
Descriptive statistics are the numbers and calculations we use to summarize raw data, which always implies some loss of nuance or detail.
What do inferential statistics allow us to do?
We can use data from the “known world” to make informed inferences about the “unknown world.”
What does the scientific method dictate about the variables when testing a hypothesis?
That the variable of interest should be the only thing that differs between the experimental group and the control group.
What is the main problem with the mean (average)?
It is sensitive to outliers.
When will the mean and median be similar?
When the distribution has no serious outliers.
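As a rough illustration of the two cards above, the sketch below compares the mean and median of a small list of incomes before and after adding one extreme value. The income figures are made up, chosen only to make the effect visible.

```python
from statistics import mean, median

# Hypothetical annual incomes, in thousands of dollars (made-up values).
incomes = [28, 31, 33, 35, 35, 36, 38, 40, 42, 45]

# The same group after one person with a very large income joins it.
with_outlier = incomes + [1_000_000]

# The mean jumps by orders of magnitude; the median barely moves.
print("no outlier:   mean =", mean(incomes), " median =", median(incomes))
print("with outlier: mean =", round(mean(with_outlier), 1),
      " median =", median(with_outlier))
```

Without the outlier the two summaries are close to each other; with it, only the median still describes a typical member of the group.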
What are the benefits of the median and of quartiles, percentiles, etc.?
They describe where a particular observation lies compared with everyone else.
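One way to make "where an observation lies compared with everyone else" concrete is a percentile rank. The sketch below uses one common definition (the percentage of observations at or below a value); other definitions exist, and both the helper name `percentile_rank` and the test scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of observations that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

# Hypothetical test scores (made-up values).
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]

rank = percentile_rank(scores, 85)
print(f"A score of 85 is at or above {rank:.0f}% of the scores")
```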
What is the difference between an absolute figure and a relative figure?
Absolute figures can usually be interpreted without any context or additional information. A “relative” value or figure has meaning only in comparison to something else.
In informal terms, what is the standard deviation?
It is a measure of how dispersed the data are from their mean.
What is the key difference between the distribution of weights of airplane passengers and that of professional marathon runners when both have the same mean?
The weights of the two groups have roughly the same “middle,” but the airline passengers have far more dispersion around that midpoint, meaning that their weights are spread farther from the midpoint.
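A minimal sketch of this idea, using two made-up lists of weights that share the same mean but differ in spread. The numbers are hypothetical and are not taken from real passengers or runners.

```python
from statistics import mean, pstdev

# Hypothetical weights in pounds (made-up values with the same mean).
runners = [150, 152, 155, 158, 160]
passengers = [110, 130, 155, 180, 200]

# Same "middle" for both groups...
print("means:", mean(runners), mean(passengers))

# ...but a much larger standard deviation for the passengers.
print("standard deviations:",
      round(pstdev(runners), 2), round(pstdev(passengers), 2))
```

`pstdev` treats each list as a whole population; `statistics.stdev` would be the usual choice if the lists were samples from larger groups.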
What do we know about the proportion of observations in a normal distribution?
We know (by definition) exactly what proportion of the observations in a normal distribution lie within one standard deviation of the mean (about 68.3 percent), within two standard deviations of the mean (about 95.4 percent), within three standard deviations (about 99.7 percent), and so on.
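These proportions can be checked numerically. For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / sqrt(2)); the sketch below evaluates that with Python's standard `math.erf`. The helper name `proportion_within` is just an illustrative choice.

```python
import math

def proportion_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * proportion_within(k):.1f}%")
```

The result does not depend on the particular mean or standard deviation, which is why the same percentages apply to every normal distribution.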
What is a simple explanation for an index?
It is a descriptive statistic made up of other descriptive statistics.
What is a histogram?
A histogram is an approximate representation of the distribution of numerical data that uses bins to divide the range of values.
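The binning step can be sketched in a few lines. The function below counts how many values fall into each fixed-width bin; the name `histogram`, its parameters, and the height data are all hypothetical.

```python
def histogram(values, bin_width, start):
    """Count values per bin; each key is the left edge of a bin."""
    counts = {}
    for v in values:
        index = int((v - start) // bin_width)   # which bin v falls into
        left_edge = start + index * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical heights in centimeters (made-up values).
heights = [150, 152, 158, 161, 163, 164, 167, 171, 172, 179]

for left_edge, count in histogram(heights, bin_width=10, start=150).items():
    print(f"{left_edge}-{left_edge + 10}: {'#' * count}")
```

Changing `bin_width` changes the picture, which is one reason a histogram is only an approximate view of the underlying distribution.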
What are the advantages of using a curve to represent the distribution of data over a histogram?
1) The curve allows you to estimate the probability of a value that wasn’t observed.
2) The curve is not limited by the width of the bins.
3) If we have limited resources, the approximate curve (based on the mean and standard deviation of the data we collected) is usually good enough.
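As a sketch of points 1 and 3, the code below fits a normal curve to a small sample using only its mean and standard deviation, then estimates the probability of a range in which nothing was observed. It assumes the data are roughly normal, and the heights are made up.

```python
from statistics import NormalDist

# Hypothetical heights in centimeters; no value lies between 165 and 166.
heights_cm = [158, 161, 163, 164, 167, 168, 170, 171, 172, 179]

# Fit a normal curve from the sample mean and sample standard deviation.
curve = NormalDist.from_samples(heights_cm)

# Estimated probability of a height between 165 and 166 cm.
p = curve.cdf(166) - curve.cdf(165)
print(f"mean = {curve.mean:.1f}, standard deviation = {curve.stdev:.1f}")
print(f"estimated P(165 <= height <= 166) = {p:.3f}")
```

A histogram of the same data would show an empty bin for that range; the curve still gives a nonzero estimate.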
Why is the distribution of the height of male babies much narrower than the one of male adults?
Because adult males’ heights are spread over a wider range of possible measurements (a larger standard deviation). The more options there are for height, the less likely any specific measurement will be one of them, so the adult curve is wider and flatter.
What is needed to draw a normal distribution?
The mean (to center the distribution) and the standard deviation (to give it width).
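To see how the two parameters shape the curve, the sketch below builds two normal distributions with Python's `statistics.NormalDist` and evaluates their density at a few points. The means and standard deviations are hypothetical placeholders for "babies" and "adults", not measured values.

```python
from statistics import NormalDist

# Hypothetical parameters in centimeters (assumed for illustration only).
babies = NormalDist(mu=50, sigma=1.5)   # small standard deviation: narrow curve
adults = NormalDist(mu=177, sigma=7.0)  # large standard deviation: wide curve

for name, dist in (("babies", babies), ("adults", adults)):
    # Density at the mean and at one and two standard deviations above it.
    points = [dist.mean + k * dist.stdev for k in (0, 1, 2)]
    densities = [round(dist.pdf(x), 4) for x in points]
    print(name, densities)
```

The mean only shifts the curve left or right; the standard deviation sets its width, and a narrower curve has a taller peak because the total area under it stays equal to 1.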
What is a random variable?
Random variables are ways to map outcomes of random processes to numbers. Basically, you are quantifying the outcomes.
Example: Y = sum of the upward faces after rolling seven dice.
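The example random variable Y can be simulated directly. The sketch below assumes fair six-sided dice; the helper name `roll_y`, the seed, and the number of trials are arbitrary choices.

```python
import random
from statistics import mean

def roll_y(rng):
    """One realization of Y: the sum of the upward faces of seven fair dice."""
    return sum(rng.randint(1, 6) for _ in range(7))

rng = random.Random(0)  # fixed seed so that repeated runs give the same rolls
samples = [roll_y(rng) for _ in range(10_000)]

# Each outcome of the random process (seven faces) is mapped to one number.
print("smallest and largest observed Y:", min(samples), max(samples))
print("average observed Y:", round(mean(samples), 2))
```

Every value of Y lies between 7 and 42, and the long-run average should sit close to 7 × 3.5 = 24.5.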
How are random variables different from traditional variables?
With traditional variables, you can solve for them or assign them values.
With random variables it makes more sense to talk about the probability of an outcome.
What is the difference between discrete and continuous RVs?
Discrete: they take distinct, separate values. They are countable.
Continuous: they can take any value in an interval.
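A small sketch of the contrast, using a fair die as the discrete case and a variable that is uniform on the interval [0, 1] as the continuous case. Both examples and the helper name `uniform_interval_prob` are illustrative assumptions.

```python
from fractions import Fraction

# Discrete: a fair die has six countable outcomes, each with its own probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
p_three = die_pmf[3]

# Continuous: for X uniform on [0, 1], probability attaches to intervals.
def uniform_interval_prob(a, b):
    """P(a <= X <= b) for X uniform on [0, 1], assuming 0 <= a <= b <= 1."""
    return b - a

print("P(die shows 3) =", p_three)
print("P(0.25 <= X <= 0.75) =", uniform_interval_prob(0.25, 0.75))
print("P(X is exactly 0.3) =", uniform_interval_prob(0.3, 0.3))
```

For the continuous variable, any single exact value has probability zero, so questions are asked about ranges of values instead.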