Chapter 3: Probability Flashcards
What is meant by random variables?
In probability theory, we describe the behaviour of random variables. This is a statistical term for variables that associate different numeric values with each of the possible outcomes of some random process.
What is meant by the term random in random variable?
By random here we do not mean the colloquial use of this term to mean something that is entirely unpredictable. A random process is simply a process whose outcome cannot be perfectly known ahead of time (it may nonetheless be quite predictable).
Imagine that we enter a lottery, where we select a number from 1 to 100, to have a chance of winning $1000. We suppose that in the lottery only one ball is drawn and it is fair, meaning that all numbers are equally likely to win.
Describe what the probability distribution for this lottery would look like.
It is a discrete probability distribution, since the variable we measure – the winning number – is confined to a finite set of values. It would therefore look like a set of 100 bars of equal width, each of height 1/100, since all numbers are equally likely to win.
Compare the lottery's probability distribution with one describing the following situation: before test driving a second-hand car, we are uncertain about its value. From seeing pictures of the car, we might think that it is worth anywhere from $2000 to $4000, with all values being equally likely.
Since the range of possible values is (effectively) continuous, the graph would depict a probability density instead. It would be a single rectangle stretching from $2000 to $4000, with its height being a probability density of 1/2000 rather than a probability.
The aforementioned cases are both examples of valid probability distributions. So what are their defining properties?
o All values of the distribution must be real and non-negative.
o The sum (for discrete random variables) or integral (for continuous random variables) across all possible values of the random variable must be 1.
How is this satisfied in the discrete lottery case?
Σ_{i=1}^{100} 1/100 = 100 × (1/100) = 1
i.e. the sum of one hundred 1/100s equals 1.
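A minimal Python sketch of this check, assuming the fair 1-to-100 lottery described above (the variable name pmf is purely illustrative):

    from fractions import Fraction

    # Build the lottery's discrete probability distribution:
    # each of the 100 numbers wins with probability 1/100.
    pmf = {number: Fraction(1, 100) for number in range(1, 101)}

    # The two defining properties of a valid probability distribution:
    assert all(p >= 0 for p in pmf.values())   # all values real and non-negative
    assert sum(pmf.values()) == 1              # the sum across all values is 1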
How is this satisfied for the continuous case of the second-hand car example?
All values of the distribution must be real and non-negative: the graph indicates that p(v) = 1/2000 ≥ 0 for 2000 ≤ v ≤ 4000, and p(v) = 0 (also non-negative) elsewhere.
integral (for continuous random variables) across all possible values of the random variable must be 1: Fortunately, since integration is essentially just working out an area underneath a curve, we can calculate the integral by appealing to the geometry of the graph. Since this is just a rectangular shape, we calculate the integral by multiplying the base by its height:
area = 1/2000 x 2000 = 1
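The same area can be approximated numerically; this is a rough sketch assuming the uniform density p(v) = 1/2000 on [$2000, $4000] from the car example:

    # Uniform probability density for the car's value on [2000, 4000].
    def p(v):
        return 1 / 2000 if 2000 <= v <= 4000 else 0.0

    # Approximate the integral across all possible values with a Riemann sum:
    # many thin rectangles, each of area height x width.
    n = 200_000                # number of thin rectangles
    width = 2000 / n           # each covers 1 cent of the $2000 range
    total = sum(p(2000 + i * width) * width for i in range(n))
    print(total)               # approximately 1.0: base (2000) x height (1/2000)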
This definition may seem arbitrary or, perhaps, well-trodden territory for some readers, so why is it important to note?
It is of central importance to Bayesian statistics, because Bayesians like to work with and produce valid probability distributions: only valid probability distributions can be used to describe uncertainty. The pursuit of this ideal underlies the majority of methods in applied Bayesian statistics, both analytic and computational.
How would you calculate the probability that the winning number, X, is 3 in the discrete probability distribution for the lottery? How would you calculate the probability that it is 10 or less?
Easy!
Pr(X = 3) = 1 / 100
To calculate the probability that the winning number is 10 or less, we just sum the probabilities of it being {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}: 10 × 1/100 = 1/10.
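Continuing the lottery sketch above, both results fall straight out of the same (illustrative) pmf:

    from fractions import Fraction

    pmf = {number: Fraction(1, 100) for number in range(1, 101)}

    print(pmf[3])                              # Pr(X = 3) = 1/100
    print(sum(pmf[x] for x in range(1, 11)))   # Pr(X <= 10) = 10 x (1/100) = 1/10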
How would you calculate the probability that the value of the second-hand car is $2500?
We might be tempted to conclude that Pr(value = $2500) = 1/2000. However, using the same logic, we would deduce that the probabilities of the value of the car being $2500, $2500.10, $2500.01, $2500.001, and so on, are all 1/2000. In fact, we could deduce the same probability for an infinite number of possible values, which, summed together, would yield infinity rather than 1. This means that, for a continuous random variable, we must always have Pr(θ = any single value) = 0 to avoid an infinite sum.
What is the solution to this problem regarding infinite sums in continuous distributions?
When we consider p(θ) for a continuous random variable, it turns out we should interpret its values as probability densities, not probabilities. We can use a continuous probability distribution to calculate the probability that a random variable lies within an interval of possible values.
What is the equivalent of a sum when calculating a probability from a continuous distribution?
To do this, we use the continuous analogue of a sum, an integral. Calculating an integral is equivalent to calculating the area under a probability density curve. For the car example, we can calculate the probability that the car’s value lies between $2500 and $3000 by determining the rectangular area underneath the graph shown:
1 / 2000 (height) x 500 (base) = 1/4
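A rough numerical version of the same calculation, again assuming the uniform density from the car example:

    # Uniform probability density for the car's value on [2000, 4000].
    def p(v):
        return 1 / 2000 if 2000 <= v <= 4000 else 0.0

    # Integrate the density between $2500 and $3000 with a Riemann sum.
    n = 50_000                 # thin rectangles covering [2500, 3000]
    width = 500 / n
    prob = sum(p(2500 + i * width) * width for i in range(n))
    print(prob)                # approximately 0.25: height (1/2000) x base (500)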
What is the difference between Pr(…) and p(…)?
We use Pr(…) to explicitly state that the result is a probability, whereas p(value) is a probability density.
What is meant by the 'base' used in the previous calculation?
In the book's example of crossing ice that you are certain to fall through:
For densities we must supply a volume, which provides the exchange rate to convert it into a probability. Note that the word volume is used for its analogy with three-dimensional solids, where we calculate the mass of an object by multiplying the density by its volume. Analogously, here we calculate the probability mass of an infinitesimal volume:
probability mass = probability density x volume
However, here a volume need not correspond to an actual three- dimensional volume in space, but to a unit of measurement across a parameter range of interest. In the above examples we use a length then an area as our volume unit, but in other cases it might be a volume, a percentage or even a probability.
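A tiny worked instance of the probability mass = density × volume formula, using the car example's density (the particular band of values is chosen only for illustration):

    density = 1 / 2000         # p(v) for any v between $2000 and $4000
    volume = 0.50              # here the "volume" is a length: a 50-cent band of values
    print(density * volume)    # 0.00025, the probability mass within that band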
How can we hope to obtain a sample of numbers from our distribution, since they are all individually impossible?
An impossible event has a probability of zero, but the converse does not hold. When we use the word impossible, we mean that the event is not within our space of potential outcomes.
Imagine a sample of numbers from a standard normal distribution. Here the purely imaginary number i does not belong to the set of possible outcomes and hence has zero probability. Conversely, consider attempting to guess exactly the number that we sample from a standard normal distribution. Clearly, obtaining the number 3.142 here is possible – it does not lie outside of the range of the distribution – so it belongs to our potential outcomes. However, if we multiply our probability density by the volume corresponding to this single value, then we get zero because the volume element is of zero width. So we see that events that have a probability of zero can still be possible.
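A short simulation of this point, drawing from a standard normal with Python's standard library (the target value 3.142 and the interval around it are chosen purely for illustration):

    import random

    random.seed(0)
    draws = [random.gauss(0, 1) for _ in range(100_000)]

    # No draw equals 3.142 exactly: a single point has zero width, so its
    # probability mass (density x width) is zero.
    print(sum(1 for x in draws if x == 3.142) / len(draws))     # 0.0

    # Yet 3.142 is still a possible outcome: an interval around it has
    # non-zero width and so carries positive probability mass.
    print(sum(1 for x in draws if 3.0 < x < 3.3) / len(draws))  # small, but > 0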
How do we use Bayes' rule differently for probabilities and probability densities?
While it is important to understand that probabilities and probability densities are not the same types of entity, the good news for us is that Bayes’ rule is the same for each.
Pr(θ = 1 | X = 1) in the discrete case just becomes p(θ = 1 | X = 1) in the continuous case.
When the data, X, and the parameter, θ, are discrete, Pr denotes a probability; when the data and parameter are continuous, p denotes a probability density.
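A small sketch of the discrete case with made-up prior and likelihood values (all numbers here are hypothetical); the continuous case has exactly the same form, with densities replacing probabilities and an integral replacing the sum in the denominator:

    # Hypothetical discrete parameter theta with two possible values, 0 and 1.
    prior = {0: 0.5, 1: 0.5}         # Pr(theta)
    likelihood = {0: 0.2, 1: 0.6}    # Pr(X = 1 | theta)

    # Denominator: Pr(X = 1) = sum over theta of Pr(X = 1 | theta) x Pr(theta)
    evidence = sum(likelihood[t] * prior[t] for t in prior)

    # Bayes' rule: Pr(theta | X = 1) = Pr(X = 1 | theta) x Pr(theta) / Pr(X = 1)
    posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}
    print(posterior)   # roughly {0: 0.25, 1: 0.75}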
What is the mean of a distribution?
A mean, or expected value, of a distribution is the long-run average value that would be obtained if we sampled from it an infinite number of times.
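A quick simulation of this idea for the lottery distribution above, where the theoretical mean of a fair draw from 1 to 100 is (1 + 100)/2 = 50.5:

    import random

    random.seed(0)
    draws = [random.randint(1, 100) for _ in range(1_000_000)]

    # The long-run average of the samples approaches the distribution's mean.
    print(sum(draws) / len(draws))   # approximately 50.5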