Probability And Statistics - Key Words Flashcards
What do we use probability and statistics to do?
1) Collect and organise data
2) Explore descriptive relationships (bringing together two different variables)
3) Investigate causal relationships (to see if there is a significant link between an underlying cause and the produced results)
When is it appropriate to use statistics?
When 1) there is a large number of 2) similar processes or phenomena (i.e. when something is repeated many times).
What is a random experiment?
A random experiment is a process that leads to the occurrence of one and only one of several distinct possible results, which can in principle, be replicated.
This is an experiment because it can be replicated.
What is the outcome of an experiment?
This is one of the distinct possible results of an experiment.
When conducting a random experiment, we assume that we know all the possible outcomes of these random experiments, excluding exploratory experiments which lead to unexpected results.
What is a sample space?
A sample space is the collection of all the possible outcomes of an experiment. This is denoted by the Greek letter Omega (Ω).
What is the complement of an event?
This is the event that X does not occur, i.e. the collection of all outcomes not in X. It is denoted by a squiggly line in front of the X (~X), or a bar on top of the X.
Remember that the complement of an event is also an event
What is the complement of the whole sample space?
This is called a null or empty set, but it is still an event.
What is an event?
An event is a collection of one or more outcomes, or the null set.
How can we combine events?
They can either be combined in union (meaning 'or'), or they can be combined in intersection (meaning 'and').
The combined event can itself be given a name, e.g. event C.
This extends to more than just two events: unions and intersections are associative, so events can be combined in any order.
What are mutually exclusive events?
If the intersection is a null or empty set, then these events will be mutually exclusive, also known as disjoint.
What are collectively exhaustive events?
This is if the union of two or more events is the sample space, in which case they are collectively exhaustive.
What is the definition of probability?
This is the assignment of a number P(A) to each event A, which must satisfy three conditions.
What 3 conditions must probability satisfy?
1) The probability must be greater than or equal to zero for any event A.
2) The probability of the whole sample space must be 1.
3) The probability of the union of mutually exclusive events equals the sum of their individual probabilities.
As a result, all probabilities will lie between 0 and 1 (inclusive).
What are the three approaches to probability?
1) Classical probability approach
2) Empirical/relative frequency probability approach
3) Subjective probability approach
We must calculate and understand probability based on the context; different approaches will be necessary at different times.
What is the classical probability approach?
If a random experiment can result in n mutually exclusive and equally likely outcomes and n_a of these outcomes have attribute A, then the probability of A occurring is the fraction n_a/n.
However, outcomes won't always be equally likely in practice, as shown by Raphael Weldon when he threw 12 dice over 26,000 times.
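A minimal sketch of the classical approach (a hypothetical fair-die example, not from the notes):

```python
# Classical approach: n equally likely outcomes, n_a of which have attribute A.
# Hypothetical example: the event "even number" on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]                  # n = 6 equally likely outcomes
even = [o for o in outcomes if o % 2 == 0]     # n_a = 3 outcomes with attribute A
p_even = len(even) / len(outcomes)             # P(A) = n_a / n
print(p_even)  # 0.5
```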
What is the empirical/frequentist probability approach?
The probability of an event is the fraction of times that it has occurred in the past under the same experiment, if it has been repeated a large number of times.
However, not all experiments can be repeated (e.g. the probability that the UK will have left the EU by 2020). This leads to the subjective probability approach.
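The empirical approach can be sketched with a simulated repeatable experiment (a hypothetical fair coin; the seed and trial count are arbitrary choices):

```python
import random

# Empirical approach: estimate a probability by the relative frequency of
# the event over many repetitions of the same experiment.
random.seed(0)  # fixed seed so the run is reproducible
trials = 100_000
heads = sum(1 for _ in range(trials) if random.random() < 0.5)
p_heads = heads / trials  # relative frequency; approaches 0.5 as trials grows
```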
What is a subjective probability approach?
This is when the probability of an event is assigned by an individual on the basis of his or her beliefs and information. An individual with different beliefs or information may assign a different probability.
There is no restriction of where these beliefs could come from.
Why does applied scientific research impact our experiments?
We try to use scientific research to predict uncertain outcomes and hence reduce the element of randomness. We do this by trying to understand how an event has occurred.
How do we deal with complex experiments?
We can break them down into smaller experiments.
What does it mean to enumerate or list a complex experiment?
Let the outcomes be denoted by an ordered pair, e.g. {D1, D2}.
Then treat the complex experiment as a sequence of smaller experiments in order.
What is the multiplication rule?
If outcomes of a random experiment can be represented by an ordered n-tuple, with the first component any of K1 outcomes etc, then the total number of possible outcomes will be K1 x K2 x …. x Kn.
This applies to sampling both with and without replacement.
What are permutations?
These are the outcomes when sampling r objects from a set of n different objects, and the order they’re in matters. There will be n!/(n-r)! different outcomes.
Remember 0! = 1.
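The permutation count can be checked against Python's standard library (n = 5, r = 3 is just an illustrative choice):

```python
import math

# Permutations: sample r from n distinct objects, order matters, no replacement.
n, r = 5, 3
perms = math.factorial(n) // math.factorial(n - r)  # n!/(n-r)!
# math.perm (Python 3.8+) computes the same quantity directly
assert perms == math.perm(n, r) == 60
```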
What are combinations?
Combinations occur when sampling r objects from a set of n different objects, without replacement and where the order doesn’t matter.
There will be n!/[r! (n-r)!] different outcomes, which is equal to the number of permutations divided by r!.
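The same kind of check works for combinations, including the "permutations divided by r!" identity (again with illustrative n and r):

```python
import math

# Combinations: sample r from n distinct objects, order irrelevant, no replacement.
n, r = 5, 3
combs = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
# Equal to the number of permutations divided by r!, and to math.comb(n, r)
assert combs == math.perm(n, r) // math.factorial(r) == math.comb(n, r) == 10
```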
What formula should be used to work out the number of outcomes when order matters, and there is no replacement?
n!/(n-r)! ,
where r is the number of objects drawn (e.g. the quantity of numbers in a lottery draw) and n is the number of possible numbers to choose from.
What formula should you use to see the number of possible outcomes when the order does matter and there’s replacement?
n^r, where n is the range of numbers you can select, and r is the quantity of numbers needed to be picked.
What formula should be used to see the number of outcomes when you have no replacement, and order doesn’t matter?
n! / [r! (n-r)!], these are called combinations.
What formula should be used when there are r sampling objects, n different objects, there is replacement and order doesn’t matter?
(n + r - 1)!/[r! (n-1)!]
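This formula is the binomial coefficient C(n + r - 1, r), which can be verified numerically (n = 3, r = 2 chosen for illustration):

```python
import math

# Sampling r from n objects WITH replacement, order irrelevant:
# (n + r - 1)! / (r! (n - 1)!), i.e. the binomial coefficient C(n + r - 1, r).
n, r = 3, 2
count = math.factorial(n + r - 1) // (math.factorial(r) * math.factorial(n - 1))
assert count == math.comb(n + r - 1, r) == 6
```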
What is conditional probability?
This is when you have to find the probability of an event A happening, given event B has already occurred.
Essentially, event B is the new sample space.
Key facts about a set of playing cards…
-52 cards
-13 denominations (2 to 10, Jack, Queen, King, Ace)
-4 suits: Diamonds and hearts are red, Club and spades are black
What is the probability multiplication rule?
P(AnB) = P(B) x P(A|B) = P(A) x P(B|A).
Remember, the conditioning event must have non-zero probability (the denominator cannot equal zero).
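The multiplication rule can be applied to the playing-card facts above, e.g. the probability of drawing two aces in a row without replacement:

```python
from fractions import Fraction

# P(A n B) = P(A) * P(B|A): two aces from a 52-card deck, no replacement.
p_first = Fraction(4, 52)                # P(A): first card is an ace
p_second_given_first = Fraction(3, 51)   # P(B|A): 3 aces left among 51 cards
p_both = p_first * p_second_given_first
print(p_both)  # 1/221
```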
What is the probability addition rule?
P(AuB) = P(A) + P(B) - P(AnB)
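A card-deck sketch of the addition rule: the probability of drawing a heart or an ace, where subtracting the intersection avoids double-counting the ace of hearts:

```python
from fractions import Fraction

# P(A u B) = P(A) + P(B) - P(A n B) for one card from a standard deck.
p_heart = Fraction(13, 52)
p_ace = Fraction(4, 52)
p_heart_and_ace = Fraction(1, 52)  # the ace of hearts, otherwise counted twice
p_heart_or_ace = p_heart + p_ace - p_heart_and_ace
print(p_heart_or_ace)  # 4/13
```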
What does it mean if two events are Independent?
P(A) x P(B) = P(AnB)
What is the equation for Bayes’ theorem?
If events A1, A2, A3 etc are mutually exclusive and collectively exhaustive, and Ai is any one of these events:
P(Ai | B) = P(Ai)P(B|Ai) / [P(A1)P(B|A1) + P(A2)P(B|A2) + … + P(An)P(B|An)]
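A worked Bayes sketch with hypothetical numbers (the 1% prevalence and the two test accuracies are invented for illustration):

```python
from fractions import Fraction

# Bayes' theorem: A1 = "has condition" (prior 1%), A2 = "does not" (99%);
# B = "test is positive", with P(B|A1) = 99/100 and P(B|A2) = 5/100.
# A1 and A2 are mutually exclusive and collectively exhaustive.
p_a1, p_a2 = Fraction(1, 100), Fraction(99, 100)
p_b_a1, p_b_a2 = Fraction(99, 100), Fraction(5, 100)
posterior = (p_a1 * p_b_a1) / (p_a1 * p_b_a1 + p_a2 * p_b_a2)
print(posterior)  # 1/6: a positive test is far from conclusive
```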
What is a random variable?
A variable X will be a random variable (r.v) if its values can be thought of as determined by the outcomes of a random experiment.
This means that the value of X depends on which outcome we get.
What is a discrete random variable?
This is a random variable whose values are determined by a finite or countably infinite number of outcomes.
What is a continuous random variable?
A continuous random variable is a random variable whose values are determined by an uncountably infinite number of outcomes, for example drawing a random real number between 0 and 1.
What is probability mass function?
If X is a discrete random variable, and S is the set of values determined by outcomes of the random experiment, then for function f(x):
-f(x) >= 0 for all real numbers x
-summing f(x) over all the values that have a positive probability of occurrence will equal 1.
Values of x that cannot occur as outcomes of the random experiment are simply assigned f(x) = 0.
How do probability mass functions (pmf) assign probabilities?
P(X=x) = f(x).
This means the probability that the random variable X takes the value x is equal to the pmf evaluated at x.
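A minimal pmf sketch using a dictionary (the fair die is a hypothetical choice of distribution):

```python
from fractions import Fraction

# A pmf for a fair die: f(x) >= 0 for every value, and the values sum to 1.
f = {x: Fraction(1, 6) for x in range(1, 7)}
assert sum(f.values()) == 1
p_x_equals_3 = f[3]  # P(X = 3) = f(3)
```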
What is a probability density function (pdf)?
For a continuous random variable, a function f(x) such that f(x) >= 0 for all real values of x and the definite integral from -infinity to infinity equals 1 is a probability density function.
How do cumulative distribution functions work (cdf)?
F(x) = P(X <= x)
As it is cumulative, F(x) gives the probability that X takes a value less than or equal to x, so F(x) is non-decreasing: at larger values of x it is always at least as high.
What is the expectation of random variables?
E(X) = sum of x(i) x f(x(i))
In words, it is the weighted average of its values, where the weights are the corresponding values of the pmf; it is a measure of the central tendency of the distribution of the random variable.
The probability that the random variable equals its expected value could be zero; the expected value just describes how the probability is distributed around this point.
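The weighted-average definition, computed for the hypothetical fair die: its expected value 7/2 is a value the die can never actually show.

```python
from fractions import Fraction

# E(X) = sum of x_i * f(x_i) for a fair die.
f = {x: Fraction(1, 6) for x in range(1, 7)}
e_x = sum(x * p for x, p in f.items())
print(e_x)  # 7/2
```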
What is the variance?
V(X) = sum of (x(i) - mu)^2 x f(x(i))
In words, the variance of a random variable is the weighted average of the squared deviations of its values from the mean, where the weights are the corresponding values of the pmf (their probabilities).
The standard deviation is the positive square root of the variance.
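The same fair-die example extended to the variance formula:

```python
from fractions import Fraction

# V(X) = sum of (x_i - mu)^2 * f(x_i) for a fair die.
f = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in f.items())               # 7/2
var = sum((x - mu) ** 2 * p for x, p in f.items())
print(var)  # 35/12
```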
What is the equation for the variance of a continuous random variable?
V(X) = the definite integral from -infinity to infinity of (x - mu)^2 x f(x) dx.
What is the median of a random variable?
The median of a random variable is a point m that splits the distribution of X into two parts, each with probability of at least 0.5: P(X <= m) >= 0.5 and P(X >= m) >= 0.5.
If P(X = m) = 0, then both parts of the distribution are exactly 0.5. This will always be the case for a continuous random variable, and may (but need not) apply to a discrete random variable.
-Unlike the mean, the median doesn’t necessarily have to be a unique value.
What is the difference between the mean and median?
The mean is the constant c that minimises E[(X - c)^2], and the median is the constant c that minimises E(|X - c|).
From this, we can see that the mean will be more affected by extremes, because it squares the prediction error, whereas the median weights an error only by its absolute value however large it is. This means that when large anomalies are present, the median will be the more representative measure.
How can we get the expected value of a function of a random variable?
If we let a random variable Y = g(X) be a function of X, where X has a pmf or pdf f(x), then Y is also a random variable. Therefore, E(Y):
-If discrete = sum of g(x) multiplied by f(x).
-If continuous = the integral of f(x) times g(x).
We have seen this before: taking g(x) = (x - mu)^2 gives the variance.
Why would a grade received in an exam be a random variable?
This is because the marks are dependent on the random factors of each student that impact their ability to do well on the exam (e.g sleep quality night before, cognitive ability, time revised for etc).
How can we check if there is a causal relationship between two random variables?
To do this, we can use random variable techniques, so we can see what would've happened if something else hadn't happened.
To see if one variable is associated to another, we can use conditional probabilities, and see what the correlation between, say, grades in 2nd year maths would be, given you get a specific grade in 1st year maths.
What issues do we still have when using conditional probabilities to see causal relationships between two random variables?
1) We cannot hope to easily summarise the relationship between the two marks.
2) We cannot hope to derive features of their relationship between these two variables to compare it to other random variables.
What is joint relative frequency?
This is the joint probability of two random variables both occurring at the same time, and it can be displayed in a table.
What is the joint probability mass function?
If you have two discrete random variables X and Y, then there will be a function f(X, Y) such that:
1) f(X,Y) >= 0 for all pairs of real numbers (x, y).
2) the double summation of all the pairs of values X and Y will equal 1. To see how to do double summation, look at lecture slides 4.
The probability assignment is P(X = x, Y = y) = f(x, y).
What is the joint probability density function?
For continuous random variables X and Y, a function f(x,y) such that:
-f(x,y) is non-negative for all real numbers (x,y)
-The double integral is equal to 1. Bear in mind that, as this is a 2-variable function, the function will make a 3D graph. The probability will be represented by the volume under the p.d.f surface.
What is joint cumulative distribution?
F(x, y) = P(X <= x, Y <= y). We add up all the probabilities which satisfy this constraint, for both variables, by looking at the joint frequency table.
What are marginal distributions of discrete random variables?
When we have joint distribution we can derive the probability mass functions of each of the random variables that are jointly distributed.
From the joint probability mass function f(X,Y) we can derive the pmf of X and Y denoted f1(X) and f2(Y), as they are an ordered pair, with X first and Y second.
How would we find f1(x) of a marginal distribution function?
1) Fix x, and find f(x, y) for every y such that f(x, y) is strictly positive.
2) Sum these strictly positive values; the total is f1(x).
3) Summing f1(x) over all values of x should equal 1 (it may not exactly if decimals have been rounded early).
How do we find marginal distributions of continuous random variables?
-To do this, integrate out the variable whose marginal distribution you are not taking.
-Remember, a new integral is needed for each part of the range from minus infinity to infinity where the function changes.
What are conditional expectations/mean?
Conditional expectations are the expectations of certain random variables given a condition of the other variable in the joint function.
For example, for the random variables X and Y, the conditional mean/expectation of X conditional on y is:
E(X | Y = y) = mu(X|y).
What is conditional distribution?
We have seen how we can condition the probability of an event on the occurrence of another event. Similarly, the joint distribution function allows us to condition the distribution of one random variable on the values of another, letting us calculate probabilities for one random variable conditional upon the value of the other.
Mathematically, this looks exactly the same as normal conditional probability.
What is the conditional variance?
The variance of a conditional distribution is called the conditional variance. For random variables X and Y, the conditional variance of X given y is:
V(X | Y = y) = E[(X - mu(X|y))^2 | Y = y] = sigma^2(X|y)
What is the conditional expected function?
The conditional expectation function of, say, X given Y is E(X | Y = y) viewed as a function of y: do the same as for a conditional expectation of X, but without specifying the value of Y, just leaving it as y in the equation.
What are the expectations of jointly distributed random variables?
If we let Z = g(X, Y) be a function of random variables X and Y, where X and Y have a joint pmf or pdf, the expected value of Z is given by:
E(Z) = sum over i (sum over j) [g(x(i), y(j)) x f(x(i), y(j))] for discrete values, and:
E(Z) = the double integral of g(x, y) x f(x, y) dx dy for continuous values.
What is the covariance of a function?
g(X, Y) = (X - mu(X))(Y - mu(Y)). The expectation of this function made up of two different random variables is called the covariance, denoted Cov(X, Y).
The formula for calculating this will vary depending on whether the random variables are discrete or continuous. Both can be found in the lecture slides/notes.
NOTE: The calculations of the covariance can be shortened using the formula Cov(X, Y) = E(XY) - mu(X)mu(Y)
What are important factors to note about the covariance?
If X tends to take values above its mean when Y does too, the covariance will be positive. But if, when X takes a value above its mean, Y tends to take a value below its mean, the covariance will be negative.
Also, the value of the covariance will depend on its units. If X and Y were measured in pounds, but then changed to pence, the covariance would be multiplied by 10,000 without the relationship between X and Y changing at all. Therefore, the covariance value itself has little meaning, but it is meaningful to compare the signs of covariances of random variables.
What is correlation?
If we let X and Y be random variables with covariance Cov(X, Y) and variances V(X) and V(Y), then the correlation coefficient or correlation is given by:
Corr(X, Y) = Cov(X, Y)/(sqrt(V(X)) x sqrt(V(Y)))
The correlation will have the same sign as the covariance. The correlation coefficient takes a value between -1 and 1 inclusive: -1 signifies a perfect negative correlation, and 1 a perfect positive correlation.
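Covariance and correlation computed from first principles on hypothetical paired data with an exact linear relationship (so the correlation should come out as 1):

```python
# Treat each (x, y) pair as equally likely, so expectations are plain averages.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # ys = 2 * xs: a perfect positive linear relationship

def mean(values):
    return sum(values) / len(values)

mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])  # Cov(X, Y)
vx = mean([(x - mx) ** 2 for x in xs])                     # V(X)
vy = mean([(y - my) ** 2 for y in ys])                     # V(Y)
corr = cov / (vx ** 0.5 * vy ** 0.5)  # Cov(X, Y) / (sd(X) * sd(Y))
```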
How can we find if two random variables are statistically independent?
This requires that, for two random variables X and Y, the probability of any event defined in terms of X doesn't change when we know which value of Y has occurred.
Since the conditional pmf/pdf is g1(x | y) = f(x, y)/f2(y), independence means g1(x | y) = f1(x). Equivalently:
if X and Y are jointly distributed random variables with joint pmf or pdf f(X, Y) and marginal pmfs or pdfs f1(X) and f2(Y), then X and Y are statistically independent if and only if:
f(X, Y) = f1(X) x f2(Y). This must hold for ALL values of x and y.
What is a property of statistically independent random variables X and Y?
They will have a covariance of zero (can be proven mathematically), but the reverse of this will not necessarily be true.
Also, if X and Y are independent, then U = f(X) and V = g(Y) are also independent.
What are the expectations of linear functions of random variables?
For two random variables X and Y, the expressions for expectations of linear functions are as shown, if we let Z = a + bX + cY:
E(Z) = a + bE(X) + cE(Y)
Var(Z) = (b^2)Var(X) + (c^2)Var(Y) + 2bcCov(X,Y)
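The two formulas above can be checked by direct enumeration over a small hypothetical joint pmf (two independent fair coin flips coded 0/1, with arbitrary constants a, b, c):

```python
# Joint pmf of two independent fair bits: E(X) = E(Y) = 0.5,
# V(X) = V(Y) = 0.25, Cov(X, Y) = 0.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
a, b, c = 1, 2, 3  # Z = a + bX + cY

e_z = sum(p * (a + b * x + c * y) for (x, y), p in joint.items())
e_z2 = sum(p * (a + b * x + c * y) ** 2 for (x, y), p in joint.items())
v_z = e_z2 - e_z ** 2
# The formulas predict E(Z) = 1 + 2*0.5 + 3*0.5 = 3.5
# and V(Z) = 4*0.25 + 9*0.25 + 0 = 3.25.
```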
What rule should we use to derive the covariance of two linear functions of two random variables?
If we have Z1 = a1 + b1X + c1Y and Z2 = a2 + b2X + c2Y:
Cov(Z1, Z2) = b1 x b2 x Var(X) + c1 x c2 x Var(Y) + (b1 x c2 + b2 x c1) x Cov(X, Y)
What is mean independence?
If the conditional expectation of Y doesn't vary as X varies, then we can say that Y is mean independent of X, denoted E(Y | X) = mu(Y).
This means the CEF of Y conditional on X will be constant.
What is important to note about mean independence?
Mean independence is weaker than independence, and it doesn’t imply independence, but independence does imply mean independence.
-If Y is mean independent of X, the covariance between X and Y is zero, but the reverse isn’t necessarily true.
-Just because Y is mean independent of X, X won’t always be mean independent of Y.
What can we do with functions containing 3 random variables [f(x, y, z)]?
The same as we would do with two r.v functions, we could obtain the CEF of Y conditional on X, Y conditional on Z, or Y conditional on X and Z.
If we wanted E(Y | X), we would first have to get the joint marginal probability of X and Y, by marginalising out the third random variable Z.
To get E(Y | X, Z), we would first need the marginal probability function of (X, Z), so we can then find Y conditional on the combinations of X and Z.
Finding these can allow us to do more in depth analysis and calculations about data.
Why do we prefer CEF of 3 random variable (trivariate) functions to be linear?
It means that the CEF will be far easier to interpret, but it is important to note we mean linear in their parameters, not their variables.
E(Y|X,Z) = b0 + b1X + b2Z+ b3XZ, for the example in the slides, see lecture notes for a more thorough explanation.
When the conditioning variables are binary (can only take 0 or 1), the CEF can always be expressed as an equation linear to the parameters.
How do we find the joint probability mass function of n random variables?
If we have n random variables, then we just have to find the summations of all the different random variable combinations such that the outcome of the function is > 0 (as we can’t have a negative probability).
From this, we can then do probability assignment, the same way in which we would’ve done before.
If continuous, you need to do the integral for as many random variables as you have to get the probability density functions.
Marginal probabilities can also be extended in a similar way, but note that independence is not transitive: just because X is independent of Y, and Y is independent of Z, this doesn't mean X will be independent of Z.
How can you express multivariate functions if they contain independent pairs?
If you have f(x, y, z) where Z is independent of the pair (X, Y) (but X and Y aren't independent of each other), the joint pmf can be written:
f(x,y,z) = f3(z) x f12(x,y)
It also holds that, if the variables are independent, the expectation of their product equals the product of their expectations.
What is a parametric family?
Probability functions that have exactly the same mathematical formulation except for the values of one or more constants (parameters) are grouped together in parametric families.
The same formulation apart from one constant is a one-parameter family of distributions, and so on.
What is the normal distribution?
The normal distribution of a random variable is a probability function with two parameters, the mean (expected value) and the standard deviation.
A random variable with this distribution is known as a normal random variable.
What are some key properties about normal distributions?
-They’re symmetrical about the mean.
-Linear functions of normal random variables are also normal random variables. If X is a normal r.v and Y = a + bX, then Y~N(a + b(mu), b^2(s.d)^2)
What is the standard normal distribution?
If we have X as a normal random variable, and Y = a + bX:
If a = -mu/s.d and b = 1/s.d, then Y will be the standard normal r.v, Y~N(0,1).
What is it important to remember with standard normal probabilities?
Tables give the cdf of the standard normal r.v, so if you want the probability of it lying between a value z and the mean, you need to subtract 0.5, as the side below the mean is also included in the cumulative probability.
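A small sketch of this adjustment using Python's statistics module (the value 1.96 is just an illustrative choice):

```python
from statistics import NormalDist

# The cdf gives P(Z <= z); subtract 0.5 to get the probability of landing
# between the mean (0) and z, since the lower half is otherwise included.
z = NormalDist(mu=0, sigma=1)
p_below = z.cdf(1.96)               # roughly 0.975
p_between_0_and_z = p_below - 0.5   # roughly 0.475
```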
What is statistical inference, and how does it work?
Statistical inference allows us to use a sample from an unknown distribution (also known as a population) to make inferences about that unknown distribution.
We do this by first learning how to deal with samples from known populations, and then using this knowledge to see how we can extract information from samples in order to infer features of an underlying unknown population.
What is a random sample?
A random sample on a random variable X consists of independent drawings from its distribution (or population).
The ordered set (X1 to Xn) is called a random sample of size n on the random variable X.
Each Xi is a random variable, as its value is determined by an experiment; the Xi's are independently and identically distributed.
What is the sample variance?
This is the sum of the squared deviations of the sample values from the sample mean, divided by (n - 1).
What can we do with the knowledge in a random sample the Xi’s are independently and identically distributed?
The joint pmf/pdf of the sample, g_n(x), equals the product of the individual marginal pmfs/pdfs of each random variable in the sample, as they are independent and the laws of independent random variables apply.
What are sample statistics?
Let T = h(X1, X2, X3, X4…… Xn) = h(x). So, the values of T are decided by a function of h() of the random sample. This makes T a sample statistic.
What is the sample mean?
This is the arithmetic average of the random sample.
The sample variance is then computed from this, also using the sample mean.
What is a sampling distribution?
The distribution of a sample statistic is called a sampling distribution, it depends on 3 things in general:
1) The function that determines the values of the sample statistic
2) The distribution from which we choose the sample (the pmf or pdf)
3) The size n of the random sample.
What is the sample mean theorem?
Given a random sample size of n, from a population with E(X) = mu and V(X) = population variance:
-Expected value of the sample mean will equal the population mean
-Variance of the sample mean will equal the population variance/n.
How can we sample from the normal distribution?
When sampling the r.v X, with sample size n, X ~ N(mu, variance):
Sample Mean ~ N(mu, variance/n).
This follows directly from the sample mean theorem.
What is the Central Limit Theorem?
In random sampling size of n on any r.v with E(X) = mu and V(X) = variance, as the size of the random sample increases, the sampling distribution of the sample mean approaches a normal distribution with mean mu and variance: variance/n.
The standardised sample mean Z = (sample mean - mu)/(population s.d/sqrt(n)) approaches the standard normal distribution N(0,1).
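A quick simulation sketch of the sample mean theorem and the CLT (hypothetical uniform(0,1) draws, for which mu = 0.5 and the population variance is 1/12; seed and sizes are arbitrary):

```python
import random
import statistics

# Sample means of n uniform(0,1) draws should cluster around mu = 0.5
# with variance roughly (1/12)/n, per the sample mean theorem.
random.seed(1)
n, reps = 30, 2000
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(reps)]
approx_mu = statistics.fmean(means)        # close to 0.5
approx_var = statistics.pvariance(means)   # close to (1/12)/30 = 1/360
```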