Intro Probability & Stats Flashcards

1
Q

What is the sample space?

A

The set of all possible outcomes, denoted Ω

2
Q

How are elements in Ω denoted?

A

ω (lowercase omega)

3
Q

What is an event?

A

A subset A of the sample space Ω

4
Q

What is an elementary event?

A

Events in the sample space that cannot be divided any further. (They are just outcomes in the sample space)

5
Q

What is a null event? How is it denoted?

A

The null event is the set containing no outcomes. This event cannot happen. It is denoted ∅

6
Q

What is a certain event? How is it denoted?

A

A certain event is the event containing all outcomes (the whole sample space). It is denoted Ω.

7
Q

What is a random variable?

A

A numerical summary of a random outcome

8
Q

What is the difference between a discrete random variable and a continuous random variable?

A

A DRV takes on a countable number of possible values. A continuous random variable takes on a continuum of possible values

9
Q

What is a probability space? How is a probability space denoted?

A

A set Ω, a σ-field A of subsets of Ω, and a probability measure P defined on A. Denoted: (Ω, A, P) (where A is the magical curly A)

10
Q

What is A (magical, curly A)?

A

A collection of events to which we assign probabilities. Formally, A (curly) is a non-empty collection of subsets of Ω such that (1) if A is in A (curly), then A^C (the complement of A) is in A (curly), and (2) if A and B are in A (curly), so are A∪B and A∩B. (For a σ-field, closure under unions must also hold for countably infinite unions A1∪A2∪…)

11
Q

What is P in a probability space?

A

A probability measure P on a σ-field A (curly) of subsets of a set Ω is a real-valued function having domain A (curly) satisfying:
  1. P(Ω) = 1
  2. P(A) ≥ 0 for all A in A (curly)
  3. If An, for n = 1, 2, 3,…, are mutually disjoint sets in A (curly), then P(A1∪A2∪…) = P(A1) + P(A2) + … (the probability of the union of the disjoint sets equals the sum of their probabilities)

12
Q

What is a discrete random variable?

A

A discrete real-valued random variable X on a probability space (Ω, A, P) is a function with domain Ω and a range of a finite (or countably infinite) subset {x1, x2,…} of the real numbers R such that {ω∈Ω : X(ω)=xi} is an event for all i

13
Q

What is a probability mass function?

Which random variables have probability mass functions?

A

The probability that we get a particular outcome x. f(x) = P(X = x)

Discrete Random Variables

14
Q

What is a cumulative distribution function?

A

The probability that the random variable is less than or equal to a particular value: F(x) = P(X ≤ x)

15
Q

What are three properties of a cumulative distribution function?

A
  1. 0 ≤ F(x) ≤ 1 for all x
  2. F(x) is non-decreasing in x
  3. lim(x→−∞) F(x) = 0 and lim(x→∞) F(x) = 1
16
Q

What is a continuous random variable?

A

A continuous random variable X on a probability space (Ω, A, P) is a real-valued function X(ω), ω ∈ Ω, such that for
−∞ < x < ∞, {ω | X(ω) ≤ x} is an event.

17
Q

What does the area under the probability density function between any two points represent?

A

The probability that the (continuous) random variable falls between those two points.

18
Q

What is the relationship between a probability density function and a cumulative distribution function?

A

If we integrate the PDF, we get the CDF. If we differentiate the CDF, we get the PDF.

19
Q

What is the probability density function for a uniform distribution?

A

fX(x) = 1/(b−a) for a < x < b; and 0 elsewhere

20
Q

What is the cumulative distribution function for a uniform distribution?

A

FX(x) = 0 for −∞ < x < a
FX(x) = (x−a)/(b−a) for a ≤ x < b
FX(x) = 1 for b ≤ x < ∞
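As a minimal sketch, the uniform pdf and cdf from the two cards above can be written as Python functions (the endpoints a = 0, b = 2 and the test point are arbitrary example values):

```python
def uniform_pdf(x, a, b):
    """f_X(x) = 1/(b - a) for a < x < b; 0 elsewhere."""
    return 1.0 / (b - a) if a < x < b else 0.0

def uniform_cdf(x, a, b):
    """F_X(x) = 0 below a, (x - a)/(b - a) on [a, b), and 1 from b onwards."""
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0

# At the midpoint of (0, 2), half the probability mass lies below.
print(uniform_cdf(1.0, 0.0, 2.0))  # → 0.5
```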

21
Q

Name three discrete random variable distributions

A

Bernoulli
Binomial
Poisson

22
Q

Name five continuous random variable distributions

A
Uniform 
Normal 
Chi-Squared
F distribution 
Student's t distribution
23
Q

What possible outcomes are there in a Bernoulli distribution?

A

The random variable is binary, so the outcomes are 0 and 1

24
Q

What is the probability mass function for a Bernoulli distribution?

A

f(x) = p^(x)(1-p)^(1-x) for x∈{0,1} and 0 elsewhere
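A minimal Python sketch of this pmf (p = 0.3 is an arbitrary example value):

```python
def bernoulli_pmf(x, p):
    """f(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}; 0 elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
print(bernoulli_pmf(1, p))  # → 0.3 (the probability of success)
print(bernoulli_pmf(0, p))  # → 0.7 (the probability of failure)
```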

25
Q

What is the notation for a Bernoulli distribution? What is the notation for a binomial distribution?

A

Bernoulli X ∼ B(p)

Binomial Y ∼ B(n, p)

26
Q

What is the Binomial distribution? (in relation to Bernoulli)

A

The Binomial Distribution is the total number of successes from n repetitions of the same Bernoulli experiment. A binomial random variable takes values in {0,1,2,…,n}

27
Q

What is the probability mass function for a binomial distribution?

A

f(x) = (n x) p^x (1-p)^(n-x) for x∈{0,1,2,…,n} and 0 elsewhere, where (n x) is the binomial coefficient

28
Q

How do you calculate the binomial coefficient? (n x)

A

n! / (x!(n-x)!)
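A quick sketch of the binomial coefficient and pmf from the two cards above, using the standard library's `math.comb` (n = 10, p = 0.5 are arbitrary example values):

```python
import math

def binomial_pmf(x, n, p):
    """f(x) = (n choose x) * p^x * (1 - p)^(n - x) for x in {0, ..., n}; 0 elsewhere."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

print(math.comb(10, 3))  # → 120, i.e. 10! / (3! * 7!)

# The pmf puts total mass 1 on {0, 1, ..., n}.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(total)  # → 1.0
```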

29
Q

What is required for a normal approximation to binomial distribution?

A

n being large. (A common rule of thumb is that the approximation is reasonable when np ≥ 10 and n(1−p) ≥ 10, so p should not be too close to 0 or 1.)
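A sketch of how the approximation can be checked numerically, using only the standard library; n = 400, p = 0.5, k = 210 are arbitrary example values, and the normal cdf is built from `math.erf`:

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def normal_cdf(x, mu, sigma):
    """Normal cdf written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p, k = 400, 0.5, 210
exact = binom_cdf(k, n, p)
# Approximate with mean np and sd sqrt(np(1-p)), plus a 0.5 continuity correction.
approx = normal_cdf(k + 0.5, n * p, math.sqrt(n * p * (1 - p)))
print(abs(exact - approx) < 0.01)  # → True: the approximation is close for large n
```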

30
Q

Name four requirements for a binomial distribution

A
  1. There are a fixed number of trials, n, that is determined in advance and is not a random variable
  2. There are two possible outcomes for each trial: success and failure
  3. The outcomes are independent from one trial to the next
  4. The probability of success and the probability of failure remain the same across all n trials.
31
Q

If X has finite expectation, how do we define E[X] for a DRV?
(give a verbal explanation and a formula)

A

Verbal: The expected value of a discrete random variable is the sum of all possible values the random variable X can take, weighted by their probabilities.
Formula: E[X] = ∑i xi P(X = xi)
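The probability-weighted sum can be sketched in Python; the example pmf (number of heads in two fair coin flips) is an arbitrary choice:

```python
# Example pmf: number of heads in two fair coin flips.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# E[X] = sum over all possible values x of x * P(X = x)
expected = sum(x * p for x, p in pmf.items())
print(expected)  # → 1.0
```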

32
Q

What is E[X] of a Bernoulli random variable?

A
E[X] = 1*P(X=1) + 0*P(X=0) 
E[X] = P(X=1) = p
33
Q

How do you find the expected value of a CRV?

A

By integrating x multiplied by the probability density function: E[X] = ∫ x fX(x) dx

34
Q

Give three properties of the expected value (not Jensen’s)

A
  1. If c is a constant and P(X = c) = 1, then E[X] = c
  2. If c is a constant and P(X=c) = 1, then E[g(X)] = g(c)
  3. If b and c are constants then E[b + cX] = b + cE[X]
35
Q

Prove that the expectation operator is linear. That is, prove that E(aX + bY) = aE[X] + bE[Y].

A

E(aX + bY) = ∑x∑y (ax + by)P(X=x, Y=y)
E(aX + bY) = ∑x∑y axP(X=x, Y=y) + ∑x∑y byP(X=x, Y=y)
E(aX + bY) = a∑x xP(X=x) + b∑y yP(Y=y) (summing the joint pmf over y gives the marginal of X, and vice versa)
E(aX + bY) = aE[X] + bE[Y]
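The identity can be verified numerically on a small joint pmf; the pmf values and the constants a, b below are arbitrary examples (note linearity needs no independence assumption):

```python
# Example joint pmf for (X, Y); a and b are arbitrary constants.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
a, b = 2.0, 3.0

lhs = sum((a * x + b * y) * p for (x, y), p in joint.items())  # E[aX + bY]
ex = sum(x * p for (x, y), p in joint.items())                 # E[X]
ey = sum(y * p for (x, y), p in joint.items())                 # E[Y]

print(abs(lhs - (a * ex + b * ey)) < 1e-9)  # → True
```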

36
Q

What does Jensen’s inequality say about E[X]?

A
  1. If X is a random variable and g is a convex function, then E[g(X)] ≥ g(E[X])
  2. If X is a random variable and g is a concave function, then g(E[X]) ≥ E[g(X)]
37
Q

What is variance? (verbal explanation and formula)

A

Variance is the expectation of the squared difference between the random variable and its expected value.
Var(X) = E[(X-E[X])^2]

38
Q

What is the relationship between the variance of a random variable and the standard deviation of a random variable?

A

Variance is equal to standard deviation squared σ^2=Var(X)

39
Q

How do you find the variance of a discrete random variable?

A

For the variance of a discrete random variable, for each possible value of the random variable we take the squared deviation from the expected value, and then sum these weighted by their probabilities:
Var(X) = ∑i (xi − E[X])^2 P(X = xi)

40
Q

What is the variance of a Bernoulli random variable?

A

((1-p)^2)p + ((0-p)^2)(1-p)

Var(X) = p(1-p)

41
Q

What are two ways of writing the expression for variance?

A
  1. Var(X) = E[(X-E[X])^2]
  2. Var(X) = E[X^2] – (E[X])^2
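A quick numerical check that the two expressions agree, using a Bernoulli(0.3) pmf as an arbitrary example:

```python
pmf = {0: 0.7, 1: 0.3}  # Bernoulli(0.3), an arbitrary example

mean = sum(x * p for x, p in pmf.items())
var_def = sum((x - mean) ** 2 * p for x, p in pmf.items())    # E[(X - E[X])^2]
var_alt = sum(x * x * p for x, p in pmf.items()) - mean ** 2  # E[X^2] - (E[X])^2

print(abs(var_def - var_alt) < 1e-9)  # → True; both equal p(1 - p) = 0.21
```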

42
Q

How can we re-write Var(cX)?

What about Var(b+cX)?

A

Both equal c^2 Var(X); adding the constant b does not change the variance

43
Q

What is the variance of a constant?

A

0

44
Q

What does Jensen’s inequality say about Var(X)?

A

By Jensen’s inequality, for a non-constant X, Var(X) > 0, since g(x) = (x − E[X])^2 is a strictly convex function

  • By Jensen’s inequality, if g(x) is a strictly convex function (and X is non-constant), then E[g(X)] > g(E[X])
  • Let g(x) = (x − E[X])^2
  • Then E[(X − E[X])^2] > g(E[X])
  • Var(X) > (E[X] − E[X])^2
  • Var(X) > 0
45
Q

What is the formula for skewness of a distribution?

A

E[(X-E[X])^3] / Var(X)^(3/2)

46
Q

What skewness does a symmetric distribution have?

A

0

47
Q

If a distribution has a long right tail, what skewness does it have?

A

Positive

48
Q

What is the kurtosis of a distribution? (verbal explanation and formula)

A

The kurtosis of a distribution is a measure of how much mass is in its tails, and therefore is a measure of how much of the variance of X comes from extreme values.
Kurt(X) = E[(X-E[X])^4] / Var(X)^2
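Skewness and kurtosis can both be sketched as standardised central moments of a pmf; the Bernoulli(0.2) pmf below is an arbitrary example with a long right tail:

```python
pmf = {0: 0.8, 1: 0.2}  # Bernoulli(0.2): an arbitrary, right-skewed example

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
skew = sum((x - mean) ** 3 * p for x, p in pmf.items()) / var ** 1.5  # 3rd moment
kurt = sum((x - mean) ** 4 * p for x, p in pmf.items()) / var ** 2    # 4th moment

print(skew > 0)        # → True: most mass at 0 with a long right tail
print(round(kurt, 2))  # → 3.25
```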

49
Q

What does asymptotic mean?

A

Refers to the behaviour of estimators as the sample size grows large. As a rough rule of thumb, a "large sample" is often taken to mean 30 or more observations.

50
Q

What is the joint probability mass function of two discrete random variables?

A

It is the probability that the random variables, X and Y, simultaneously take on certain values, x and y.
f(x, y) = P(X = x, Y = y)

51
Q

What is the marginal distribution of a random variable X?

How can it be calculated?

A

The marginal distribution of a random variable, X, is just another name for its probability distribution.
The term is used to distinguish the distribution of X alone from the joint distribution of X and another random variable Y

The marginal distribution of X can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which X takes on a specific value
So for the marginal distribution of X, sum the joint probabilities over all possible realisations of Y where X = xi: P(X = xi) = ∑j P(X = xi, Y = yj)
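The "sum over the other variable" step can be sketched in Python (the joint pmf values are arbitrary examples):

```python
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}  # example joint pmf

marginal_x = {}
for (x, y), p in joint.items():
    # P(X = x) = sum over all y of P(X = x, Y = y)
    marginal_x[x] = marginal_x.get(x, 0.0) + p

print({x: round(p, 10) for x, p in marginal_x.items()})  # → {0: 0.4, 1: 0.6}
```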

52
Q

How do you calculate marginal densities of X and Y?

A

For the marginal density of X, integrate the joint probability density function with respect to y: fX(x) = ∫ f(x, y) dy
For the marginal density of Y, integrate the joint probability density function with respect to x: fY(y) = ∫ f(x, y) dx

53
Q

What is conditional distribution of Y given X?

A

The distribution of a random variable Y conditional on another random variable X taking on a specific value.

54
Q

What is Bayes’ Rule?

A

P(A|B) = P(A∩B)/P(B) = P(B|A)P(A)/P(B)
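A sketch of Bayes' Rule on a hypothetical diagnostic-test example (the 1% prevalence, 95% sensitivity, and 90% specificity figures are invented for illustration):

```python
# Hypothetical example numbers: 1% prevalence, 95% sensitivity, 90% specificity.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.10

# P(B) via the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Rule: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # → 0.0876
```

Even with a fairly accurate test, a positive result leaves under a 9% chance of disease because the prior P(A) is so small.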

55
Q

What does it mean for two random variables to be independent? (verbal explanation)

What are 2 principles for independence?

A
  • Two random variables are independent if knowing the value of one of the variables provides no information about the other
  1. X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y: P(Y|X) = P(Y), and likewise P(X|Y) = P(X)
  2. If X and Y are independent, then their joint distribution is the product of their marginal distributions: P(X∩Y) = P(X)·P(Y)
56
Q

What is covariance? (verbal explanation + formula (2 expressions for formula))

A

Verbal: The covariance of two random variables is the expectation of the product of the two variables’ deviation from the mean. It is a measure of the extent to which random variables move together.

Formula:
Cov(Y,X) = E[(Y-E[Y])(X-E[X])]
Cov(Y,X) = E[YX] – E[Y]E[X]

57
Q

Why might we use the correlation coefficient to compare across distributions instead of the covariance?

A

The covariance involves the product of the deviations of X and Y from their means, so its units are the units of X multiplied by the units of Y. The correlation between X and Y is the covariance divided by the product of their standard deviations, which makes the correlation coefficient unitless and comparable across distributions.

58
Q

What is the correlation coefficient?

A

Corr(Y,X) = Cov(Y,X) / √(Var(Y)Var(X))

59
Q

How else can we write Var(aY+bX)?

A

a^2Var(Y) + b^2Var(X) +2abCov(Y,X)
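The identity can be checked numerically on a small joint pmf; the pmf values and constants a, b are arbitrary examples:

```python
# Example joint pmf for (X, Y); a and b are arbitrary constants.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
a, b = 2.0, -1.0

def e(f):
    """Expectation of f(X, Y) under the joint pmf."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = e(lambda x, y: x), e(lambda x, y: y)
var_x = e(lambda x, y: (x - ex) ** 2)
var_y = e(lambda x, y: (y - ey) ** 2)
cov = e(lambda x, y: (x - ex) * (y - ey))

lhs = e(lambda x, y: (a * y + b * x - (a * ey + b * ex)) ** 2)  # Var(aY + bX)
rhs = a * a * var_y + b * b * var_x + 2 * a * b * cov
print(abs(lhs - rhs) < 1e-9)  # → True
```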

60
Q

If Y and X are independent, what is Cov(Y,X)? What about Correlation(Y,X)?

A

Both 0

61
Q

What is the overall difference between probability and statistics?

A

Probability: we know the population, and we are asking what is in the sample.

Statistics: we don’t know the distribution. We need to make inferences about the distribution from a sample

62
Q

What is simple random sampling?

What assumptions follow from simple random sampling?

A

Simple random sampling -> n objects are selected at random from a population, and each member of the population is equally likely to be included in the sample

Assumptions:
1. Identically distributed. Because Y1,…,Yn are randomly drawn from the same population, the marginal distribution of Yi is the same for each i = 1,…,n.

2. Independently distributed. Because the objects are selected at random, knowing the value of one draw provides no information about the others.
63
Q

What is the formula for sample mean, ¯X? What is the sample mean?

A

¯X = (1/n) ∑{i=1 to n} Xi

It is a random variable. Members of the sample are random variables so the sample mean is a random variable.

64
Q

What do analogue estimators do?

A

Estimate the population parameter using the corresponding feature of the sample. (e.g. estimate population mean using the sample mean)

65
Q

What is the expected value of the sample mean?

A

Population mean μx

66
Q

What is the variance of the sample mean?

A

(σx^2)/n

67
Q

What is the standard deviation of the sample mean?

A

σx/√n

68
Q

As n gets large, what happens to the expected value of the sample mean?

What happens to the variance of the sample mean?

A

The expected value stays at the population mean, μx.

The variance goes to 0.

69
Q

What is simple verbal explanation of the law of large numbers?

What is the formal statement of the law of large numbers?

A

When the sample size is large, the sample mean will be close to the population mean with high probability.

Formal statement: the sample mean converges in probability to the population mean, i.e. for every positive value ε > 0, lim(n→∞) P(|¯Xn − μx| ≥ ε) = 0
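A simulation sketch of the law of large numbers with Bernoulli(0.5) draws; the seed and sample sizes are arbitrary example choices:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def sample_mean(n):
    """Mean of n Bernoulli(0.5) draws (the population mean is 0.5)."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# As n grows, the sample mean concentrates around the population mean 0.5.
for n in (10, 1000, 100000):
    print(n, sample_mean(n))
```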

70
Q

How can we show that, as n gets large, the sample mean converges in probability to the population mean?

A

(most of the time) we can demonstrate mean square convergence.

This is because mean square convergence is more stringent than convergence in probability.

71
Q

What is mean square convergence?

How do we demonstrate mean square convergence?

Does the sample mean satisfy mean square convergence as n gets large?

A

lim(n→∞) E[(¯Xn − μx)^2] = 0

2 conditions for mean square convergence:
  1. the limit, as n tends to infinity, of the expectation of the sample mean is the population mean
  2. the limit, as n tends to infinity, of the variance of the sample mean must be 0

Yes. The sample mean is an unbiased estimator, so it satisfies 1. The variance of the sample mean is (σx^2)/n, so as n tends to infinity this tends to 0, satisfying 2.

72
Q

What is Chebyshev’s Inequality?

A

For any c > 0, P(|X − E[X]| ≥ c) ≤ Var(X)/c^2

73
Q

What explains why convergence in mean square implies convergence in probability?

A

Markov’s Inequality: for a non-negative random variable X and any c > 0, P(X ≥ c) ≤ E[X]/c

Applying Markov’s inequality to the non-negative random variable (¯Xn − μx)^2 with c = ε^2, and using mean square convergence, gives the weak law of large numbers.
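Markov's inequality can be checked empirically; X uniform on [0, 1] (so E[X] = 0.5) and c = 0.8 are arbitrary example choices:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible
n, c = 100000, 0.8
draws = [random.random() for _ in range(n)]  # non-negative X, uniform on [0, 1]

empirical = sum(x >= c for x in draws) / n   # ≈ P(X >= c) = 0.2
bound = 0.5 / c                              # E[X]/c = 0.625
print(empirical <= bound)  # → True: the Markov bound holds (loosely, here)
```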

74
Q

What is the difference between strong and weak law of large numbers?

What are the conditions for strong law of large numbers?

A
  • Strong -> convergence almost surely
  • Weak -> convergence in probability

(1) X1, X2,…,Xn need to be i.i.d.
(2) E[|X|] < ∞ (we require the expectation of the random variable to exist; it cannot be positive or negative infinity)

75
Q

What do we require from a consistent estimator?

A

Convergence in probability (between estimator and what is being estimated)

76
Q

What is an unbiased estimator?

A

An estimator θ̂ is biased if E[θ̂] ≠ θ

An unbiased estimator is one that is not biased. So an unbiased estimator satisfies E[θ̂] = θ

77
Q

What makes one estimator more efficient than another?

Given what assumptions?

A

θ̂1 is relatively efficient to θ̂2 if Var(θ̂1) < Var(θ̂2)

Given that this holds for all values of θ and that both θ̂1 and θ̂2 are unbiased (we only compare variances given that the estimators are unbiased)

78
Q

What is BLUE?

A

Best Linear Unbiased Estimator -> ¯Yn is the most efficient estimator among all estimators that are unbiased and are linear functions of Y1,…,Yn

79
Q

What are the assumptions required for the central limit theorem?

A
  1. X1,…,Xn are i.i.d.
  2. E[Xi] = μx where |μx| < ∞
  3. Var(Xi) = σx^2 where 0 < σx^2 < ∞
    (2 and 3 say that both the mean and the variance are finite (they exist))
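A simulation sketch of the CLT: standardised means of Bernoulli(0.5) draws should look approximately N(0, 1). The seed, n, and replication count are arbitrary example choices:

```python
import random, math

random.seed(0)  # fixed seed so the illustration is reproducible
n, reps = 1000, 2000
mu, sigma = 0.5, 0.5  # mean and sd of a single Bernoulli(0.5) draw

zs = []
for _ in range(reps):
    xbar = sum(random.random() < 0.5 for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))  # standardised sample mean

# The standardised means should have mean ≈ 0 and variance ≈ 1.
z_mean = sum(zs) / reps
z_var = sum(z * z for z in zs) / reps - z_mean ** 2
print(round(z_mean, 1), round(z_var, 1))
```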