Probability and Statistics Basics Flashcards

1
Q

Prob: What are the two equivalent definitions of events A and B being independent?

A

P(A,B) = P(A)P(B)

OR

P(A) = P(A | B), i.e. conditioning on B does not change the probability of A (assuming P(B) > 0)

2
Q

Prob: What are the two equivalent definitions of random variables Y1 and Y2 being independent?

A

F(y1,y2) = F1(y1)F2(y2) (The joint dist factors to the marginal dists)

OR

F1(y1) = F(y1 | Y2 = y2) for all values of y2 (The marginal distribution of either variable is the same as the conditional distribution given any value of the other variable)

3
Q

Prob: Conceptually, what does it mean for A and B to be independent, either as variables or as events?

A

A and B are independent variables if the value of one variable gives you no information about the value of the other.

A and B are independent events if knowing whether one event happened or not gives you no information on whether the other happened.

4
Q

Prob: What is Bayes’ Theorem?

A

P(A|B) = P(B|A)P(A) / P(B)
5
Q

Prob: What is a formula for P(A union B)?

A

P(A) + P(B) - P(A and B)

6
Q

Prob: What is linearity of expectation?

A

E[cX + kY] = cE[X] + kE[Y], even if X and Y are dependent
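A quick sketch of this on a small made-up joint pmf, where X and Y are clearly dependent (all numbers here are hypothetical):

```python
# Checking linearity of expectation on a joint pmf where X and Y
# are dependent (P(0,0) = 0.4 != P(X=0)P(Y=0) = 0.25).
joint = {  # (x, y): P(X=x, Y=y)
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def E(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

lhs = E(lambda x, y: 2 * x + 3 * y)                   # E[2X + 3Y]
rhs = 2 * E(lambda x, y: x) + 3 * E(lambda x, y: y)   # 2E[X] + 3E[Y]
print(lhs, rhs)  # both 2.5, despite the dependence
```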

7
Q

Prob: What is one potentially convenient way to find P(A and B) when A and B are dependent?

A

P(A)*P(B|A), or P(B)*P(A|B)

8
Q

Stat: What proportion of points drawn from a normal distribution will fall within 1 standard deviation of the mean? 2? 3?

A

68% within 1, 95% within 2, 99.7% within 3
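These proportions can be checked directly from the standard normal CDF; `statistics.NormalDist` (Python 3.8+ stdlib) makes it a one-liner per k:

```python
# Checking the 68-95-99.7 rule against the standard normal CDF.
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)
for k in (1, 2, 3):
    p = Z.cdf(k) - Z.cdf(-k)  # P(-k <= Z <= k)
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827, within 2 sd: 0.9545, within 3 sd: 0.9973
```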

9
Q

Prob: What is the law of total probability?

A

If you can partition the sample space S into disjoint parts B1,…,Bn, then

P(A) = P(A|B1)P(B1) + … + P(A|Bn)P(Bn)

A common form is

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

10
Q

Prob: What trick is often used in the denominator of a Bayes’ Rule problem?

A

Law of total probability
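A classic sketch of this in code — Bayes' rule for a diagnostic test, with the law of total probability supplying the denominator (all numbers hypothetical):

```python
# Bayes' rule with a total-probability denominator: made-up
# disease-test numbers.
p_d = 0.01              # P(D): prevalence
p_pos_d = 0.95          # P(+ | D): sensitivity
p_pos_nod = 0.05        # P(+ | not D): false-positive rate

p_pos = p_pos_d * p_d + p_pos_nod * (1 - p_d)   # law of total probability
p_d_pos = p_pos_d * p_d / p_pos                 # Bayes' rule
print(round(p_d_pos, 3))  # 0.161 — a positive test is far from certain
```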

11
Q

Prob: What is a probability density function, or pdf f(), typically used for?

A

For a given probability distribution, you can integrate f() over an interval (or area, or higher-dimensional region) to find the probability that the random variable's value falls in that interval/region.

12
Q

Prob: what is a cumulative distribution function F(), or cdf, typically used for? How is it related to the pdf f()?

A

For a given probability distribution of RV X, F(x) = P(X <= x)

If you integrate f() from -inf to a, you get F(a)

13
Q

Prob: What is the formula for the expected value of discrete RV X?

A

E[X] = Σ x·p(x), summing over all possible values x, where p() is the pmf of X
14
Q

Prob: What is the formula for E[g(X)], or the expected value of a function g of continuous RV X, with pdf f()?

A

E[g(X)] = ∫ g(x)·f(x) dx, integrating from -inf to inf
15
Q

Prob: V[aX+b]?

A

a^2·V[X]

16
Q

Prob: Conceptually, what does it mean for a probability distribution Y to be memoryless?

A

For an experiment, past behavior has no bearing on future behavior. For example, suppose you're waiting for a bus whose arrival time follows a memoryless distribution (such as an exponential one). If you wait 5 minutes and there's still no bus, the distribution of the remaining wait time, starting now, is the same as it was when you began waiting.
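A small simulation of this, assuming an exponential wait with a made-up mean of 10 minutes: among waits that have already exceeded 5 minutes, the extra wait behaves like a fresh draw.

```python
# Memorylessness of the exponential: P(T > 12 | T > 5) should match
# P(T > 7), i.e. the 5 minutes already waited don't matter.
import random

random.seed(0)
rate = 1 / 10                      # mean wait of 10 minutes
waits = [random.expovariate(rate) for _ in range(100_000)]

p_over_7 = sum(w > 7 for w in waits) / len(waits)           # P(T > 7)
over_5 = [w for w in waits if w > 5]
p_extra_7 = sum(w > 5 + 7 for w in over_5) / len(over_5)    # P(T > 12 | T > 5)
print(p_over_7, p_extra_7)  # both near exp(-7/10) ≈ 0.497
```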

17
Q

Prob: what is the expected value of a geometric random variable (i.e. flip coin until a success) with probability p of success?

A

1/p
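A quick simulation check with a made-up p = 0.25, so the mean number of flips should come out near 1/p = 4:

```python
# Simulating the geometric expectation: flip a coin with success
# probability p until the first success, then average the counts.
import random

random.seed(1)
p = 0.25

def flips_until_success():
    n = 1
    while random.random() >= p:  # each failure costs one more flip
        n += 1
    return n

avg = sum(flips_until_success() for _ in range(100_000)) / 100_000
print(avg)  # close to 1/p = 4
```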

18
Q

Prob: If our binomial distribution (flip n times, see how many are successes) has events with probability p of success, and we conduct n events, what is the probability of exactly y successes (assuming 0 <= y <= n)? And what is the intuition behind this result?

A

p^y (1-p)^(n-y) is the probability of one specific outcome with y successes (so y specific positions being successes, and the other n-y being failures). But we need the probability of any such outcome; these outcomes are disjoint, so we sum their probabilities by multiplying by the number of them, which is (n choose y). So P(Y=y) = (n choose y)·p^y·(1-p)^(n-y).
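The formula, written directly in code (`math.comb` counts the n-choose-y disjoint arrangements); the n and p here are arbitrary:

```python
# The binomial pmf, built from the card's reasoning.
from math import comb

def binom_pmf(y, n, p):
    # (n choose y) disjoint arrangements, each with prob p^y (1-p)^(n-y)
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.3
probs = [binom_pmf(y, n, p) for y in range(n + 1)]
print(round(sum(probs), 10))         # the pmf sums to 1
print(round(binom_pmf(3, n, p), 4))  # P(Y = 3) ≈ 0.2668
```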

19
Q

Prob: In words, what is the law of large numbers?

A

When sampling from a distribution, as the number of samples grows, the sample mean will tend toward the expected value of the distribution.
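A quick die-roll simulation of this: the running sample mean drifts toward the expected value 3.5 as the number of rolls grows.

```python
# Law of large numbers with die rolls (E[roll] = 3.5).
import random

random.seed(2)
rolls = [random.randint(1, 6) for _ in range(200_000)]
for n in (10, 1_000, 200_000):
    print(n, sum(rolls[:n]) / n)  # the mean settles near 3.5
```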

20
Q

Normal: If Y follows N(µ, σ^2), what is the formula for the z-score Z of Y=y?

A

Z = (y - µ)/σ

21
Q

Normal: If Y follows N(µ, σ^2), what (in words) is the z-score of Y=y?

A

The number of standard deviations σ that y is above or below the mean µ.

22
Q

Stat: What does the standard normal distribution Z follow?

A

Z follows N(0,1)

23
Q

Prob: What is the law of total expectation?

A

We can find E[X] by taking the weighted sum of the conditional expectations of X given all values of a variable Y.

For example, if Y takes values y1 or y2, then

E[X] = E[X|Y=y1]P(Y=y1) + E[X|Y=y2]P(Y=y2)
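A tiny worked version of this, with made-up numbers — Y chooses one of two coins, X is the flip outcome:

```python
# Law of total expectation: E[X] as a weighted sum of E[X | Y].
p_y = {"fair": 0.5, "biased": 0.5}          # P(Y = y)
e_x_given_y = {"fair": 0.5, "biased": 0.9}  # E[X | Y = y]

e_x = sum(p_y[y] * e_x_given_y[y] for y in p_y)
print(e_x)  # 0.5*0.5 + 0.5*0.9 = 0.7
```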

24
Q

Prob: What is the formula for the conditional expectation of X given that Y=y?

A

E[X | Y=y] = Σ x·P(X=x | Y=y) over all x (discrete case), or ∫ x·f(x|y) dx from -inf to inf (continuous case)
25
Q
A
26
Q

Stat: What is our estimator Ø_h (theta-hat) a function of?

A

The data X1, X2, X3…! (This is important!)

27
Q

Stat: Given what our estimator Ø_h is a function of, what 2 important properties does it have?

A

It is a random variable

Which means it has its own probability distribution, with E[Ø_h], V[Ø_h], etc

28
Q

Stat: What, in these flashcards, is my notation for Theta and Theta-hat?

A

Ø and Ø_h respectively.

29
Q

Stat: At a high level (so, colloquially explaining the formula), how do you find the sample variance s^2 given data X1,…,Xn?

A

Take each data point's squared deviation from the sample mean, sum them, and divide by n − 1 (not n, which corrects the bias).
30
Q

Stat: What does it mean for an estimator Ø_h to be accurate?

A

It has a mean close to the true value Ø; in other words, its bias is low.

31
Q

Stat: What does it mean for an estimator Ø_h to be precise?

A

It tends to produce similar answers each time; in other words, its variance is low.

32
Q

Stat: What is the formula for MSE(Ø_h), or Mean Squared Error?

A

MSE(Ø_h) = E[(Ø_h - Ø)^2]

= V[Ø_h] + Bias(Ø_h)^2

but you generally only need the first form.
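This decomposition can be checked numerically, using a deliberately biased (shrunk) sample-mean estimator of a known mean — all numbers here are made up:

```python
# Numerically verifying MSE = variance + bias^2.
import random
from statistics import mean

random.seed(5)
theta = 10.0
# A deliberately biased estimator: 0.9 times the sample mean of 20 draws.
ests = [0.9 * mean(random.gauss(theta, 2) for _ in range(20))
        for _ in range(20_000)]
m = mean(ests)
bias = m - theta                          # ≈ -1 by construction
var = mean((e - m) ** 2 for e in ests)
mse = mean((e - theta) ** 2 for e in ests)
print(round(mse, 3), round(var + bias ** 2, 3))  # the two sides match
```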

33
Q

Stat: What is the formula for the bias of Ø_h? What does it mean for Ø_h to be unbiased?

A

Bias(Ø_h) = E[Ø_h - Ø] = E[Ø_h] - Ø

Ø_h is unbiased iff Bias(Ø_h) = 0, or if the expected value of Ø_h is the correct value Ø.

34
Q

Stat: What is the standard error of Ø_h? And what in general does this quantity represent?

A

SE(Ø_h) = sqrt(V[Ø_h])

It gives an idea of the typical error of the estimator, or the typical distance it will be from its mean.

35
Q
A
36
Q

Stat: What is probably the most common measure of the quality of estimator Ø_h?

A

Mean Squared Error, or MSE(Ø_h)

37
Q

Stat: Conceptually, what does the likelihood of a distribution's parameters, given a dataset, describe?
A

It describes how likely a distribution with those parameters was to produce that dataset.

(I think it is often talked about in the context of a specific family of distributions. So we might ask: what is the likelihood of a normal distribution with these parameters, given this dataset?)

38
Q
A
39
Q

Stat: What does i.i.d. stand for?

A

Independent and Identically Distributed

40
Q

Stat: What is a MVUE?

A

It’s a Minimum-Variance Unbiased Estimator. So for some parameter Ø, it’s the unbiased estimator Ø_h with the lowest variance out of all the unbiased estimators.

41
Q

Stat: When we find a Maximum Likelihood Estimator, Min-Var Unbiased Estimator, Method of Moments Estimator, or something similar, do we typically find it in the context of some assumed distribution family (i.e. assume the distribution is normal, exponential, etc), or estimate parameters without a suspected distribution?

A

While sometimes we estimate parameters without a suspected distribution, such as distribution mean and variance, we more often use an assumed distribution family.

(This is mostly my opinion, and also me wanting to remember that when we for example “find the MLE”, it generally has quite a bit of structure due to an assumed distribution that we can differentiate/optimize.)

42
Q

Stat: What is a Maximum Likelihood Estimator?

A

It is the estimator Ø_h of Ø that maximizes the likelihood of your data.

So, generally for some assumed distribution family such as Exponential Distributions, you try to find an estimator lambda_hat for parameter lambda that leads to the exponential distribution that was most likely to produce this data.

43
Q

Stat: At a high level, how do you find the MLE estimate for the parameters?

A

Differentiate the likelihood (or, more often, the log-likelihood) w.r.t. the parameters, set that equal to zero, then solve.
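A sketch with made-up exponential data: the calculus gives the closed form lambda_hat = n / Σx, which we can sanity-check against a brute-force grid search over the log-likelihood.

```python
# MLE for the exponential rate: closed form vs grid search.
from math import log

data = [1.2, 0.7, 2.5, 0.4, 1.9]  # made-up i.i.d. data

def loglik(lam):
    # log-likelihood of Exp(lam): sum of log(lam) - lam * x
    return sum(log(lam) - lam * x for x in data)

closed_form = len(data) / sum(data)   # the solve-for-zero answer
grid_best = max((i / 1000 for i in range(1, 5000)), key=loglik)
print(round(closed_form, 3), round(grid_best, 3))  # both near 0.746
```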

44
Q

Stat: Given observations X1,…,Xn, what is the maximum likelihood estimator for a population proportion: for example, the proportion of red balls if we’re drawing from red, green or blue?

A

(number of reds observed)/n

45
Q

Stat: Define a 95% confidence interval [L,U] for parameter Ø.

A

For L and U, which are random variables based on your observations Xi, P(L <= Ø <= U) = 95%.

Meaning, when you sample your Xi’s and calculate L and U, the probability that they end up with L <= Ø <= U is 95%.

46
Q

Stat: What is the correct way to interpret 95% confidence interval [L,U] for parameter Ø?

What is a common incorrect way of interpreting it, and why is this incorrect?

A

Correct: “I am 95% confident that my calculated confidence interval [L,U] contains Ø.”

Incorrect: “There is a 95% chance that Ø is in the interval [L,U].”

The latter is incorrect because the true population parameter Ø is not a random variable. It is a set value that just exists in the world, and it either is in the interval or it isn’t; there is no chance involved.

47
Q

Stat: What is the way of interpreting a 95% confidence interval that involves considering if you computed many 95% confidence intervals?

A

If I compute a high number of 95% confidence intervals, over time, about 95% of them will contain their respective parameters.
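A simulation sketch of exactly this (made-up population: N(50, 10^2), samples of 40): build many 95% z-intervals and count how often they cover the true mean.

```python
# Coverage of many 95% confidence intervals for a known mean.
import random
from statistics import NormalDist, mean, stdev

random.seed(3)
z = NormalDist().inv_cdf(0.975)  # ≈ 1.96
mu, sigma, n = 50, 10, 40
covered = 0
trials = 2_000
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m, se = mean(xs), stdev(xs) / n**0.5
    if m - z * se <= mu <= m + z * se:
        covered += 1
print(covered / trials)  # close to 0.95
```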

48
Q

Stat: Given observations Xi and an unknown parameter Ø, what is a pivot?

A

A pivot is an expression that:

  • Is a function of the observable R.V.’s (i.e. the observations Xi)
  • And of the unknown parameter Ø,
  • But no other unknowns,
  • And whose distribution does not depend on the unknown Ø.

This is an important one!

49
Q

Stat: What is the Central Limit Theorem?

A

If X1,…,Xn are i.i.d. with mean µ and variance σ^2, then as n grows, the distribution of the sample mean approaches N(µ, σ^2/n), regardless of the underlying distribution of the Xi’s.
50
Q

Stat: Why is the Central Limit Theorem so important and useful?

A

Given a large enough sample size, we can find an approximate distribution for the sample mean of the Xi’s, but we don’t need to know anything about the underlying distribution of the Xi’s! It doesn’t need to be from a specific family, and it can be a wild-looking distribution, but we can still find an approximate distribution of the sample mean.

Using this, we can also find a confidence interval for the sample mean, which is great.
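A sketch of this with a very skewed base distribution — Exp(1), where mu = sigma = 1: the sample means still cluster like N(mu, sigma^2/n).

```python
# CLT sketch: means of samples from a skewed distribution.
import random
from statistics import mean, stdev

random.seed(4)
n = 50                      # per-sample size; mu = sigma = 1 for Exp(1)
means = [mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(5_000)]
print(mean(means))          # near mu = 1
print(stdev(means))         # near sigma / sqrt(n) ≈ 0.141
```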

51
Q

Stat: This varies from table to table, as some have different definitions. But in your stats class, what was the definition of z_a?

A

If Z is the standard normal N(0,1), z_a is defined such that

P(Z > z_a) = a

Graphically, or verbally: the probability of a draw from Z falling above z_a is a.

52
Q

Stat: Given our definition of z_a (and with similarly defined quantities like t_(a,n-1) and chi-squared_(a,n-1)), what is the probability expression used in almost all confidence intervals we make?

A

The following, which can be similarly written for the t dist, chi-squared dist, etc. But it’s especially common to use the normal version, due to the CLT and all the great info we have about normals:

P(-z_(a/2) <= Z <= z_(a/2)) = 1 - a

where Z is (or is approximately) a standard normal pivot.

53
Q

Stat: In hypothesis testing, what is a null hypothesis?

A

Null Hypothesis H0 is the “status quo” or “safe” hypothesis. It is the baseline, and we look for significant evidence that it is not true. For example, when testing whether two groups have different performance on a task, the null hypothesis is that their performance is the same.

54
Q

Stat: In hypothesis testing, what is an alternative hypothesis?

A

The alternative hypothesis is an idea that breaks from the “status quo” or “baseline assumption”, for which we are looking to see if there is significant evidence. For example, when testing whether two groups have different performance on a task, the alternative hypothesis could be that group A performs better than group B.

55
Q

Stat: In hypothesis testing, what is a test statistic?

A

The test statistic in a hypothesis test is a function of your observable data which you will use to quantitatively examine your null and alternative hypotheses. For example, when testing whether two groups have different performance on a task, the test statistic might be the difference in mean performances of the 2 groups.

56
Q

Stat: What are the 2 possible conclusions an experimenter can make from a hypothesis test?

A

“Reject the null hypothesis in favor of the alternative,” and “Fail to reject the null hypothesis.”

57
Q

Stat: In hypothesis testing, what is the rejection region?

A

It is the pre-decided range of (extreme) values of the test statistic in which we will “reject the null hypothesis in favor of the alternative.”

58
Q

Stat: In hypothesis testing, what is a Type 1 error?

A

It is when we reject the null hypothesis Ho even though it is true.

59
Q

Stat: In hypothesis testing, what is a Type 2 Error?

A

It is when we fail to reject the null H0, but the alternative H1 is true.

60
Q
A
61
Q

Stat: In hypothesis testing, what does a “level 0.05 test” mean?

A

Alpha = 0.05, i.e. the probability of a Type 1 error (rejecting a true null) is capped at 0.05.

62
Q

Stat: In hypothesis testing, what would a low value of alpha such as 0.001 mean? What about a high value like 0.20?

A

A low value of alpha like 0.001 means that we require very compelling evidence (or very extreme values of our test statistic) in order to reject the null hypothesis.

Conversely, a high value like 0.20 means that we have very relaxed requirements for rejecting our null hypothesis.

63
Q

Stat: In hypothesis testing, what is a p-value?

A

Once you conduct your experiment and calculate the test statistic, the p-value is the probability of getting results that are as extreme or more extreme than your test statistic, under the assumption that the null hypothesis is true.
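A worked example with made-up numbers — a one-sided z-test of H0: µ = 0 with known sd 1, after observing a sample mean of 0.4 from n = 25 points:

```python
# p-value of a one-sided z-test, computed from the standard normal CDF.
from statistics import NormalDist

n, xbar, sd0 = 25, 0.4, 1.0
z_stat = xbar / (sd0 / n**0.5)            # test statistic, = 2.0
p_value = 1 - NormalDist().cdf(z_stat)    # P(Z >= 2.0) under H0
print(round(p_value, 4))  # 0.0228 — reject at alpha = 0.05
```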

64
Q

Stat: In hypothesis testing, what value do we use to determine whether or not we reject the null?

A

The p-value. (We need to calculate the p-value from the test statistic under the assumption of the null, in order to see how unlikely our result is under the null.)

65
Q

Stat: In hypothesis testing, what do we conclude if the p-value is larger than alpha?

A

If, say, p-val = 0.10 and alpha = 0.05, then our results are not as extreme as our alpha requires, and so we fail to reject the null hypothesis.

66
Q

Stat: In hypothesis testing, what do we conclude if the p-value is smaller than alpha?

A

If, say, p-val = 0.01 and alpha = 0.05, then our results are more extreme than our alpha requires, and so we reject the null hypothesis in favor of the alternative hypothesis.

67
Q

Stat: In hypothesis testing, what is a one-sided hypothesis?

A

We reject only if the test statistic is extreme in one of the two directions. For example, if the null is µ = 0, the alternative is µ > 0.

68
Q

Stat: In hypothesis testing, what is a two-sided hypothesis?

A

We reject if the test statistic is extreme in either direction. For example, if the null is µ = 0, the alternative is µ =/= 0, and we reject if the test statistic is extremely high or extremely low.

69
Q

Stat: What is the key feature of classical, or frequentist, statistics? And what are some types of analytical tools used in this statistical philosophy?

A

In classical/frequentist statistics, the parameter Ø is constant. We examine it using estimators Ø_h, we quantify our uncertainty of its value using confidence intervals, and we test theories using hypothesis tests and p-values.

70
Q

Stat: What is the key feature of Bayesian statistics? And what are some types of analytical tools used in this statistical philosophy?

A

The parameter Ø is viewed as a random variable, and we quantify our beliefs about its potential values using a probability distribution π.

71
Q

Stat: In Bayesian statistics, how do we update π, our prior distribution of Ø, using data Xi?

A

You incorporate the data using an update that looks very similar to Bayes’ law.

Specifically, the posterior is proportional to the likelihood times the prior:

π(Ø | X) ∝ L(X | Ø) · π(Ø)
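The classic conjugate example of this update — a Beta prior on a coin's heads probability, updated on made-up binomial data, where the posterior has a closed form:

```python
# Beta-Binomial conjugate update: prior Beta(a, b), observe
# heads/tails, posterior is Beta(a + heads, b + tails).
a, b = 2, 2            # prior Beta(2, 2) on the coin's p
heads, tails = 7, 3    # hypothetical data: 7 heads in 10 flips
a_post, b_post = a + heads, b + tails   # posterior Beta(9, 5)
print(a_post / (a_post + b_post))       # posterior mean = 9/14 ≈ 0.643
```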

72
Q

Stat: In Bayesian statistics, what happens to the prior distribution as we get more and more data?

A

With enough data, the impact of the prior distribution on the posterior distribution tends towards 0.