Probability and Statistics Basics Flashcards

1
Q

Prob: What are the two equivalent definitions of events A and B being independent?

A

P(A,B) = P(A)P(B)

OR

P(A) = P(A | B), assuming P(B) > 0 (conditioning on B doesn't change the probability of A)

(Pretty darn sure second is correct)

2
Q

Prob: What are the two equivalent definitions of random variables Y1 and Y2 being independent?

A

F(y1,y2) = F1(y1)F2(y2) (The joint dist factors to the marginal dists)

OR

F1(y1) = F(y1 | Y2 = y2) for all values of y2 (The marginal distribution for either variable is the same as the conditional distribution given any value of the other variable)

(Pretty darn sure second is correct)

3
Q

Prob: Conceptually, what does it mean for A and B to be independent, either as variables or as events?

A

A and B are independent variables if the value of one variable gives you no information about the value of the other.

A and B are independent events if knowing whether one event happened or not gives you no information on whether the other happened.

4
Q

Prob: What are the 2 equivalent definitions for variables X and Y to be uncorrelated?

A

Their linear correlation coefficient is 0.

OR

E[XY] = E[X]E[Y]. (This actually means their covariance is 0, but their covariance is 0 iff they’re uncorrelated)

5
Q

Prob: Does two variables being independent imply they are uncorrelated?

A

Yes

6
Q

Prob: Does two variables being uncorrelated imply they are independent?

A

No

7
Q

Prob: What is an example of a distribution of 2 variables such that they are uncorrelated, but not independent? Why is it true in this case?

A

X ~ U(-1,1) and Y = X^2

Here, E(XY) = 0 = E(X)E(Y), because the distribution of XY = X^3 is symmetric around 0 (and E(X) = 0)

But the value of X gives you information about Y; in fact, it determines Y exactly.
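
(A quick numpy check of this example, just for illustration: the sample correlation is near 0, yet Y is a deterministic function of X.)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x**2

# Sample correlation is approximately 0 (uncorrelated)...
print(np.corrcoef(x, y)[0, 1])

# ...but Y is a deterministic function of X: knowing X tells you Y exactly.
print(np.allclose(y, x**2))  # True
```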

8
Q

Prob: What is Bayes’ Theorem?

A

P(A|B) = P(B|A)P(A) / P(B)
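
(A worked sketch of Bayes' Theorem in Python; the test-accuracy numbers below are made up for illustration.)

```python
# Hypothetical numbers: a test with 99% sensitivity, a 5% false-positive
# rate, for a condition with 1% prevalence.
p_pos_given_sick = 0.99   # P(B|A)
p_sick = 0.01             # P(A)
p_pos_given_well = 0.05   # P(B|A^c)

# Denominator via the law of total probability (see card 20).
p_pos = p_pos_given_sick * p_sick + p_pos_given_well * (1 - p_sick)

# Bayes' Theorem: P(A|B) = P(B|A)P(A) / P(B)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(p_sick_given_pos)  # ~0.167
```
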
9
Q

Prob: What is a useful form of E[X^2]?

A

E[X^2] = V[X] + (E[X])^2

10
Q

Prob: What are DeMorgan’s Laws?

A

(A union B)^c = A^c intersect B^c

(A intersect B)^c = A^c union B^c

(The complement of a union is the intersection of the complements, and vice versa.)

11
Q

Prob: What is an experiment?

A

An activity with an observable outcome.

Ex. Rolling a die, or rolling 2 dice, or flipping a coin…

12
Q

Prob: What is an outcome?

A

A unique result of an experiment.

For example, rolling a 6, where the experiment was rolling a die.

13
Q

Prob: What is a sample space?

A

All of the possible outcomes of an experiment.

For example, {1,2,3,4,5,6}, when the experiment is rolling a die.

14
Q

Prob: What is an event?

A

A collection of outcomes forming a subset of the sample space.

For example, rolling an even number, if the experiment is rolling a die.

15
Q

Prob: What is a formula for P(A union B)?

A

P(A) + P(B) - P(A and B)

16
Q

Prob: What is linearity of expectation?

A

E[cX + kY] = cE[X] + kE[Y], even if X and Y are dependent

17
Q

Prob: What is one potentially convenient way to find P(A and B) when A and B are dependent?

A

P(A)*P(B|A), or P(B)*P(A|B)

18
Q

Stat: What proportion of points drawn from a normal distribution will fall within 1 standard deviation? 2? 3?

A

68% within 1, 95% within 2, 99.7% within 3
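
(A quick check of these proportions with scipy, just for illustration:)

```python
from scipy.stats import norm

# P(-k < Z < k) for k = 1, 2, 3 standard deviations of the standard normal
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
# 1 ~0.6827, 2 ~0.9545, 3 ~0.9973
```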

19
Q

Prob: What is the law of total probability?

A

If you can partition the sample space S into n disjoint parts B1,…,Bn, then

P(A) = P(A|B1)P(B1) + … + P(A|Bn)P(Bn)

A common form is

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

20
Q

Prob: What trick is often used in the denominator of a Bayes’ Rule problem?

A

Law of total probability

21
Q

Prob: What is a probability density function, or pdf f(), typically used for?

A

For a given probability distribution, you can integrate f() over an interval (or area, or n-dimensional region) to find the probability that the outcome of an experiment will fall in that interval/region.

22
Q

Prob: what is a cumulative density function F(), or cdf, typically used for? How is it related to the pdf f()?

A

For a given probability distribution of RV X, F(x) = P(X <= x)

If you integrate f() from -inf to a, you get F(a)

23
Q

Prob: What is the formula for the expected value of discrete RV X?

A

E[X] = sum over all values x of x * p(x), where p(x) = P(X = x) is the pmf of X

24
Q

Prob: What is the formula for E[g(X)], or the expected value of a function g of continuous RV X, with pdf f()?

A

E[g(X)] = integral from -inf to +inf of g(x) f(x) dx

25
Q

Prob: V[aX+b]?

A

a^2 V[X]

26
Q

Prob: Technically, what does it mean for a distribution Y to be memoryless?

A

P(Y > a+b|Y > b) = P(Y > a)

27
Q

Prob: Conceptually, what does it mean for a probability distribution Y to be memoryless?

A

For an experiment, past behavior has no bearing on future behavior. For example, suppose you're waiting for a bus whose arrival time follows a memoryless distribution (such as an exponential one). If you wait 5 minutes and there's still no bus, the probability distribution of when it will arrive, starting now, is the same as it was when the experiment began.

28
Q

Prob: What is the phenomenon being observed in a geometric probability distribution?

A

We have an event such as a coin toss with probability p of succeeding, and we keep performing attempts until we succeed.

29
Q

Prob: If we have probability p of succeeding, what is the probability that geometric random variable Y=y?

A

(1-p)^(y-1) * p

30
Q

Prob: what is the expected value of a geometric random variable with probability p of success?

A

1/p
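
(A simulation sketch of the geometric setup, with a made-up p = 0.3, checking both the pmf from the previous card and the 1/p mean:)

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3

# Simulate the geometric experiment directly: count trials until success.
def trials_until_success():
    count = 1
    while rng.random() >= p:  # failure with probability 1-p
        count += 1
    return count

samples = np.array([trials_until_success() for _ in range(100_000)])
print(samples.mean())            # ~1/p = 3.33
print((1 - p)**(5 - 1) * p)      # P(Y = 5) from the pmf, ~0.072
print(np.mean(samples == 5))     # simulated P(Y = 5), ~0.072
```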

31
Q

Prob: What phenomenon is observed by a binomial probability distribution?

A

We have an event, such as flipping a coin, with probability p of success, and we look to see how many of our n trials will be successes.

32
Q

Prob: If our binomial distribution has events with probability p of success, and we conduct n trials, what is the probability of exactly y successes (assuming 0 <= y <= n)? And what is the intuition behind this result?

A

(n choose y) * p^y * (1-p)^(n-y). The term p^y (1-p)^(n-y) is the probability of one specific outcome with y successes (y specific positions being successes, and the other n-y being failures). But we need the probability of any such outcome; these outcomes are disjoint, so we sum their probabilities, which amounts to multiplying by the number of such outcomes, n choose y.

33
Q

Prob: What is the expected value of a binomial distribution with n trials and probability of success p?

A

np
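
(A quick check, with made-up n, p, and y, that the pmf formula from card 32 matches scipy and that the mean is n*p:)

```python
from math import comb
from scipy.stats import binom

n, p, y = 10, 0.3, 4  # made-up values

# The pmf formula from the previous card...
by_hand = comb(n, y) * p**y * (1 - p)**(n - y)

# ...matches scipy's binomial pmf, and the expected value is n*p.
print(by_hand, binom.pmf(y, n, p))  # both ~0.2001
print(binom.mean(n, p))             # 3.0 = n*p
```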

34
Q

Prob: What is the expected value of Uniform(a,b)

A

(a+b)/2

35
Q

Prob: What is the pdf f(x) of Uniform(a,b)

A

f(x) = 1/(b-a) for a <= x <= b (and 0 elsewhere)

36
Q

Prob: In words, what is the law of large numbers?

A

When sampling from a distribution, as the number of samples grows, the sampling mean will tend towards the expected value of the distribution.
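
(A simulation sketch, using an exponential distribution with expected value 0.5 as a made-up example:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential with scale 0.5 has expected value 0.5; the sample mean closes in.
for n in (10, 1_000, 100_000):
    print(n, rng.exponential(scale=0.5, size=n).mean())
```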

37
Q

Stat: In normal distribution notation N(a,b), is b sigma, or sigma^2?

A

sigma^2

38
Q

Normal: If Y follows N(µ,σ^2), what is the formula for the z-score Z of Y=y?

A

Z = (y - µ)/σ

39
Q

Normal: If Y follows N(µ,σ^2), what (in words) is the z-score of Y=y?

A

The number of standard deviations σ that y is above or below the mean µ.

40
Q

Stat: What does the standard normal distribution Z follow?

A

Z follows N(0,1)

41
Q

Stat: In the context of normal distributions, what is the function Φ, what is its input from an arbitrary normal distribution N, and what does it tell us?

A

It is the CDF of the standard normal distribution Z.

Its input is the z-score of your result.

It tells us the probability of getting a result with a z-score as low or lower than your result.
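
(A small sketch tying the last few cards together, with made-up µ, σ, and y: compute a z-score, then feed it to the standard normal CDF, which scipy exposes as norm.cdf:)

```python
from scipy.stats import norm

mu, sigma = 100, 15   # made-up normal distribution N(100, 15^2)
y = 130

z = (y - mu) / sigma  # z-score: standard deviations above the mean
print(z)              # 2.0
print(norm.cdf(z))    # Phi(z) = P(Z <= 2), ~0.977
```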

42
Q

Prob: What is the formula for Cov(X,Y)?

A

Cov(X,Y) = E[XY] - E[X]E[Y]
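
(A numpy check of this formula on made-up correlated data:)

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 2 * x + rng.normal(size=100_000)  # built to covary with x (Cov ~ 2)

# Cov(X,Y) = E[XY] - E[X]E[Y], compared against numpy's estimate
by_hand = np.mean(x * y) - np.mean(x) * np.mean(y)
print(by_hand, np.cov(x, y)[0, 1])  # both ~2
```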

43
Q

Prob: What is the law of total expectation?

A

We can find E[X] by taking the weighted sum of the conditional expectations of X given all values of a variable Y.

For example, if Y takes only the two values y1 and y2, then

E[X] = E[X|Y=y1]P(Y=y1) + E[X|Y=y2]P(Y=y2)

44
Q

Prob: What is the formula for the conditional expectation of X given that Y=y?

A

Discrete case: E[X|Y=y] = sum over all x of x * P(X=x | Y=y).

Continuous case: E[X|Y=y] = integral of x * f(x|y) dx, where f(x|y) is the conditional density.

45
Q

Prob: How can the law of total expectation be written concisely when trying to calculate E[X] based on an additional variable Y?

A

E[X] = E[E[X|Y]]
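
(A simulation sketch of the law of total expectation, with a made-up two-valued Y:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Y is a biased coin flip; X's distribution depends on Y.
y = rng.random(n) < 0.3                           # P(Y=1) = 0.3
x = np.where(y, rng.normal(5, 1, n), rng.normal(1, 1, n))

# E[X] directly vs. the weighted sum E[X|Y=1]P(Y=1) + E[X|Y=0]P(Y=0)
print(x.mean())
print(x[y].mean() * y.mean() + x[~y].mean() * (1 - y.mean()))
# both ~ 0.3*5 + 0.7*1 = 2.2
```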

46
Q

Stat: What is the covariance matrix of random vector [A, B, C]?

A

(In brackets):

V(A)      Cov(A,B)  Cov(A,C)
Cov(B,A)  V(B)      Cov(B,C)
Cov(C,A)  Cov(C,B)  V(C)
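
(For illustration, numpy's np.cov computes exactly this matrix when each row is one variable:)

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=10_000)
b = a + rng.normal(size=10_000)  # correlated with a by construction
c = rng.normal(size=10_000)

# With one variable per row, np.cov returns the 3x3 matrix above:
# variances on the diagonal, covariances off-diagonal.
print(np.cov(np.vstack([a, b, c])))
```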

47
Q
A
48
Q

Stat: What is our estimator Ø_h (theta-hat) a function of?

A

The data X1, X2, X3…! (This is important!)

49
Q

Stat: Given what our estimator Ø_h is a function of, what 2 important properties does it have?

A

It is a random variable

Which means it has its own probability distribution, with E[Ø_h], V[Ø_h], etc

50
Q

Stat: What, in these flashcards, is my notation for Theta and Theta-hat?

A

Ø and Ø_h respectively.

51
Q

Stat: How do you find the sample variance s2 given data X1,…,Xn?

A

s^2 = [1/(n-1)] * sum over i of (Xi - X_bar)^2, where X_bar is the sample mean of the Xi's.

52
Q

Stat: Why is n-1 used in the formula for sample variance and sample standard deviation in place of n?

A

To correct for bias in estimating the actual variance of the distribution: with a denominator of n, the sample variance tends to slightly underestimate its target, and using n-1 makes it an unbiased estimator of the variance. (The sample standard deviation remains slightly biased even with n-1, since unbiasedness doesn't survive the square root, but the correction reduces the bias.)
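
(A simulation sketch of the bias: numpy's ddof argument switches between the n and n-1 denominators:)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
samples = rng.normal(0, 2, (50_000, n))  # true variance is 4.0

print(np.var(samples, axis=1, ddof=0).mean())  # divides by n:   ~3.2, too low
print(np.var(samples, axis=1, ddof=1).mean())  # divides by n-1: ~4.0, unbiased
```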

53
Q

Stat: What does it mean for an estimator Ø_h to be accurate?

A

It has a mean close to the true value Ø; in other words, its bias is low.

54
Q

Stat: What does it mean for an estimator Ø_h to be precise?

A

It tends to produce similar answers each time; in other words, its variance is low.

55
Q

Stat: What is the formula for MSE(Ø_h), or Mean Squared Error?

A

MSE(Ø_h) = E[(Ø_h - Ø)^2]

= V(Ø_h) + Bias(Ø_h)^2
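
(A simulation check of the decomposition, using a deliberately biased estimator of a mean as a made-up example:)

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n = 2.0, 10

# A deliberately biased estimator of theta: the sample mean plus 0.5.
estimates = rng.normal(theta, 1, (200_000, n)).mean(axis=1) + 0.5

mse = np.mean((estimates - theta)**2)
var = estimates.var()
bias_sq = (estimates.mean() - theta)**2
print(mse, var + bias_sq)  # both ~ 0.1 + 0.25 = 0.35
```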

56
Q

Stat: What is the formula for the bias of Ø_h? What does it mean for Ø_h to be unbiased?

A

Bias(Ø_h) = E[Ø_h - Ø]

Ø_h is unbiased iff Bias(Ø_h) = 0, i.e. iff the expected value of Ø_h is the true value Ø.

57
Q

Stat: What is the standard error of Ø_h? And what in general does this quantity represent?

A

SE(Ø_h) = sqrt(V[Ø_h])

It gives an idea of the typical error of the estimator, or the typical distance it will fall from its mean.

58
Q
A
59
Q

Stat: What is probably the most common measure of the quality of estimator Ø_h?

A

Mean Squared Error, or MSE(Ø_h)

60
Q

Stat: Conceptually, what does the likelihood of a distribution's parameters, given a dataset, describe?

A

It describes how likely a distribution with those parameters was to produce that dataset.

(I think it is often talked about in the context of a specific family of distributions. So we might say: what is the likelihood of a normal distribution with these parameters, given this dataset?)

61
Q
A
62
Q

Stat: What does i.i.d. stand for?

A

Independent and Identically Distributed

63
Q

Stat: What is a MVUE?

A

It’s a Minimum-Variance Unbiased Estimator. So for some parameter Ø, it’s the unbiased estimator Ø_h with the lowest variance out of all the unbiased estimators.

64
Q

Stat: When we find a Maximum Likelihood Estimator, Min-Var Unbiased Estimator, Method of Moments Estimator, or something similar, do we typically find it in the context of some assumed distribution family (i.e. assume the distribution is normal, exponential, etc), or estimate parameters without a suspected distribution?

A

While sometimes we estimate parameters without a suspected distribution, such as distribution mean and variance, we generally more often use an assumed distribution family.

(This is mostly my opinion, and also me wanting to remember that when we for example “find the MLE”, it generally has quite a bit of structure due to an assumed distribution that we can differentiate/optimize.)

65
Q

Stat: What is a Maximum Likelihood Estimator?

A

It is the estimator Ø_h of Ø that maximizes the likelihood of your data.

So, generally for some assumed distribution family such as Exponential Distributions, you try to find an estimator lambda_hat for parameter lambda that leads to the exponential distribution that was most likely to produce this data.

66
Q

Stat: If you have an assumed probability distribution family with one parameter, such as an exponential distribution with parameter lambda, how do you find the Maximum Likelihood Estimator lambda_hat for lambda? And what 2 tricks are most commonly used in finding the MLE?

A

Write down the likelihood L(lambda) = f(X1; lambda) * … * f(Xn; lambda), the product of the densities of your i.i.d. observations, and find the lambda_hat that maximizes it (typically by differentiating and setting the derivative to 0).

The 2 most common tricks: (1) maximize the log-likelihood instead, since the log turns the product into a sum and has the same maximizer; (2) drop factors that don't depend on the parameter before optimizing.
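
(A sketch for the exponential case, my own illustration: the log-likelihood is n*log(lambda) - lambda*sum(x), so the closed-form MLE is lambda_hat = 1/sample mean, which a numerical optimizer agrees with:)

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
true_lambda = 2.0
x = rng.exponential(scale=1 / true_lambda, size=10_000)

# Negative log-likelihood of Exponential(lam): -(n*log(lam) - lam*sum(x))
def neg_log_lik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
print(result.x)      # numerical MLE, ~2.0
print(1 / x.mean())  # closed-form MLE, lambda_hat = 1/sample mean
```
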
67
Q

Stat: Given observations X1,…,Xn, what is the maximum likelihood estimator for a population proportion: for example, the proportion of red balls if we’re drawing from red, green or blue?

A

(number of reds drawn)/n

68
Q

Stat: Define a 95% confidence interval [L,U] for parameter Ø.

A

For L and U, which are random variables computed from your observations Xi, P(L <= Ø <= U) = 95%.

Meaning, when you sample your Xi's and calculate L and U, the probability that they end up with L <= Ø <= U is 95%.

69
Q

Stat: What is the correct way to interpret 95% confidence interval [L,U] for parameter Ø?

What is a common incorrect way of interpreting it, and why is this incorrect?

A

Correct: “I am 95% confident that my calculated confidence interval [L,U] contains Ø.”

Incorrect: “There is a 95% chance that Ø is in the interval [L,U].”

The latter is incorrect because the true population parameter Ø is not a random variable. It is a set value that just exists in the world, and it either is in the interval or it isn’t; there is no chance involved.

70
Q

Stat: What is the way of interpreting a 95% confidence interval that involves considering if you computed many 95% confidence intervals?

A

If I compute a high number of 95% confidence intervals, over time, about 95% of them will contain their respective parameters.

71
Q

Stat: Given observations Xi and an unknown parameter Ø, what is a pivot?

A

A pivot is an expression that is:

  • A function of the observable R.V.’s (i.e. the observations Xi)
  • And of the unknown parameter Ø,
  • But of no other unknowns,
  • And whose distribution does not depend on the unknown Ø.

This is an important one!

72
Q

Stat: Given observations Xi and an unknown parameter Ø, if you have a pivot, what can the pivot be used for?

A

It can be used to create a confidence interval for Ø.

73
Q

Stat: At a high level, when finding a maximum likelihood estimator, what is the concept of Fisher Information, and what key use does it have?

A

Fisher Information measures how much information the data carry about the parameter. Its key use: for large samples, the sampling distribution of the maximum likelihood estimate is approximately normal, with variance given by the inverse of the Fisher Information, so it can be calculated and used to quantify the uncertainty of your parameter estimate.

74
Q

Stat: What is the Central Limit Theorem?

A

If X1,…,Xn are i.i.d. with mean µ and variance σ^2, then for large n the sample mean X_bar is approximately N(µ, σ^2/n). Equivalently, (X_bar - µ)/(σ/sqrt(n)) approaches the standard normal N(0,1) as n grows.
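
(A simulation sketch: even for a very non-normal underlying distribution, the sample means behave like the normal the CLT predicts. Numbers here are made up.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50  # sample size

# The underlying distribution is exponential (mean 1, variance 1), far from normal...
means = rng.exponential(1.0, size=(100_000, n)).mean(axis=1)

# ...yet the sample means are approximately N(1, 1/n), as the CLT predicts.
print(means.mean())  # ~1.0
print(means.std())   # ~1/sqrt(50) ~ 0.141
```
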
75
Q

Stat: Why is the Central Limit Theorem so important and useful?

A

Given a large enough sample size, we can find an approximate distribution for the sample mean of the Xi’s, but we don’t need to know anything about the underlying distribution of the Xi’s! It doesn’t need to be of a specific family, and it can be an insane-looking distribution, but we can still find an approximate distribution of the sample mean.

Using this, we can also find a confidence interval for the sample mean, which is great.

76
Q

If our observations are i.i.d. from N(µ,σ^2), and we don’t know the value of σ^2, what can be used as a pivot to make a confidence interval for µ? What distribution does this pivot follow?

A

T = (X_bar - µ)/(s/sqrt(n)), where s is the sample standard deviation. This pivot follows a t distribution with n-1 degrees of freedom.
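
(A sketch of using this pivot to build a 95% confidence interval for µ, on made-up data:)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10, 3, size=25)  # pretend mu and sigma are unknown
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

# 95% CI for mu from the t pivot: xbar +/- t(0.025, n-1) * s/sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
half_width = t_crit * s / np.sqrt(n)
print(xbar - half_width, xbar + half_width)
```
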
77
Q

Stat: If we want to make a confidence interval for a parameter of a distribution that we think is normal, do we need to use the central limit theorem, or are there other methods?

A

We don’t need the CLT when we think the underlying distribution is normal; there are good pivots for estimating both the mean and the variance.

The CLT is more useful when the underlying distribution is arbitrary and/or very strange.

78
Q

Stat: This varies from table to table, as some have different definitions. But in your stats class, what was the definition of za?

A

If Z is the standard normal N(0,1), za is such that

P(Z > za) = a

Graphically, or verbally: the probability of a draw from Z landing above za is a.

79
Q

Stat: Given our definition of za (and with similarly defined quantities like ta,n-1 and chi-squareda,n-1), what is the probability expression used in almost all confidence intervals we make?

A

P(-z(a/2) <= Z <= z(a/2)) = 1 - a, which can be similarly written for the t dist, chi-squared dist, etc. But it’s especially common to use the normal version, due to the CLT and all the great info we have about normals.

80
Q

Stat: In hypothesis testing, what is a null hypothesis?

A

Null Hypothesis H0 is the “status quo” or “safe” hypothesis. It is the baseline, and we are looking for significant evidence that it is not true. For example, when testing whether two groups have different performance on a task, the null hypothesis is that their performance is the same.

81
Q

Stat: In hypothesis testing, what is an alternative hypothesis?

A

The alternative hypothesis is an idea that breaks from the “status quo” or “baseline assumption”, for which we are looking to see if there is significant evidence. For example, when testing whether two groups have different performance on a task, the alternative hypothesis could be that group A performs better than group B.

82
Q

Stat: In hypothesis testing, what is a test statistic?

A

The test statistic in a hypothesis test is a function of your observable data which you will use to quantitatively examine your null and alternative hypotheses. For example, when testing whether two groups have different performance on a task, the test statistic might be the difference in mean performances of the 2 groups.

83
Q

Stat: What are the 2 possible conclusions an experimenter can make from a hypothesis test?

A

“Reject the null hypothesis in favor of the alternative,” and “Fail to reject the null hypothesis.”

84
Q

Stat: In hypothesis testing, what is the rejection region?

A

It is the pre-decided range of (extreme) values of the test statistic in which we will “reject the null hypothesis in favor of the alternative.”

85
Q

Stat: In hypothesis testing, what is a Type 1 error?

A

It is when we reject the null hypothesis H0 even though it is true.

86
Q

Stat: In hypothesis testing, what is a Type 2 Error?

A

It is when we fail to reject the null H0, but the alternative H1 is true.

87
Q
A
88
Q

Stat: In hypothesis testing, what does a “level 0.05 test” mean?

A

Alpha = 0.05

89
Q

Stat: In hypothesis testing, what would a low value of alpha such as 0.001 mean? What about a high value like 0.20?

A

A low value of alpha like 0.001 means that we require very compelling evidence (or very extreme values of our test statistic) in order to reject the null hypothesis.

Conversely, a high value like 0.20 means that we have very relaxed and un-stringent requirements for rejecting our null hypothesis.

90
Q

Stat: In hypothesis testing, what is a p-value?

A

Once you conduct your experiment and calculate the test statistic, the p-value is the probability of getting results that are as extreme or more extreme than your test statistic, under the assumption that the null hypothesis is true.
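
(A worked sketch with made-up numbers: a one-sample z test where the test statistic is a z-score and the p-value is two-sided:)

```python
from scipy.stats import norm

# Hypothetical one-sample z test: H0 says mu = 0, known sigma = 1, n = 100,
# and we observed a sample mean of 0.25.
n, xbar, sigma = 100, 0.25, 1.0
z = xbar / (sigma / n**0.5)   # test statistic, z = 2.5

# Two-sided p-value: probability of a result at least this extreme under H0.
p_value = 2 * (1 - norm.cdf(abs(z)))
print(z, p_value)  # 2.5, ~0.0124
```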

91
Q

Stat: In hypothesis testing, what value do we use to determine whether or not we reject the null?

A

The p-value. (We need to calculate the p-value from the test statistic under the assumption of the null, in order to see how unlikely our result is under the null.)

92
Q

Stat: In hypothesis testing, what do we conclude if the p-value is larger than alpha?

A

If say, p-val = 0.10 and alpha = 0.05, then our results are not as extreme as our alpha requires, and so we fail to reject the null hypothesis.

93
Q

Stat: In hypothesis testing, what do we conclude if the p-value is smaller than alpha?

A

If say, p-val = 0.01 and alpha = 0.05, then our results are more extreme than our alpha requires, and so we reject the null hypothesis in favor of the alternative hypothesis.

94
Q

Stat: What are the 4 key facts about a multivariate normal distribution of (X1, X2, …)?

A
  1. Every marginal distribution Xi is univariate normal (and every subset of the Xi’s has its own multivariate normal joint distribution).
  2. Any conditional distribution f(Xj | Xi = xi) is univariate normal.
  3. A pair of variables Xi, Xj is independent iff they are uncorrelated (iff their covariance is 0).
  4. All linear combinations of the components are univariate normal (unless all of the coefficients are 0, of course).
95
Q

Stat: In hypothesis testing, what is a one-sided hypothesis?

A

We reject only if the test statistic is extreme in one of the two directions. For example, if the null is µ = 0, the alternative is µ > 0.

96
Q

Stat: In hypothesis testing, what is a two-sided hypothesis?

A

We reject if the test statistic is extreme in either direction. For example, if the null is µ = 0, the alternative is µ =/= 0, and we reject if the test statistic is extremely high or extremely low.

97
Q

Stat: At a high level, what is the Power of a hypothesis test?

A

The probability that we correctly reject H0 when H1 is true. In other words, our ability to avoid Type 2 errors.

98
Q

Stat: What is the key feature of classical, or frequentist, statistics? And what are some types of analytical tools used in this statistical philosophy?

A

In classical/frequentist statistics, the parameter Ø is constant. We examine it using estimators Ø_h, we quantify our uncertainty of its value using confidence intervals, and we test theories using hypothesis tests and p-values.

99
Q

Stat: What is the key feature of Bayesian statistics? And what are some types of analytical tools used in this statistical philosophy?

A

The parameter Ø is viewed as variable, and we quantify our opinions around its potential values using a prob dist π.

100
Q

Stat: Given the formula for Cov(X,Y), what is the formula for linear correlation Corr(X,Y)?

A

Corr(X,Y) = Cov(X,Y)/sqrt[V(X)V(Y)]

101
Q

Stats: In Bayesian statistics, how do we update π, our prior distribution of Ø, using data Xi?

A

You incorporate the data using an update that looks very similar to Bayes’ law.

Specifically, the posterior is

π(Ø | X) = f(X | Ø)π(Ø) / ∫ f(X | Ø′)π(Ø′) dØ′, i.e. the posterior is proportional to the likelihood times the prior.
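
(A minimal sketch of this update, my own example: a Beta prior on a coin's heads probability, which is conjugate to binomial data, so the posterior has a closed form:)

```python
# Prior: Beta(2, 2) over a coin's heads probability theta (made-up example).
alpha, beta = 2, 2

# Data: 10 flips, 7 heads. For a Beta prior with binomial data, the Bayes
# update has a closed form: posterior = Beta(alpha + heads, beta + tails).
heads, tails = 7, 3
alpha_post, beta_post = alpha + heads, beta + tails

print(alpha_post / (alpha_post + beta_post))  # posterior mean, ~0.64
```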

102
Q

Stats: In Bayesian statistics, what happens to the prior distribution as we get more and more data?

A

With enough data, the impact of the prior distribution on the posterior distribution tends towards 0.

103
Q

Stats: At a high level, what is the purpose of ANOVA?

A

If you have n>2 groups, you test the null hypothesis that the means of all the groups are equal, against the alternative that there is some difference among the means.

104
Q

Stats: What does ANOVA stand for?

A

Analysis of Variance

105
Q

Stats: At a high level, how is ANOVA performed?

A

Using a global F test, which looks at the probability of seeing your observed sample means for all groups under the null assumption that all of the groups’ means are equal.