Probability and Statistics Basics Flashcards

1
Q

Prob: What are the two equivalent definitions of events A and B being independent?

A

P(A,B) = P(A)P(B)

OR

P(A) = P(A | B), i.e. conditioning on B does not change the probability of A (assuming P(B) > 0)

2
Q

Prob: What are the two equivalent definitions of random variables Y1 and Y2 being independent?

A

F(y1,y2) = F1(y1)F2(y2) (The joint dist factors to the marginal dists)

OR

F1(y1) = F(y1 | Y2 = y2) for all values of y2 (The marginal distribution of either variable is the same as the conditional distribution given any value of the other variable)

3
Q

Prob: Conceptually, what does it mean for A and B to be independent, either as variables or as events?

A

A and B are independent variables if the value of one variable gives you no information about the value of the other.

A and B are independent events if knowing whether one event happened or not gives you no information on whether the other happened.

4
Q

Prob: What is Bayes’ Theorem?

A

P(A|B) = P(B|A)P(A) / P(B)
5
Q

Prob: What is a formula for P(A union B)?

A

P(A) + P(B) - P(A and B)

6
Q

Prob: What is linearity of expectation?

A

E[cX + kY] = cE[X] + kE[Y], even if X and Y are dependent
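A quick sketch of this on a small made-up joint pmf, where X and Y are clearly dependent (all numbers here are hypothetical):

```python
# Checking linearity of expectation on a joint pmf where X and Y
# are dependent (P(0,0) = 0.4 != P(X=0)P(Y=0) = 0.25).
joint = {  # (x, y): P(X=x, Y=y)
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

def E(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

lhs = E(lambda x, y: 2 * x + 3 * y)                   # E[2X + 3Y]
rhs = 2 * E(lambda x, y: x) + 3 * E(lambda x, y: y)   # 2E[X] + 3E[Y]
print(lhs, rhs)  # both 2.5, despite the dependence
```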

7
Q

Prob: What is one potentially convenient way to find P(A and B) when A and B are dependent?

A

P(A)*P(B|A), or P(B)*P(A|B)

8
Q

Stat: What proportion of points drawn from a normal distribution will fall within 1 standard deviation of the mean? 2? 3?

A

68% within 1, 95% within 2, 99.7% within 3
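These proportions can be checked directly from the standard normal CDF; `statistics.NormalDist` (Python 3.8+ stdlib) makes it a one-liner per k:

```python
# Checking the 68-95-99.7 rule against the standard normal CDF.
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)
for k in (1, 2, 3):
    p = Z.cdf(k) - Z.cdf(-k)  # P(-k <= Z <= k)
    print(f"within {k} sd: {p:.4f}")
# within 1 sd: 0.6827, within 2 sd: 0.9545, within 3 sd: 0.9973
```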

9
Q

Prob: What is the law of total probability?

A

If you can partition the sample space S into disjoint parts B1,…,Bn, then

P(A) = P(A|B1)P(B1) + … + P(A|Bn)P(Bn)

A common form is

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

10
Q

Prob: What trick is often used in the denominator of a Bayes’ Rule problem?

A

Law of total probability
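A classic sketch of this in code — Bayes' rule for a diagnostic test, with the law of total probability supplying the denominator (all numbers hypothetical):

```python
# Bayes' rule with a total-probability denominator: made-up
# disease-test numbers.
p_d = 0.01              # P(D): prevalence
p_pos_d = 0.95          # P(+ | D): sensitivity
p_pos_nod = 0.05        # P(+ | not D): false-positive rate

p_pos = p_pos_d * p_d + p_pos_nod * (1 - p_d)   # law of total probability
p_d_pos = p_pos_d * p_d / p_pos                 # Bayes' rule
print(round(p_d_pos, 3))  # 0.161 — a positive test is far from certain
```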

11
Q

Prob: What is a probability density function, or pdf f(), typically used for?

A

For a given probability distribution, you can integrate f() over an interval (or area, or higher-dimensional region) to find the probability that the random variable's value falls in that interval/region.

12
Q

Prob: what is a cumulative distribution function F(), or cdf, typically used for? How is it related to the pdf f()?

A

For a given probability distribution of RV X, F(x) = P(X <= x)

If you integrate f() from -inf to a, you get F(a)

13
Q

Prob: What is the formula for the expected value of discrete RV X?

A

E[X] = Σ x·p(x), summing over all possible values x, where p() is the pmf of X
14
Q

Prob: What is the formula for E[g(X)], or the expected value of a function g of continuous RV X, with pdf f()?

A

E[g(X)] = ∫ g(x)·f(x) dx, integrating from -inf to inf
15
Q

Prob: V[aX+b]?

A

a^2·V[X]

16
Q

Prob: Conceptually, what does it mean for a probability distribution Y to be memoryless?

A

For an experiment, past behavior has no bearing on future behavior. For example, suppose you're waiting for a bus whose arrival time follows a memoryless distribution (such as an exponential one). If you wait 5 minutes and there's still no bus, the distribution of the remaining wait time, starting now, is the same as it was when you began waiting.
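A small simulation of this, assuming an exponential wait with a made-up mean of 10 minutes: among waits that have already exceeded 5 minutes, the extra wait behaves like a fresh draw.

```python
# Memorylessness of the exponential: P(T > 12 | T > 5) should match
# P(T > 7), i.e. the 5 minutes already waited don't matter.
import random

random.seed(0)
rate = 1 / 10                      # mean wait of 10 minutes
waits = [random.expovariate(rate) for _ in range(100_000)]

p_over_7 = sum(w > 7 for w in waits) / len(waits)           # P(T > 7)
over_5 = [w for w in waits if w > 5]
p_extra_7 = sum(w > 5 + 7 for w in over_5) / len(over_5)    # P(T > 12 | T > 5)
print(p_over_7, p_extra_7)  # both near exp(-7/10) ≈ 0.497
```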

17
Q

Prob: what is the expected value of a geometric random variable (i.e. flip coin until a success) with probability p of success?

A

1/p
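A quick simulation check with a made-up p = 0.25, so the mean number of flips should come out near 1/p = 4:

```python
# Simulating the geometric expectation: flip a coin with success
# probability p until the first success, then average the counts.
import random

random.seed(1)
p = 0.25

def flips_until_success():
    n = 1
    while random.random() >= p:  # each failure costs one more flip
        n += 1
    return n

avg = sum(flips_until_success() for _ in range(100_000)) / 100_000
print(avg)  # close to 1/p = 4
```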

18
Q

Prob: If our binomial distribution (flip n times, see how many are successes) has events with probability p of success, and we conduct n events, what is the probability of exactly y successes (assuming 0 <= y <= n)? And what is the intuition behind this result?

A

p^y (1-p)^(n-y) is the probability of one specific outcome with y successes (so y specific positions being successes, and the other n-y being failures). But we need the probability of any such outcome; these outcomes are disjoint, so we sum their probabilities by multiplying by the number of them, which is (n choose y). So P(Y=y) = (n choose y)·p^y·(1-p)^(n-y).
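The formula, written directly in code (`math.comb` counts the n-choose-y disjoint arrangements); the n and p here are arbitrary:

```python
# The binomial pmf, built from the card's reasoning.
from math import comb

def binom_pmf(y, n, p):
    # (n choose y) disjoint arrangements, each with prob p^y (1-p)^(n-y)
    return comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.3
probs = [binom_pmf(y, n, p) for y in range(n + 1)]
print(round(sum(probs), 10))         # the pmf sums to 1
print(round(binom_pmf(3, n, p), 4))  # P(Y = 3) ≈ 0.2668
```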

19
Q

Prob: In words, what is the law of large numbers?

A

When sampling from a distribution, as the number of samples grows, the sample mean will tend toward the expected value of the distribution.
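A quick die-roll simulation of this: the running sample mean drifts toward the expected value 3.5 as the number of rolls grows.

```python
# Law of large numbers with die rolls (E[roll] = 3.5).
import random

random.seed(2)
rolls = [random.randint(1, 6) for _ in range(200_000)]
for n in (10, 1_000, 200_000):
    print(n, sum(rolls[:n]) / n)  # the mean settles near 3.5
```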

20
Q

Normal: If Y follows N(µ, σ^2), what is the formula for the z-score Z of Y=y?

A

Z = (y - µ)/σ

21
Q

Normal: If Y follows N(µ, σ^2), what (in words) is the z-score of Y=y?

A

The number of standard deviations σ that y is above or below the mean µ.

22
Q

Stat: What does the standard normal distribution Z follow?

A

Z follows N(0,1)

23
Q

Prob: What is the law of total expectation?

A

We can find E[X] by taking the weighted sum of the conditional expectations of X given all values of a variable Y.

For example, if Y takes values y1 or y2, then

E[X] = E[X|Y=y1]P(Y=y1) + E[X|Y=y2]P(Y=y2)
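A tiny worked version of this, with made-up numbers — Y chooses one of two coins, X is the flip outcome:

```python
# Law of total expectation: E[X] as a weighted sum of E[X | Y].
p_y = {"fair": 0.5, "biased": 0.5}          # P(Y = y)
e_x_given_y = {"fair": 0.5, "biased": 0.9}  # E[X | Y = y]

e_x = sum(p_y[y] * e_x_given_y[y] for y in p_y)
print(e_x)  # 0.5*0.5 + 0.5*0.9 = 0.7
```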

24
Q

Prob: What is the formula for the conditional expectation of X given that Y=y?

A

E[X | Y=y] = Σ x·P(X=x | Y=y) over all x (discrete case), or ∫ x·f(x|y) dx from -inf to inf (continuous case)
25
Q
A
26
Q

Stat: What is our estimator Ø_h (theta-hat) a function of?

A

The data X1, X2, X3…! (This is important!)

27
Q

Stat: Given what our estimator Ø_h is a function of, what 2 important properties does it have?

A

It is a random variable

Which means it has its own probability distribution, with E[Ø_h], V[Ø_h], etc

28
Q

Stat: What, in these flashcards, is my notation for Theta and Theta-hat?

A

Ø and Ø_h respectively.

29
Q

Stat: At a high level (so, colloquially explaining the formula), how do you find the sample variance s^2 given data X1,…,Xn?

A

Take each data point's squared deviation from the sample mean, sum them, and divide by n − 1 (not n, which corrects the bias).
30
Q

Stat: What does it mean for an estimator Ø_h to be accurate?

A

It has a mean close to the true value Ø; in other words, its bias is low.

31
Q

Stat: What does it mean for an estimator Ø_h to be precise?

A

It tends to produce similar answers each time; in other words, its variance is low.

32
Q

Stat: What is the formula for MSE(Ø_h), or Mean Squared Error?

A

MSE(Ø_h) = E[(Ø_h - Ø)^2]

= V[Ø_h] + Bias(Ø_h)^2

but you generally only need the first form.
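This decomposition can be checked numerically, using a deliberately biased (shrunk) sample-mean estimator of a known mean — all numbers here are made up:

```python
# Numerically verifying MSE = variance + bias^2.
import random
from statistics import mean

random.seed(5)
theta = 10.0
# A deliberately biased estimator: 0.9 times the sample mean of 20 draws.
ests = [0.9 * mean(random.gauss(theta, 2) for _ in range(20))
        for _ in range(20_000)]
m = mean(ests)
bias = m - theta                          # ≈ -1 by construction
var = mean((e - m) ** 2 for e in ests)
mse = mean((e - theta) ** 2 for e in ests)
print(round(mse, 3), round(var + bias ** 2, 3))  # the two sides match
```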

33
Q

Stat: What is the formula for the bias of Ø_h? What does it mean for Ø_h to be unbiased?

A

Bias(Ø_h) = E[Ø_h - Ø] = E[Ø_h] - Ø

Ø_h is unbiased iff Bias(Ø_h) = 0, or if the expected value of Ø_h is the correct value Ø.

34
Q

Stat: What is the standard error of Ø_h? And what in general does this quantity represent?

A

SE(Ø_h) = sqrt(V[Ø_h])

It gives an idea of the typical error of the estimator, or the typical distance it will be from its mean.

35
Q
A
36
Q

Stat: What is probably the most common measure of the quality of estimator Ø_h?

A

Mean Squared Error, or MSE(Ø_h)

37
Q

Stat: Conceptually, what does the likelihood of a distribution's parameters, given a dataset, describe?
A

It describes how likely a distribution with those parameters was to produce that dataset.

(I think it is often talked about in the context of a specific family of distributions. So we might ask: what is the likelihood of a normal distribution with these parameters, given this dataset?)

38
Q
A
39
Q

Stat: What does i.i.d. stand for?

A

Independent and Identically Distributed

40
Q

Stat: What is a MVUE?

A

It’s a Minimum-Variance Unbiased Estimator. So for some parameter Ø, it’s the unbiased estimator Ø_h with the lowest variance out of all the unbiased estimators.

41
Q

Stat: When we find a Maximum Likelihood Estimator, Min-Var Unbiased Estimator, Method of Moments Estimator, or something similar, do we typically find it in the context of some assumed distribution family (i.e. assume the distribution is normal, exponential, etc), or estimate parameters without a suspected distribution?

A

While sometimes we estimate parameters without a suspected distribution, such as distribution mean and variance, we more often use an assumed distribution family.

(This is mostly my opinion, and also me wanting to remember that when we for example “find the MLE”, it generally has quite a bit of structure due to an assumed distribution that we can differentiate/optimize.)

42
Q

Stat: What is a Maximum Likelihood Estimator?

A

It is the estimator Ø_h of Ø that maximizes the likelihood of your data.

So, generally for some assumed distribution family such as Exponential Distributions, you try to find an estimator lambda_hat for parameter lambda that leads to the exponential distribution that was most likely to produce this data.

43
Q

Stat: At a high level, how do you find the MLE estimate for the parameters?

A

Differentiate the likelihood (or, more often, the log-likelihood) w.r.t. the parameters, set that equal to zero, then solve.
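A sketch with made-up exponential data: the calculus gives the closed form lambda_hat = n / Σx, which we can sanity-check against a brute-force grid search over the log-likelihood.

```python
# MLE for the exponential rate: closed form vs grid search.
from math import log

data = [1.2, 0.7, 2.5, 0.4, 1.9]  # made-up i.i.d. data

def loglik(lam):
    # log-likelihood of Exp(lam): sum of log(lam) - lam * x
    return sum(log(lam) - lam * x for x in data)

closed_form = len(data) / sum(data)   # the solve-for-zero answer
grid_best = max((i / 1000 for i in range(1, 5000)), key=loglik)
print(round(closed_form, 3), round(grid_best, 3))  # both near 0.746
```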

44
Q

Stat: Given observations X1,…,Xn, what is the maximum likelihood estimator for a population proportion: for example, the proportion of red balls if we’re drawing from red, green or blue?

A

(number of reds observed)/n

45
Q

Stat: Define a 95% confidence interval [L,U] for parameter Ø.

A

For L and U, which are random variables based on your observations Xi, P(L <= Ø <= U) = 95%.

Meaning, when you sample your Xi’s and calculate L and U, the probability that they end up with L <= Ø <= U is 95%.

46
Q

Stat: What is the correct way to interpret 95% confidence interval [L,U] for parameter Ø?

What is a common incorrect way of interpreting it, and why is this incorrect?

A

Correct: “I am 95% confident that my calculated confidence interval [L,U] contains Ø.”

Incorrect: “There is a 95% chance that Ø is in the interval [L,U].”

The latter is incorrect because the true population parameter Ø is not a random variable. It is a set value that just exists in the world, and it either is in the interval or it isn’t; there is no chance involved.

47
Q

Stat: What is the way of interpreting a 95% confidence interval that involves considering if you computed many 95% confidence intervals?

A

If I compute a high number of 95% confidence intervals, over time, about 95% of them will contain their respective parameters.
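A simulation sketch of exactly this (made-up population: N(50, 10^2), samples of 40): build many 95% z-intervals and count how often they cover the true mean.

```python
# Coverage of many 95% confidence intervals for a known mean.
import random
from statistics import NormalDist, mean, stdev

random.seed(3)
z = NormalDist().inv_cdf(0.975)  # ≈ 1.96
mu, sigma, n = 50, 10, 40
covered = 0
trials = 2_000
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m, se = mean(xs), stdev(xs) / n**0.5
    if m - z * se <= mu <= m + z * se:
        covered += 1
print(covered / trials)  # close to 0.95
```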

48
Q

Stat: Given observations Xi and an unknown parameter Ø, what is a pivot?

A

A pivot is an expression that:

  • Is a function of the observable R.V.’s (i.e. the observations Xi)
  • And of the unknown parameter Ø,
  • But no other unknowns,
  • And whose distribution does not depend on the unknown Ø.

This is an important one!

49
Q

Stat: What is the Central Limit Theorem?

A

If X1,…,Xn are i.i.d. with mean µ and variance σ^2, then as n grows, the distribution of the sample mean approaches N(µ, σ^2/n), regardless of the underlying distribution of the Xi’s.
50
Q

Stat: Why is the Central Limit Theorem so important and useful?

A

Given a large enough sample size, we can find an approximate distribution for the sample mean of the Xi’s, but we don’t need to know anything about the underlying distribution of the Xi’s! It doesn’t need to be from a specific family, and it can be a wild-looking distribution, but we can still find an approximate distribution of the sample mean.

Using this, we can also find a confidence interval for the sample mean, which is great.
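A sketch of this with a very skewed base distribution — Exp(1), where mu = sigma = 1: the sample means still cluster like N(mu, sigma^2/n).

```python
# CLT sketch: means of samples from a skewed distribution.
import random
from statistics import mean, stdev

random.seed(4)
n = 50                      # per-sample size; mu = sigma = 1 for Exp(1)
means = [mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(5_000)]
print(mean(means))          # near mu = 1
print(stdev(means))         # near sigma / sqrt(n) ≈ 0.141
```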

51
Q

Stat: This varies from table to table, as some have different definitions. But in your stats class, what was the definition of z_a?

A

If Z is the standard normal N(0,1), z_a is defined such that

P(Z > z_a) = a

Graphically, or verbally: the probability of a draw from Z falling above z_a is a.

52
Q

Stat: Given our definition of z_a (and with similarly defined quantities like t_(a,n-1) and chi-squared_(a,n-1)), what is the probability expression used in almost all confidence intervals we make?

A

The following, which can be similarly written for the t dist, chi-squared dist, etc. But it’s especially common to use the normal version, due to the CLT and all the great info we have about normals:

P(-z_(a/2) <= Z <= z_(a/2)) = 1 - a

where Z is (or is approximately) a standard normal pivot.

53
Q

Stat: In hypothesis testing, what is a null hypothesis?

A

Null Hypothesis H0 is the “status quo” or “safe” hypothesis. It is the baseline, and we look for significant evidence that it is not true. For example, when testing whether two groups have different performance on a task, the null hypothesis is that their performance is the same.

54
Q

Stat: In hypothesis testing, what is an alternative hypothesis?

A

The alternative hypothesis is an idea that breaks from the “status quo” or “baseline assumption”, for which we are looking to see if there is significant evidence. For example, when testing whether two groups have different performance on a task, the alternative hypothesis could be that group A performs better than group B.

55
Q

Stat: In hypothesis testing, what is a test statistic?

A

The test statistic in a hypothesis test is a function of your observable data which you will use to quantitatively examine your null and alternative hypotheses. For example, when testing whether two groups have different performance on a task, the test statistic might be the difference in mean performances of the 2 groups.

56
Q

Stat: What are the 2 possible conclusions an experimenter can make from a hypothesis test?

A

“Reject the null hypothesis in favor of the alternative,” and “Fail to reject the null hypothesis.”

57
Q

Stat: In hypothesis testing, what is the rejection region?

A

It is the pre-decided range of (extreme) values of the test statistic in which we will “reject the null hypothesis in favor of the alternative.”

58
Q

Stat: In hypothesis testing, what is a Type 1 error?

A

It is when we reject the null hypothesis Ho even though it is true.

59
Q

Stat: In hypothesis testing, what is a Type 2 Error?

A

It is when we fail to reject the null H0, but the alternative H1 is true.

60
Q
A
61
Q

Stat: In hypothesis testing, what does a “level 0.05 test” mean?

A

Alpha = 0.05, i.e. the probability of a Type 1 error (rejecting a true null) is capped at 0.05.

62
Q

Stat: In hypothesis testing, what would a low value of alpha such as 0.001 mean? What about a high value like 0.20?

A

A low value of alpha like 0.001 means that we require very compelling evidence (or very extreme values of our test statistic) in order to reject the null hypothesis.

Conversely, a high value like 0.20 means that we have very relaxed requirements for rejecting our null hypothesis.

63
Q

Stat: In hypothesis testing, what is a p-value?

A

Once you conduct your experiment and calculate the test statistic, the p-value is the probability of getting results that are as extreme or more extreme than your test statistic, under the assumption that the null hypothesis is true.
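A worked example with made-up numbers — a one-sided z-test of H0: µ = 0 with known sd 1, after observing a sample mean of 0.4 from n = 25 points:

```python
# p-value of a one-sided z-test, computed from the standard normal CDF.
from statistics import NormalDist

n, xbar, sd0 = 25, 0.4, 1.0
z_stat = xbar / (sd0 / n**0.5)            # test statistic, = 2.0
p_value = 1 - NormalDist().cdf(z_stat)    # P(Z >= 2.0) under H0
print(round(p_value, 4))  # 0.0228 — reject at alpha = 0.05
```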

64
Q

Stat: In hypothesis testing, what value do we use to determine whether or not we reject the null?

A

The p-value. (We need to calculate the p-value from the test statistic under the assumption of the null, in order to see how unlikely our result is under the null.)

65
Q

Stat: In hypothesis testing, what do we conclude if the p-value is larger than alpha?

A

If, say, p-val = 0.10 and alpha = 0.05, then our results are not as extreme as our alpha requires, and so we fail to reject the null hypothesis.

66
Q

Stat: In hypothesis testing, what do we conclude if the p-value is smaller than alpha?

A

If, say, p-val = 0.01 and alpha = 0.05, then our results are more extreme than our alpha requires, and so we reject the null hypothesis in favor of the alternative hypothesis.

67
Q

Stat: In hypothesis testing, what is a one-sided hypothesis?

A

We reject only if the test statistic is extreme in one of the two directions. For example, if the null is µ = 0, the alternative is µ > 0.

68
Q

Stat: In hypothesis testing, what is a two-sided hypothesis?

A

We reject if the test statistic is extreme in either direction. For example, if the null is µ = 0, the alternative is µ =/= 0, and we reject if the test statistic is extremely high or extremely low.

69
Q

Stat: What is the key feature of classical, or frequentist, statistics? And what are some types of analytical tools used in this statistical philosophy?

A

In classical/frequentist statistics, the parameter Ø is constant. We examine it using estimators Ø_h, we quantify our uncertainty of its value using confidence intervals, and we test theories using hypothesis tests and p-values.

70
Q

Stat: What is the key feature of Bayesian statistics? And what are some types of analytical tools used in this statistical philosophy?

A

The parameter Ø is viewed as a random variable, and we quantify our beliefs about its potential values using a probability distribution π.

71
Q

Stat: In Bayesian statistics, how do we update π, our prior distribution of Ø, using data Xi?

A

You incorporate the data using an update that looks very similar to Bayes’ law.

Specifically, the posterior is proportional to the likelihood times the prior:

π(Ø | X) ∝ L(X | Ø) · π(Ø)
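The classic conjugate example of this update — a Beta prior on a coin's heads probability, updated on made-up binomial data, where the posterior has a closed form:

```python
# Beta-Binomial conjugate update: prior Beta(a, b), observe
# heads/tails, posterior is Beta(a + heads, b + tails).
a, b = 2, 2            # prior Beta(2, 2) on the coin's p
heads, tails = 7, 3    # hypothetical data: 7 heads in 10 flips
a_post, b_post = a + heads, b + tails   # posterior Beta(9, 5)
print(a_post / (a_post + b_post))       # posterior mean = 9/14 ≈ 0.643
```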

72
Q

Stat: In Bayesian statistics, what happens to the prior distribution as we get more and more data?

A

With enough data, the impact of the prior distribution on the posterior distribution tends towards 0.