Statistics Flashcards

Question

What is the expected value and its basic properties?

Answer 1

The expected value of a discrete random variable is the probability-weighted average of all its possible values: E[X] = Σ x_i p_i. Its basic properties are: E[X+Y] = E[X] + E[Y]; E[aX+b] = aE[X] + b; and E[XY] = E[X]E[Y] if X and Y are independent (the converse does not hold in general).
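
As a small worked check of these properties, the sketch below uses a fair six-sided die as the example variable; the die and the constants a = 2, b = 3 are illustrative assumptions, not part of the card. Exact fractions are used so the identities hold without rounding.

```python
from fractions import Fraction
from itertools import product

# Fair six-sided die: each face has probability 1/6 (illustrative choice).
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expectation(pmf):
    """Probability-weighted average of the values in a {value: prob} dict."""
    return sum(value * prob for value, prob in pmf.items())

e_x = expectation(die)

# E[aX + b] with a = 2, b = 3, computed directly from the transformed values.
e_affine = expectation({2 * x + 3: p for x, p in die.items()})

# Two independent dice: the joint probability is the product of the marginals.
pairs = list(product(die.items(), repeat=2))
e_sum = sum((x + y) * px * py for (x, px), (y, py) in pairs)
e_prod = sum((x * y) * px * py for (x, px), (y, py) in pairs)

print(e_x, e_affine, e_sum, e_prod)
```

Here `e_affine` matches 2·E[X] + 3, `e_sum` matches E[X] + E[Y], and `e_prod` matches E[X]·E[Y] because the two dice are independent.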

Answer 2

Var(X) = E[(X−µ)²] = E[X²] − (E[X])², where µ = E[X].
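
The two forms of the variance can be compared on a tiny distribution. The pmf below (values 0, 1, 2) is a made-up example chosen only so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical pmf: P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] minus the square of E[X].
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mean**2

print(mean, var_definition, var_shortcut)
```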

Answer 3

Geometric: the probability distribution of the number X of Bernoulli trials needed to get one success.
Uniform (discrete): each of the n possible values has equal probability 1/n.
Binomial: with parameters n and p, the discrete probability distribution of the number of successes in a sequence of n independent experiments, each succeeding with probability p.
Bernoulli: the distribution of the outcome of a single experiment that asks a yes–no question; it takes the value 1 (success) with probability p and 0 (failure) with probability 1−p.
Hypergeometric: describes the probability of x successes in n draws, without replacement, from a finite population of size N that contains exactly r objects with the feature of interest, where each draw is either a success or a failure.
Negative binomial: the distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures, denoted r, occurs.
Poisson: expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
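
To make two of these definitions concrete, here is a minimal sketch of the binomial and geometric probability mass functions using their standard closed forms. The function names and the parameters n = 10, p = 0.3 are assumptions picked for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def geometric_pmf(k, p):
    """P(the first success happens on trial k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                   # a pmf sums to 1
mean = sum(k * q for k, q in enumerate(probs))       # equals n * p for a binomial
bernoulli_success = binomial_pmf(1, 1, p)            # n = 1 is the Bernoulli case

print(total, mean, bernoulli_success, geometric_pmf(3, 0.5))
```

Setting n = 1 in the binomial recovers the Bernoulli distribution, which is why `bernoulli_success` equals p.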

Answer 4

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate λ. pdf: f(x; λ) = λe^(−λx) for x ≥ 0. Mean: 1/λ. Variance: 1/λ².
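
The mean and variance can be checked by simulation. This sketch draws waiting times with the standard library's `random.expovariate`; the rate λ = 2, the seed, and the sample size are arbitrary assumptions.

```python
import random
import statistics

rate = 2.0                 # λ, an arbitrary choice for illustration
rng = random.Random(0)     # fixed seed so the run is repeatable

# random.expovariate(lambd) draws from the exponential distribution with rate lambd.
waits = [rng.expovariate(rate) for _ in range(100_000)]

sample_mean = statistics.fmean(waits)                               # near 1/λ
sample_var = statistics.fmean((w - sample_mean) ** 2 for w in waits)  # near 1/λ²

print(sample_mean, sample_var)
```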

Answer 5

A normal (Gaussian) distribution is a symmetric distribution, sometimes informally called a bell curve. Mean: µ. Variance: σ². pdf: f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)).
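
A short sketch of the normal density, written from the standard formula and compared against `statistics.NormalDist` from the Python standard library. The values µ = 1 and σ = 2 are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 1.0, 2.0
peak = normal_pdf(mu, mu, sigma)           # the curve is highest at x = mu
left = normal_pdf(mu - 1.5, mu, sigma)
right = normal_pdf(mu + 1.5, mu, sigma)    # same height as `left` by symmetry
reference = NormalDist(mu, sigma).pdf(mu + 1.5)

print(peak, left, right, reference)
```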

Answer 6

The information entropy, often just entropy, is a basic quantity in information theory associated with any random variable, which can be interpreted as the average level of "information", "surprise", or "uncertainty" inherent in the variable's possible outcomes.
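
As an illustration, the sketch below assumes the usual Shannon definition for a discrete variable, H(X) = −Σ p_i log₂ p_i, measured in bits. The card itself does not state a formula, so the base-2 logarithm and the example distributions are assumptions.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = entropy_bits([0.5, 0.5])      # maximal uncertainty for two outcomes
biased_coin = entropy_bits([0.9, 0.1])    # more predictable, so lower entropy
certain = entropy_bits([1.0])             # no uncertainty at all
fair_die = entropy_bits([1 / 6] * 6)      # log2(6) bits

print(fair_coin, biased_coin, certain, fair_die)
```

The more predictable the outcome, the lower the entropy: the biased coin carries less "surprise" than the fair one.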

Answer 7

**Information theory studies the quantification, storage, and communication of information.** Overview: Information theory studies the transmission, processing, extraction, and utilization of information. Abstractly, information can be thought of as the resolution of uncertainty. In the case of communication of information over a noisy channel, this abstract concept was made concrete in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", in which "information" is thought of as a set of possible messages, where the goal is to send these messages over a noisy channel, and then to have the receiver reconstruct the message with low probability of error, in spite of the channel noise.

Answer 8

In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed. For example, take the sample mean of n independent, identically distributed variables with mean µ and finite variance σ²: S_n = (X₁ + X₂ + ... + X_n)/n. The sample mean is itself a random variable. The usefulness of the theorem is that the distribution of √n(S_n − µ) approaches N(0, σ²) regardless of the shape of the distribution of the individual X_i.
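
The statement can be explored with a quick simulation. This sketch assumes the individual variables are Uniform(0, 1), which is clearly not bell-shaped, and checks that the normalized sample mean has roughly the variance and the 95% coverage a normal N(0, σ²) would have. Sample size, repetition count, and seed are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(42)
mu, sigma2 = 0.5, 1 / 12       # mean and variance of Uniform(0, 1)
n, reps = 50, 5000             # sample size and number of repeated samples

def normalized_mean():
    """Draw one sample of size n and return sqrt(n) * (sample mean - mu)."""
    s_n = statistics.fmean(rng.random() for _ in range(n))
    return math.sqrt(n) * (s_n - mu)

z = [normalized_mean() for _ in range(reps)]

# Empirical variance of the normalized means, expected to be near sigma2.
z_var = statistics.fmean(v * v for v in z) - statistics.fmean(z) ** 2

# Share of values within 1.96 standard deviations, near 0.95 for a normal.
within = sum(abs(v) <= 1.96 * math.sqrt(sigma2) for v in z) / reps

print(z_var, within)
```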

Answer 9

A combination is similar to a permutation, except that the order of the selected items does not matter. For example, the arrangements ab and ba count as the same combination. Without repetition, the number of ways to choose r items from n is C(n,r) = n!/(r!(n−r)!).
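
A brief sketch using the standard library: `itertools.combinations` enumerates the selections and `math.comb` counts them. The three-letter item set is just an example.

```python
from itertools import combinations
from math import comb

items = "abc"

# Each pair appears once; "ba" is not listed separately from "ab".
pairs = ["".join(c) for c in combinations(items, 2)]
count = comb(len(items), 2)

print(pairs, count)
```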

Answer 10

A permutation is an ordered combination. Say we have n items and pick r of them. When repetition is allowed, the total number of permutations is P(n,r) = n^r. When repetition is not allowed, the total number of permutations is P(n,r) = n!/(n−r)!.
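
Both counting formulas can be confirmed by listing the arrangements. In this sketch, `itertools.product` handles the case with repetition and `itertools.permutations` the case without; the item set "abc" and r = 2 are illustrative.

```python
from itertools import permutations, product
from math import factorial, perm

items, r = "abc", 2
n = len(items)

with_repetition = ["".join(t) for t in product(items, repeat=r)]
without_repetition = ["".join(t) for t in permutations(items, r)]

print(len(with_repetition), n**r)                        # both count n^r
print(len(without_repetition), perm(n, r))               # both count n!/(n-r)!
```

Unlike combinations, "ab" and "ba" both appear in `without_repetition` because order matters.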

Answer 11

In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data. The estimator itself is a random variable.

Answer 12

**Point estimators** yield single-valued results, although this includes the possibility of single vector-valued results and results that can be expressed as a single function. This is in contrast to an **interval estimator**, where the result would be a range of plausible values (or vectors or functions).

Answer 13

Notation: θ is the parameter being estimated and θ̂ is its estimator. 1) Error: for a given sample x, the error of the estimator θ̂ is e(x) = θ̂(x) − θ. 2) Mean squared error: the probability-weighted average of the squared errors, MSE(θ̂) = E[(θ̂(X) − θ)²]. 3) Variance: indicates how far, on average, the collection of estimates is from the expected value of the estimates (keep in mind that the estimator is a random variable), Var(θ̂) = E[(θ̂ − E[θ̂])²]. 4) Bias: the distance between the average of the collection of estimates and the single parameter being estimated, B(θ̂) = E[θ̂] − θ. 5) Relationship among the quantities: MSE(θ̂) = Var(θ̂) + (B(θ̂))².
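
The relationship between mean squared error, variance, and bias can be seen numerically. This sketch assumes a deliberately biased estimator, the variance formula that divides by n (`statistics.pvariance`), applied to small samples from a standard normal whose true variance is 1. All constants are illustrative.

```python
import math
import random
import statistics

rng = random.Random(1)
true_var = 1.0            # the parameter being estimated
n, reps = 5, 20_000       # sample size and number of repeated samples

# pvariance divides by n rather than n - 1, so it underestimates on average.
estimates = [
    statistics.pvariance([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_est = statistics.fmean(estimates)
bias = mean_est - true_var
variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
mse = statistics.fmean((e - true_var) ** 2 for e in estimates)

print(bias, variance, mse, variance + bias**2)
```

The last two printed values agree up to floating-point rounding, and the bias comes out close to −1/n, the known bias of this estimator for unit variance.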

Answer 14

A consistent sequence of estimators is a sequence of estimators that converge in probability to the quantity being estimated as the index (usually the sample size) grows without bound. In other words, increasing the sample size increases the probability of the estimator being close to the population parameter. Mathematically, for every ε > 0: lim_{n→∞} P(|θ̂_n − θ| > ε) = 0.
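
Consistency can be illustrated by simulation. The sketch below assumes the estimator is the sample mean of Uniform(0, 1) draws and estimates, for several sample sizes, how often it misses the true mean 0.5 by more than a tolerance of 0.05. The tolerance, sample sizes, and seed are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
mu, eps, reps = 0.5, 0.05, 1000    # true mean, tolerance, repetitions per size

def miss_rate(n):
    """Fraction of repetitions whose sample mean lands more than eps from mu."""
    misses = 0
    for _ in range(reps):
        sample_mean = statistics.fmean(rng.random() for _ in range(n))
        misses += abs(sample_mean - mu) > eps
    return misses / reps

rates = {n: miss_rate(n) for n in (10, 100, 1000)}
print(rates)
```

The miss rate shrinks toward zero as n grows, which is what convergence in probability describes.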

Answer 15

An asymptotically normal estimator is a consistent estimator whose distribution around the true parameter θ approaches a normal distribution with standard deviation shrinking in proportion to 1/√n as the sample size n grows. Mathematically, √n(θ̂_n − θ) converges in distribution to N(0, V) for some asymptotic variance V.

Answer 16

An efficient estimator is an estimator with the lowest possible variance among unbiased estimators. In other words, it extracts the optimal amount of information from the data.

Answer 17

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal.

Answer 18

Parametric statistics is a branch of statistics which assumes that sample data come from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters.

Answer 19

Nonparametric statistics is the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution's parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference.

Answer 20

**Parametric probability density estimation** involves selecting a common distribution and estimating the parameters for the density function from a data sample. **Nonparametric probability density estimation** involves using a technique to fit a model to the arbitrary distribution of the data, like kernel density estimation.

Answer 21

The shape of a histogram of many random samples will resemble a well-known probability distribution. The common distributions are common because they occur again and again in different and sometimes unexpected domains. Once the distribution is identified, estimate its parameters. For example, if it looks like a normal distribution, we need the mean and variance. To verify that it is a good fit we can: 1) Plot the density function and compare its shape to the histogram. 2) Sample the density function and compare the generated sample to the real sample. 3) Use a statistical test to confirm the data fits the distribution.
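
A minimal sketch of the parametric workflow, assuming the data look normal. The sample here is synthetic (drawn from a normal with mean 10 and standard deviation 2) as a stand-in for real data, and the fit check is a simple numeric comparison rather than a plot or a formal test.

```python
import random
from statistics import NormalDist

rng = random.Random(3)
data = [rng.gauss(10.0, 2.0) for _ in range(5000)]   # stand-in for a real sample

# Step 1: estimate the parameters (mean and standard deviation) from the data.
fit = NormalDist.from_samples(data)

# Step 2: compare the fitted model with the data on one summary,
# the share of points within one standard deviation of the mean.
lo, hi = fit.mean - fit.stdev, fit.mean + fit.stdev
observed = sum(lo <= x <= hi for x in data) / len(data)
expected = fit.cdf(hi) - fit.cdf(lo)                 # about 0.683 for any normal

print(fit.mean, fit.stdev, observed, expected)
```

A large gap between `observed` and `expected` would suggest the normal family is a poor choice for the data.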

Answer 22

In some cases, a data sample may not resemble a common probability distribution or cannot be easily made to fit the distribution. This is often the case when the data has two peaks (bimodal distribution) or many peaks (multimodal distribution). In this case, parametric density estimation is not feasible and alternative methods can be used that do not use a common distribution. Instead, an algorithm is used to approximate the probability distribution of the data without a pre-defined distribution, referred to as a nonparametric method.

Answer 23

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample.
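
To show the idea, here is a hand-written sketch of a kernel density estimate with a Gaussian kernel: the estimate at x is the average of small bell curves centred on each data point. The bimodal synthetic data and the bandwidth of 0.3 are assumptions; in practice the bandwidth has to be chosen with care.

```python
import math
import random

def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density as an average of Gaussian bumps."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

rng = random.Random(5)
# Bimodal stand-in data: two clusters centred at -2 and +2.
data = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(2.0, 0.5) for _ in range(200)]

kde = gaussian_kde(data, bandwidth=0.3)

step = 0.05
grid = [-6.0 + i * step for i in range(241)]
area = sum(kde(x) for x in grid) * step     # Riemann sum over [-6, 6]

print(kde(-2.0), kde(0.0), kde(2.0), area)
```

The estimate is high near both clusters and low between them, a shape that a single normal fit could not capture.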

Answer 24

In statistics, bootstrapping is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy to sample estimates.
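
As an example of assigning a measure of accuracy to an estimate, the sketch below bootstraps the standard error of a sample mean and compares it with the textbook formula s/√n. The synthetic data, the helper name `bootstrap_se`, and the number of resamples are assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(11)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]    # stand-in for an observed sample

def bootstrap_se(sample, stat, n_resamples=2000):
    """Standard error of `stat`, estimated by resampling with replacement."""
    replicates = [
        stat(rng.choices(sample, k=len(sample)))    # resample, same size as original
        for _ in range(n_resamples)
    ]
    return statistics.pstdev(replicates)

se_boot = bootstrap_se(data, statistics.fmean)
se_formula = statistics.stdev(data) / math.sqrt(len(data))

print(se_boot, se_formula)
```

The same `bootstrap_se` call works for statistics without a simple formula, for example by passing `statistics.median` instead of `statistics.fmean`.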