Statistics Flashcards

1
Q

Accuracy

A

(TP + TN) / (TP + TN + FP + FN)

Number of correct predictions /
Number of all predictions

Good general report of model performance with BALANCED data sets
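As a quick sketch (the confusion counts are made up for illustration), the formula maps directly to code:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical, roughly balanced confusion counts:
print(accuracy(tp=40, tn=45, fp=10, fn=5))  # 0.85
```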

2
Q

Why is accuracy alone not enough to evaluate classification models?

A

Consider benign versus malignant tumors. A typical set of random people would be more than 90% benign (class 0), because benign tumors are much more common than malignant ones. A model that predicts 0 for every example, without doing any real calculation, achieves over 90% accuracy, which is useless. We need other measures: precision and recall.
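The scenario above can be sketched in a few lines (the 95/5 split is illustrative):

```python
# Imbalanced data: 95 benign (0), 5 malignant (1).
# A "model" that always predicts 0 does no real calculation:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 0
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 95
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 5

accuracy = (tp + tn) / len(y_true)  # 0.95 -- looks great
recall = tp / (tp + fn)             # 0.0  -- catches no malignant cases
```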

3
Q

Precision (PPV)

A

TP / (TP + FP)

Correct positives /
Positive tests

From all positive PREDICTED, how many are ACTUAL positive?

Focus on precision when you want to be confident in the YESes the model gives you; what your model pings should be the real deal. It will miss some YESes, but you can be confident in what it does flag as YES.

Example: applicant screening. Some viable applicants will slip through, but when the model pings an applicant as viable, you can be confident about it.

4
Q

Recall (Sensitivity/TPR)

A

TP / (TP + FN)

Correct positives /
Actual positives

From all ACTUAL positives, how many did we PREDICT correctly?

5
Q

Increasing precision _______ recall

A

Decreases

6
Q

F1 score

A

2 × precision × recall /
(precision + recall)

The harmonic mean of precision and recall, combining the two numbers into one.

Use when working with IMBALANCED data sets

Example: classifying tweets by sentiment (positive, negative, neutral), where the data set was imbalanced with far more neutral tweets. The F1 score describes overall model performance while caring equally about all three classes.
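A minimal sketch of the three metrics together (the counts tp=8, fp=2, fn=8 are made up):

```python
def precision(tp, fp):
    """Of everything flagged positive, how much was truly positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything truly positive, how much did we flag?"""
    return tp / (tp + fn)

def f1(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# precision(8, 2) = 0.8, recall(8, 8) = 0.5,
# F1 = 2*0.8*0.5/1.3 ≈ 0.615 -- below the arithmetic mean of 0.65,
# since the harmonic mean punishes the weaker of the two numbers.
```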

7
Q

Sensitivity (Recall / TPR)

A

TP / (TP + FN)
= 1 − FNR

Correct positives /
Actual positives

How good is the model at catching YES’s?

A sensitive test helps rule out a disease when the test is negative.
Highly SeNsitive = SNout = rule out

Use sensitivity/specificity when every instance of what you're looking for is too precious to let slip by (illnesses, fraud, terrorist attacks). A sensitivity-focused model will catch ALL REAL terrorist attacks, ALL TRUE cases of heart disease, etc.
CAVEAT: there will be some false positives: innocent travelers flagged as terrorists, some healthy people labeled as diseased.

8
Q

Specificity (TNR)

A

TN / (TN + FP)
= 1 − FPR

Correct negatives /
Actual negatives

How good is the model at catching NO’s?
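Sensitivity, specificity, and FPR as a short sketch (the example counts are hypothetical):

```python
def sensitivity(tp, fn):
    """TPR / recall: how good is the model at catching YESes?"""
    return tp / (tp + fn)

def specificity(tn, fp):
    """TNR: how good is the model at catching NOs?"""
    return tn / (tn + fp)

def fpr(tn, fp):
    """False positive rate = 1 - specificity."""
    return 1 - specificity(tn, fp)

# e.g. tp=90, fn=10 -> sensitivity 0.9; tn=80, fp=20 -> specificity 0.8
```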

9
Q

Prevalence

A

The proportion of cases in a defined population at a single point in time. Expressed as a decimal or percentage.

10
Q

Positive predictive value (PPV) (Precision)

A

TP / (TP + FP)

Actual positive /
Tested positive

The probability that, following a positive test result, the individual TRULY has the disease. Can also be thought of as the clinical relevance of a test.

Related to prevalence, whereas sensitivity and specificity are independent of prevalence.

As prevalence decreases, PPV decreases because there will be more false positives for every true positive

These enable you to rule in/out conditions but not definitively diagnose a condition
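The prevalence effect can be sketched with Bayes' rule; the 0.99 sensitivity/specificity figures are illustrative:

```python
def ppv(sens, spec, prev):
    """P(disease | positive test) via Bayes' rule."""
    tp_rate = sens * prev                # true positives per person screened
    fp_rate = (1 - spec) * (1 - prev)    # false positives per person screened
    return tp_rate / (tp_rate + fp_rate)

# Same test, different prevalence:
print(ppv(0.99, 0.99, 0.10))   # ~0.917
print(ppv(0.99, 0.99, 0.001))  # ~0.090 -- most positives are now false
```

With a rare disease, even an excellent test yields mostly false positives, which is exactly why PPV falls as prevalence falls.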

11
Q

Negative predictive value (NPV)

A

TN / (TN + FN)

Actual Negative /
Tested Negative

The probability that, following a NEGATIVE test result, the individual TRULY does NOT have the disease. Can also be thought of as the clinical relevance of a test.

Related to prevalence, whereas sensitivity and specificity are independent of prevalence.

As prevalence decreases, NPV increases because there will be more true negatives for every false negative

These enable you to rule in/out conditions but not definitively diagnose a condition

12
Q

Type I error

A

False Positive

REJECTING the NULL when it is TRUE

13
Q

Alpha level

(significance level)

A

Probability of REJECTING the NULL when it is TRUE (type I error)

14
Q

Beta level

A

Probability that you’ll fail to reject the null when it’s false (type II error)

i.e. ACCEPT the NULL when it’s FALSE

15
Q

Type II error

A

False Negative

ACCEPTING the NULL when it’s FALSE

16
Q

AUC ROC Curve

A

Tells how much the model is capable of distinguishing between classes.

X axis is FPR (1 − specificity)
Y axis is TPR (sensitivity)

AUC = area under the curve. The higher the value, the better the model is at predicting TPs and TNs, i.e. at distinguishing between patients with the disease and without.

ROC = receiver operating characteristic; a probability curve.
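One way to sketch AUC without plotting: it equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (the scores below are made up):

```python
def auc(scores_pos, scores_neg):
    """AUC = P(random positive scores above random negative),
    counting ties as half -- equivalent to area under the ROC curve."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for 3 positives and 3 negatives:
print(auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))  # 8/9 ≈ 0.889
```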

17
Q

FPR

A

1 - Specificity

18
Q

Bessel’s correction

A

The use of n − 1 instead of n in the formulas for the sample variance and sample standard deviation, where n is the number of observations in a sample.

19
Q

bias (or bias function) of an estimator

A
  1. The difference between this estimator's expected value and the true value of the parameter being estimated.
  2. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator.
  3. An unbiased estimator is preferable to a biased one, although in practice, biased estimators (with generally small bias) are frequently used. When a biased estimator is used, bounds on the bias are calculated. A biased estimator may be used for various reasons:
    1. because an unbiased estimator does not exist without further assumptions about a population;
    2. because an estimator is difficult to compute (as in unbiased estimation of standard deviation);
    3. because an estimator is median-unbiased but not mean-unbiased (or the reverse);
    4. because a biased estimator gives a lower value of some loss function (particularly mean squared error) compared with unbiased estimators (notably in shrinkage estimators); or
    5. because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful.
20
Q

Nominal Data

A

data that is used for naming or labelling variables, without any quantitative value.

“named” data

no intrinsic ordering to nominal data

Examples: country, gender, race, hair color

Analysis is done by grouping input variables into categories and calculating the percentage or mode of the distribution.

non-parametric tests

21
Q

Ordinal Data

A

A type of categorical data with an order: the variables in ordinal data are listed in an ordered manner.

The ordinal variables are usually numbered to indicate the order of the list. However, the numbers are not mathematically measured or determined; they are merely assigned as labels for opinions.

Example: Good, Neutral, Bad

Analysed by computing the mode, median, and other positional measures like quartiles, percentiles, etc.

Usually analyzed with non-parametric tests. Although discouraged, ordinal data is sometimes analysed using parametric statistics.

22
Q

When to use Parametric Tests

A
  1. Interval or Ratio
  2. Normally distributed
  3. No outliers
  4. Equal variances
  5. Large samples (>30)
23
Q

When to use non-parametric tests

A
  1. Nominal or ordinal data
  2. Not normally distributed
  3. Outliers present
  4. Unequal variances
  5. Small samples
24
Q

Sufficient Statistic

A

“no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter”

Given a set of independent identically distributed data conditioned on an unknown parameter θ, a sufficient statistic is a function T(X) whose value contains all the information needed to compute any estimate of the parameter (e.g. a maximum likelihood estimate).

A statistic t = T(X) is sufficient for underlying parameter θ precisely if the conditional probability distribution of the data X, given the statistic t = T(X), does not depend on the parameter θ.

For example, the sample mean is sufficient for the mean (μ) of a normal distribution with known variance. Once the sample mean is known, no further information about μ can be obtained from the sample itself. On the other hand, for an arbitrary distribution the median is not sufficient for the mean: even if the median of the sample is known, knowing the sample itself would provide further information about the population mean. For example, if the observations that are less than the median are only slightly less, but observations exceeding the median exceed it by a large amount, then this would have a bearing on one's inference about the population mean.

25
Q

Statistical Inference

A

practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling.

26
Q

Beta Distribution

A

PDF: f(x; α, β) = x^(α−1) (1−x)^(β−1) / B(α, β), for x ∈ [0, 1]

parameterized by two positive shape parameters, denoted by α and β, that control the shape of the distribution.

applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines.

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions.
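A minimal sketch of the conjugate-prior property for the Bernoulli case (the 7-successes/3-failures data is hypothetical):

```python
# Conjugate Beta-Bernoulli update: with prior Beta(a, b), observing
# k successes and m failures gives posterior Beta(a + k, b + m).
def update_beta(a, b, successes, failures):
    return a + successes, b + failures

a, b = update_beta(1, 1, successes=7, failures=3)  # Beta(1, 1) = uniform prior
posterior_mean = a / (a + b)  # (1+7) / (2+10) = 2/3
```

The update is pure bookkeeping on the two shape parameters, which is what makes the conjugate pair so convenient.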

27
Q

Dirichlet Distribution

A

generalization to multiple variables of Beta distribution

28
Q

Gamma Distribution

A

A two-parameter family of continuous distributions on (0, ∞), usually parameterized by a shape and a scale (or rate) parameter. The exponential and chi-squared distributions are special cases. In Bayesian statistics it is the conjugate prior for the rate parameter of a Poisson or exponential distribution.

29
Q

Wishart Distribution

A

a generalization to multiple dimensions of the gamma distribution

These distributions are of great importance in the estimation of covariance matrices in multivariate statistics. In Bayesian statistics, the Wishart distribution is the conjugate prior of the inverse covariance-matrix of a multivariate-normal random-vector.

30
Q

Exponential Family of Distributions

A

A parametric set of probability distributions chosen for mathematical convenience, based on some useful algebraic properties, as well as for generality, as exponential families are in a sense very natural sets of distributions.

Most of the commonly used distributions form an exponential family or a subset of an exponential family. Includes: normal, exponential, gamma, chi-squared, beta, Dirichlet, Bernoulli, categorical, Poisson, Wishart, inverse Wishart, geometric, binomial (with fixed number of trials), multinomial (with fixed number of trials), negative binomial (with fixed number of failures).

They have properties:

  1. have sufficient statistics that can summarize arbitrary amounts of independent identically distributed data using a fixed number of values
  2. have conjugate priors, an important property in Bayesian statistics.
  3. posterior predictive distribution of an exponential-family random variable with a conjugate prior can always be written in closed form (provided that the normalizing factor of the exponential-family distribution can itself be written in closed form)
  4. In the mean-field approximation in variational Bayes (used for approximating the posterior distribution in large Bayesian networks), the best approximating posterior distribution of an exponential-family node (a node is a random variable in the context of Bayesian networks) with a conjugate prior is in the same family as the node.
31
Q

Max Likelihood

A

A way to estimate parameter values, which are found such that they maximize the likelihood that the model describes the data that were actually observed.
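A sketch for the Bernoulli case, where the maximizer has the closed form p̂ = k/n (the data values k=7, n=10 are made up):

```python
import math

# For n Bernoulli observations with k successes, the log-likelihood
# l(p) = k*log(p) + (n-k)*log(1-p) is maximized at p = k/n,
# the sample proportion.
def log_likelihood(p, k, n):
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 10
p_hat = k / n  # 0.7

# p_hat beats nearby candidate values of p:
assert all(log_likelihood(p_hat, k, n) >= log_likelihood(p, k, n)
           for p in (0.5, 0.6, 0.8, 0.9))
```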

32
Q

Error/Disturbance

A

Of an observation: the deviation of an observed value from the true value of a quantity (e.g. the population mean).

33
Q

Residual

A

The difference between an observed value and the estimated value of the quantity of interest (e.g. the sample mean).

34
Q

Degrees of Freedom

A

The degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate, minus the number of parameters used as intermediate steps in the estimation of the parameter itself.

(Most of the time the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the 1 parameter estimated as an intermediate step: the sample mean.)

Also the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined).

In the context of linear models (linear regression, analysis of variance), certain random vectors are constrained to lie in linear subspaces, and the number of degrees of freedom is the dimension of the subspace. Degrees of freedom are also commonly associated with the squared lengths (or "sums of squares" of the coordinates) of such vectors, and with the parameters of chi-squared and other distributions.

35
Q

Bessel’s Correction

A

n / (n-1)

n − 1 is the number of degrees of freedom in the vector of residuals (residuals, not errors, because the population mean is unknown):

While there are n independent observations in the sample, there are only n − 1 independent residuals, as they sum to 0

An approach to reduce the bias due to finite sample size.

The sum of squares of the distance from samples to the population mean will always be bigger than the sum of squares of the distance to the sample mean, except when the sample mean happens to be the same as the population mean, in which case the two are equal.

The sum of squares of the deviations from the sample mean is too small to give an unbiased estimate of the population variance when the average of those squares is taken. The smaller the sample size, the larger the difference between the sample variance and the population variance.

three caveats to consider regarding Bessel’s correction:

  1. It does not yield an unbiased estimator of standard deviation.
  2. The corrected estimator often has a higher mean squared error (MSE) than the uncorrected estimator. Furthermore, there is no population distribution for which it has the minimum MSE, because a different scale factor can always be chosen to minimize MSE.
  3. It is only necessary when the population mean is unknown (and estimated as the sample mean). In practice, this generally happens.
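A small deterministic sketch (the sample values are made up) of why the correction is needed:

```python
xs = [2.0, 4.0, 6.0, 8.0]
n = len(xs)
mean = sum(xs) / n                        # 5.0
residuals = [x - mean for x in xs]
assert abs(sum(residuals)) < 1e-12        # residuals sum to 0: only n-1 are free

ss = sum(r * r for r in residuals)        # 20.0
var_biased = ss / n                       # 5.0
var_unbiased = ss / (n - 1)               # ~6.67 = var_biased * n/(n-1)

# The sample mean minimizes the sum of squares, so ss computed around
# any other point (e.g. the unknown population mean) would be larger:
assert ss < sum((x - 5.5) ** 2 for x in xs)
```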
36
Q

Bias

A

bias (or bias function) of an estimator is the difference between this estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased.

Bias can also be measured with respect to the median, rather than the mean (expected value), in which case one distinguishes median-unbiased from the usual mean-unbiasedness property.

unbiased estimator is preferable to a biased estimator, although in practice, biased estimators (with generally small bias) are frequently used. When a biased estimator is used, bounds of the bias are calculated. A biased estimator may be used for various reasons:

  1. because an unbiased estimator does not exist without further assumptions about a population;
  2. because an estimator is difficult to compute (as in unbiased estimation of standard deviation);
  3. because an estimator is median-unbiased but not mean-unbiased (or the reverse);
  4. because a biased estimator gives a lower value of some loss function (particularly mean squared error) compared with unbiased estimators (notably in shrinkage estimators); or
  5. because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful.

The bias of an estimator θ̂ relative to θ is defined as bias(θ̂) = E[θ̂] − θ.

θ̂ is unbiased if its bias is equal to zero for all values of the parameter θ, or equivalently, if the expected value of the estimator matches that of the parameter.

37
Q

Confidence Interval (Frequentist stats)

A

A 95% confidence interval means that, over a large number of repeated samples, 95% of the calculated confidence intervals would include the true value of the parameter. In frequentist terms, the parameter is fixed (it cannot be considered to have a distribution of possible values) and the confidence interval is random (as it depends on the random sample).