Statistics & Probability Flashcards by Tim Borden

In mathematics, ___ consists of writing a number or another mathematical object as a product of several factors, usually smaller or simpler objects of the same kind.

Factorization or factoring

Is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of ___ arises in a number of places in abstract algebra (in particular, in the theory of projectors and closure operators) and functional programming (in which it is connected to the property of referential transparency).

The term was introduced by Benjamin Peirce in the context of elements of algebras that remain invariant when raised to a positive integer power, and literally means “(the quality of having) the same power”, from ___ + ___ (same + power).

The natural number 1 is an ___ element with respect to multiplication (since 1×1 = 1), and so is 0 (since 0×0 = 0), but no other natural number is (e.g. 2×2 = 2 does not hold). For the latter reason, multiplication of natural numbers is not an ___ operation.

Idempotence
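
As a rough illustration of the definition above, the Python sketch below checks idempotence for a couple of operations. The helper name `is_idempotent_on` and the sample values are assumptions made up for this example.

```python
def is_idempotent_on(f, values):
    """Return True if applying f twice equals applying it once for every value."""
    return all(f(f(v)) == f(v) for v in values)

samples = [-3, -1, 0, 1, 2, 5]

# abs is idempotent as an operation: abs(abs(x)) == abs(x)
abs_ok = is_idempotent_on(abs, samples)

# Squaring is not idempotent as an operation: (2*2)*(2*2) != 2*2
square_ok = is_idempotent_on(lambda x: x * x, samples)

# Idempotent elements under multiplication among 0..5: those with x*x == x
idempotent_elements = [x for x in range(6) if x * x == x]
```

Here `idempotent_elements` contains only 0 and 1, matching the natural-number example on the card.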

Is the inverse function to exponentiation. That means the ___ of a given number x is the exponent to which another fixed number, the base b, must be raised, to produce that number x. In the simplest case, the ___ counts the number of occurrences of the same factor in repeated multiplication; for example, since 1000 = 10 × 10 × 10 = 10^3, the base-10 ___ of 1000 is 3.

Logarithm

A ___ is a mathematical curve that describes a smooth periodic oscillation. A ___ is a continuous wave. It is named after the function ___, of which it is the graph. It occurs often in pure and applied mathematics, as well as physics, engineering, signal processing and many other fields.

Sine wave, sinusoid or sinusoidal

___ is the absence of, or a violation of, symmetry (the property of an object being invariant to a transformation, such as reflection). Symmetry is an important property of both physical and abstract systems and it may be displayed in precise terms or in more aesthetic terms.

The absence of or violation of symmetry that are either expected or desired can have important consequences for a system.

In mathematics, there are no a and b such that a < b and b < a. This form of ___ is an ___ relation.

Asymmetry

In mathematics, an ___ assigns numbers to functions in a way that can describe displacement, area, volume, and other concepts that arise by combining infinitesimal data. ___ is one of the two main operations of calculus, with its inverse operation, differentiation, being the other.

Integral (Integration)

A stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

Roughly speaking, a process satisfies the ___ property if one can make predictions for the future of the process based solely on its present state just as well as one could knowing the process’s full history, hence independently from such history

Markov chain
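
To make the "depends only on the present state" idea concrete, here is a minimal Python sketch of a two-state chain. The states and transition probabilities are invented for illustration and do not come from the card.

```python
# Hypothetical two-state weather chain; the probabilities are made up.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(dist):
    """Advance a distribution over states by one transition.

    Only the current distribution is needed, not the path that led to it,
    which is the Markov property in action.
    """
    return {s: sum(dist[prev] * P[prev][s] for prev in P) for s in P}

dist = {"sunny": 1.0, "rainy": 0.0}
for _ in range(50):
    dist = step(dist)
```

With these assumed numbers the distribution settles near 5/6 sunny and 1/6 rainy, regardless of the starting state.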

In mathematics, the ___ of a positive integer n, denoted by n!, is the product of all positive integers less than or equal to n: n! = n x (n-1) x (n-2) x (n-3) x … x 3 x 2 x 1

For example: 5! = 5 x 4 x 3 x 2 x 1 = 120

Factorial
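
The product in the definition can be sketched directly in Python. The function name `factorial` is chosen for this example; the standard library's `math.factorial` computes the same value.

```python
import math

def factorial(n: int) -> int:
    """Product of all positive integers up to n (0! is 1 by convention)."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

five_factorial = factorial(5)
matches_stdlib = five_factorial == math.factorial(5)
```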

In mathematics, a ___ is a multiplicative factor in some term of a polynomial, a series, or any expression; it is usually a number, but may be any expression. In the latter case, the variables appearing in the ___ are often called parameters, and must be clearly distinguished from the other variables.

For example, in 7x^2 - 3xy + 1.5 + y the first two terms respectively have the ___ 7 and −3. The third term 1.5 is a constant ___. The final term does not have any explicitly written ___ factor that would not change the term; the ___ is taken to be 1.

Coefficient

In mathematics, a ___ is an expression consisting of variables (also called indeterminates) and coefficients, that involves only the operations of addition, subtraction, multiplication, and non-negative integer exponents of variables. An example of a ___ of a single indeterminate, x, is x^2 − 4x + 7. An example in three variables is x^3 + 2xyz^2 − yz + 1.

Polynomial

In mathematics, a ___ is a polynomial which is the sum of two monomials. A ___ in a single indeterminate (also known as a univariate ___) can be written in the form ax^m - bx^n where a and b are numbers, and m and n are distinct nonnegative integers and x is a symbol which is called an indeterminate or, for historical reasons, a variable.

Binomial

In mathematics, a ___ is, roughly speaking, a polynomial which has only one term.

Monomial

___ are useful ways to make sense of and tap into the logic and intuition of combinatorial identities.

Story proofs

___ is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The ___ of an event is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility of the event and 1 indicates certainty

Mathematics is the logic of certainty; ___ is the logic of uncertainty

Probability

In mathematics, a ___ is a well-defined collection of distinct objects, considered as an object in its own right. For example, the numbers 2, 4, and 6 are distinct objects when considered separately, but when they are considered collectively they form a single ___ of size three, written {2, 4, 6}. The concept of a ___ is one of the most fundamental in mathematics. Developed at the end of the 19th century, ___ theory is now a ubiquitous part of mathematics, and can be used as a foundation from which nearly all of mathematics can be derived.

Set

___ is a branch of mathematical logic that studies ___, which informally are collections of objects. Although any type of object can be collected into a ___, ___ is applied most often to objects that are relevant to mathematics. The language of ___ can be used to define nearly all mathematical objects.

Set theory

In probability theory, the ___ of an experiment or random trial is the set of all possible outcomes or results of that experiment. A ___ is usually denoted using set notation, and the possible ordered outcomes are listed as elements in the set. It is common to refer to a ___ by the labels S, Ω, or U (for “universal set”). The elements of a ___ may be numbers, words, letters, or symbols. They can also be finite, countably infinite, or uncountably infinite.

For example, if the experiment is tossing a coin, the ___ is typically the set {head, tail}, commonly written {H, T}. For tossing two coins, the corresponding ___ would be {(head,head), (head,tail), (tail,head), (tail,tail)}, commonly written {HH, HT, TH, TT}. If the ___ is unordered, it becomes {{head,head}, {head,tail}, {tail,tail}}.

Sample space, also called sample description space, possibility space or event space
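
The two-coin example can be enumerated in Python as a quick check. This is only a sketch; the variable names are chosen for the example.

```python
from itertools import combinations_with_replacement, product

# Ordered outcomes of tossing two coins: HH, HT, TH, TT
ordered = ["".join(outcome) for outcome in product("HT", repeat=2)]

# Unordered outcomes: HT and TH collapse into a single outcome
unordered = ["".join(outcome) for outcome in combinations_with_replacement("HT", 2)]
```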

In set theory, the ___ of a collection of sets is the set of all elements in the collection. It is one of the fundamental operations through which sets can be combined and related to each other.

Union (denoted by ∪)

In set theory, the ___ of two sets A and B, is the set containing all elements of A that also belong to B (or equivalently, all elements of B that also belong to A), and nothing else.

Intersection (denoted by A ∩ B)

In set theory, the ___ of a set A refers to elements not in A. When all sets under consideration are considered to be subsets of a given set U, the absolute ___ of A is the set of elements in U but not in A. The relative ___ of A with respect to a set B, also termed the difference of sets A and B, written B \ A, is the set of elements in B but not in A.

Complement
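
Python's built-in `set` type mirrors union, intersection, and complement closely, so a small sketch may help. The sets U, A, and B below are arbitrary choices for illustration.

```python
U = set(range(1, 11))  # universal set, chosen arbitrarily for the example
A = {2, 4, 6}
B = {4, 5, 6, 7}

union = A | B                # elements in A or B
intersection = A & B         # elements in both A and B
absolute_complement = U - A  # elements of U not in A
relative_complement = B - A  # B \ A: elements of B not in A
```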

___ is any of several theories of sets used in the discussion of the foundations of mathematics. Unlike axiomatic set theories, which are defined using formal logic, ___ is defined informally, in natural language. It describes the aspects of mathematical sets familiar in discrete mathematics (for example Venn diagrams and symbolic reasoning about their Boolean algebra), and suffices for the everyday use of set theory concepts in contemporary mathematics.

Naïve set theory

In combinatorics, the ___ is a basic counting principle (a.k.a. the fundamental principle of counting). Stated simply, it is the idea that if there are A ways of doing something and B ways of doing another thing, then there are A x B ways of performing both actions.

Multiplication rule, rule of product or multiplication principle

___ is when a sampling unit is drawn from a finite population and is returned to that population, after its characteristic(s) have been recorded, before the next unit is drawn.

Sampling with replacement

In ___, each sample unit of the population has only one chance to be selected in the sample. For example, if one draws a simple random sample such that no unit occurs more than one time in the sample.

Sampling without replacement (successive draws are dependent events)
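
The difference between the two sampling schemes can be sketched with Python's `random` module. The population and the seed are arbitrary assumptions for this example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(10))

# With replacement: each unit is returned before the next draw, so repeats are possible
with_replacement = rng.choices(population, k=10)

# Without replacement: each unit can be selected at most once
without_replacement = rng.sample(population, k=10)
```

Drawing all 10 units without replacement always yields a permutation of the population, whereas the draw with replacement may repeat some units and miss others.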

In probability theory, the ___ concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same ___. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are only 366 possible ___, including February 29). However, 99.9% probability is reached with just 70 people, and 50% probability with just 23 people, since 1 − 365!/((365 − 23)! × 365^23) ≈ 0.507. These conclusions are based on the assumption that each day of the year (excluding February 29) is equally probable for a ___.

Birthday problem or birthday paradox
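
The figures quoted on the card can be reproduced with a short Python sketch, assuming 365 equally likely days as the card does. The function name is hypothetical.

```python
def birthday_collision_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

p23 = birthday_collision_probability(23)
p70 = birthday_collision_probability(70)
```

Multiplying the factors one at a time avoids computing the very large value 365! directly.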

___ is an interpretation of probability; it defines an event's probability as the limit of its relative ___ in many trials. Probabilities can be found (in principle) by a repeatable objective process (and are thus ideally devoid of opinion). This interpretation supports the statistical needs of many experimental scientists and pollsters. It does not support all needs, however; gamblers typically require estimates of the odds without experiments. The development of the ___ account was motivated by the problems and paradoxes of the previously dominant viewpoint, the classical interpretation. In the classical interpretation, probability was defined in terms of the principle of indifference, based on the natural symmetry of a problem, so, e.g. the probabilities of dice games arise from the natural symmetric 6-sidedness of the cube. This classical interpretation stumbled at any statistical problem that has no natural symmetry for reasoning.

Frequentist probability or frequentism

___ is an interpretation of the concept of probability, in which, instead of frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation representing a state of knowledge or as quantification of a personal belief. The ___ interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses, that is to say, with propositions whose truth or falsity is unknown. In the ___ view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability. ___ probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the ___ probabilist specifies a prior probability. This, in turn, is then updated to a posterior probability in the light of new, relevant data (evidence). The ___ interpretation provides a standard set of procedures and formulae to perform this calculation.

Bayesian probability

The ___ of a function of a real variable measures the sensitivity to change of the function value (output value) with respect to a change in its argument (input value). ___ are a fundamental tool of calculus. For example, the ___ of the position of a moving object with respect to time is the object's velocity: this measures how quickly the position of the object changes when time advances.

Derivative (differentiation)

___ is the property of a mathematical relationship or function which means that it can be graphically represented as a straight line. Examples are the relationship of voltage and current across a resistor (Ohm's law), or the mass and weight of an object. Proportionality implies ___, but ___ does not imply proportionality.

Linearity

In mathematics, two varying quantities are said to be in a relation of ___, if they are multiplicatively connected to a constant, that is, when either their ratio or their product yields a constant. The value of this constant is called the coefficient of ___ or ___ constant. If the ratio (y/x) of two variables (x and y) is equal to a constant (k = y/x), then the variable in the numerator of the ratio (y) is the product of the other variable and the constant (y = k⋅x). In this case y is said to be directly ___ to x with ___ constant k. Equivalently one may write x = 1/k⋅y, that is, x is directly ___ to y with ___ constant 1/k (= x/y). If the term ___ is connected to two variables without further qualification, generally direct ___ can be assumed. If the product of two variables (x⋅y) is equal to a constant (k = x⋅y), then the two are said to be inversely ___ to each other with the ___ constant k. Equivalently, both variables are directly ___ to the reciprocal of the respective other with ___ constant k (x = k⋅1/y and y = k⋅1/x).

Proportionality

In mathematics, a (real) ___ is a set of real numbers lying between two numbers, the extremities of the ___. For example, the set of numbers x satisfying 0 ≤ x ≤ 1 is an ___ which contains 0, 1 and all numbers in between. Other examples of ___ are the set of real numbers R, the set of negative real numbers, and the empty set.

Interval

In statistics and quantitative research methodology, a ___ is a set of individuals or objects collected or selected from a statistical population by a defined procedure. For a ___ to be “good,” it must possess two qualities: it must be large enough to be statistically significant and it must be random.

Sample (n)

In statistics, ___ is a bias in which a sample is collected in such a way that some members of the intended population have a lower sampling probability than others. It results in a ___ sample, a non-random sample of a population (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected. If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.

Sampling bias

In statistics, ___ is a method of sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation independently. ___ is the process of dividing members of the population into homogeneous subgroups before sampling. The ___ should define a partition of the population. That is, it should be collectively exhaustive and mutually exclusive: every element in the population must be assigned to one and only one ___. Then simple random sampling or systematic sampling is applied within each ___. The objective is to improve the precision of the sample by reducing sampling error. It can produce a weighted mean that has less variability than the arithmetic mean of a simple random sample of the population.

Stratified (strata, stratum) sampling

A measure of central tendency. For a data set, the arithmetic ___, also called the mathematical expectation or average, is the central value of a discrete set of numbers: specifically, the sum of the values divided by the number of values. If the data set is based on a series of observations obtained by sampling from a statistical population, the arithmetic ___ is called the sample ___ to distinguish it from the ___ of the underlying distribution, the population ___ (denoted μ).

Mean

A measure of central tendency. In statistics and probability theory, the ___ is the value separating the higher half from the lower half of a data sample, a population or a probability distribution. For a data set, it may be thought of as the "middle" value. For example, the basic advantage of the ___ in describing data compared to the mean (often simply described as the "average") is that it is not skewed so much by a small proportion of extremely large or small values, and so it may give a better idea of a "typical" value. For example, in understanding statistics like household income or assets, which vary greatly, the mean may be skewed by a small number of extremely high or low values. ___ income, for example, may be a better way to suggest what a "typical" income is. Because of this, the ___ is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the ___ will not give an arbitrarily large or small result.

Median
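
The robustness claim can be illustrated with Python's `statistics` module. The income figures below are invented for the example.

```python
import statistics

incomes = [30_000, 35_000, 40_000, 45_000, 50_000]  # made-up values
with_outlier = incomes + [5_000_000]                # one extreme income added

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)

mean_after = statistics.mean(with_outlier)      # pulled far upward by the outlier
median_after = statistics.median(with_outlier)  # barely moves
```

In this sketch the single extreme value moves the mean from 40,000 to well over 800,000, while the median only shifts from 40,000 to 42,500.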

A measure of central tendency. The ___ of a set of data values is the value that appears most often. If X is a discrete random variable, the ___ is the value x (i.e, X = x) at which the probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled. Like the statistical mean and median, the ___ is a way of expressing, in a (usually) single number, important information about a random variable or a population. The numerical value of the ___ is the same as that of the mean and median in a normal distribution, and it may be very different in highly skewed distributions. The ___ is not necessarily unique to a given discrete distribution, since the probability mass function may take the same maximum value at several points x1, x2, etc. The most extreme case occurs in uniform distributions, where all values occur equally frequently.

Mode

A measure of dispersion. In statistics, the ___ is a measure of the amount of variation or dispersion of a set of values. A low ___ indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high ___ indicates that the values are spread out over a wider range. ___ is most commonly represented in mathematical texts and equations by the lower case Greek letter sigma σ, for the population ___, or the Latin letter s, for the sample ___. The ___ of a random variable, sample, statistical population, data set, or probability distribution is the square root of its variance. It is algebraically simpler, though in practice less robust, than the average absolute deviation. A useful property of the ___ is that, unlike the variance, it is expressed in the same units as the data.

Standard deviation, abbreviated to SD
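
A short Python sketch of the population (σ) and sample (s) versions, using the standard `statistics` module. The data values are arbitrary.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

# Population standard deviation (sigma): squared deviations averaged over N
sigma = statistics.pstdev(data)

# Sample standard deviation (s): squared deviations divided by N - 1
s = statistics.stdev(data)

# Each is the square root of the corresponding variance
population_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)
```

For this data set the population variance is 4, so σ is 2, in the same units as the data.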

The ___ of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the parameter or the statistic is the mean, it is called the ___ of the mean (SEM). The sampling distribution of the sample mean is generated by repeatedly sampling from the population and recording the sample means obtained. This forms a distribution of different means, and this distribution has its own mean and variance. Mathematically, the variance of the sampling distribution obtained is equal to the variance of the population divided by the sample size. This is because as the sample size increases, sample means cluster more closely around the population mean. Therefore, the relationship between the ___ and the standard deviation is such that, for a given sample size, the ___ equals the standard deviation divided by the square root of the sample size. In other words, the ___ of the mean is a measure of the dispersion of sample means around the population mean.

Standard error (SE)
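
The relationship "standard deviation divided by the square root of the sample size" can be sketched as follows. The function name is hypothetical, and the sample standard deviation is used as the estimate of the population value.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the SEM as the sample standard deviation over sqrt(n)."""
    n = len(sample)
    return statistics.stdev(sample) / math.sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
sem = standard_error_of_mean(sample)
```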

For any nonnegative integers k and n, the ___ (n over k), read as "n choose k", is the number of subsets of size k for a set of size n.

Binomial coefficient
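
The "number of subsets of size k" reading can be checked by brute force in Python. The values n = 5 and k = 2 are arbitrary choices for the example.

```python
import math
from itertools import combinations

n, k = 5, 2

# Enumerate every subset of size k drawn from a set of size n
subsets = list(combinations(range(n), k))

# math.comb computes "n choose k" directly
count = math.comb(n, k)
```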

___ is the study of mathematical structures that are fundamentally ___ rather than continuous. In contrast to real numbers that have the property of varying "smoothly", the objects studied in ___ – such as integers, graphs, and statements in logic – do not vary smoothly in this way, but have distinct, separated values. ___ therefore excludes topics in "continuous mathematics" such as calculus or Euclidean geometry. ___ objects can often be enumerated by integers. More formally, ___ has been characterized as the branch of mathematics dealing with countable sets (finite sets or sets with the same cardinality as the natural numbers). However, there is no exact definition of the term "___." Indeed, ___ is described less by what is included than by what is excluded: continuously varying quantities and related notions.

Discrete mathematics

In combinatorics (combinatorial mathematics), the ___ is a counting technique which generalizes the familiar method of obtaining the number of elements in the union of two finite sets; symbolically expressed as |A ∪ B| = |A| + |B| − |A ∩ B|, where A and B are two finite sets and |S| indicates the cardinality of a set S (which may be considered as the number of elements of the set, if the set is finite). The formula expresses the fact that the sum of the sizes of the two sets may be too large since some elements may be counted twice. The double-counted elements are those in the intersection of the two sets and the count is corrected by subtracting the size of the intersection.

Inclusion–exclusion principle
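
A quick Python check of the two-set formula. The sets A and B are arbitrary examples.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# Left side: size of the union, counted directly
union_size = len(A | B)

# Right side: add the sizes, then subtract the double-counted intersection
corrected_sum = len(A) + len(B) - len(A & B)
```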

In statistics, ___ is any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense ___ is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the ___ between the physical statures of parents and their offspring, and the ___ between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the so-called demand curve.

Correlation or dependence

Mathematical notation uses a symbol that compactly represents summation of many similar terms: the summation symbol, an enlarged form of the upright capital Greek letter ___.

Capital-sigma notation

The product of a sequence of factors can be written with the product symbol, which derives from the capital letter ___ in the Greek alphabet

Capital pi notation

In mathematics, the ___ of a real number x, denoted |x|, is the non-negative value of x without regard to its sign. Namely, |x| = x if x is positive, and |x| = −x if x is negative (in which case −x is positive), and |0| = 0.

Absolute value or modulus

In mathematics, the ___ of a set is a measure of the "number of elements of the set". For example, the set A={2,4,6} contains 3 elements, and therefore A has a ___ of 3. Beginning in the late 19th century, this concept was generalized to infinite sets, allowing to distinguish several stages of infinity, and to perform arithmetic on them. There are two approaches to ___ – one which compares sets directly using bijections and injections, and another which uses ___ numbers. The ___ of a set is also called its size, when no confusion with other notions of size is possible.

Cardinality

In mathematics, a ___ is a relation between sets that associates to every element of a first set exactly one element of the second set. Typical examples are ___ from integers to integers or from the real numbers to real numbers. ___ were originally the idealization of how a varying quantity depends on another quantity. For example, the position of a planet is a ___ of time. Historically, the concept was elaborated with the infinitesimal calculus at the end of the 17th century, and, until the 19th century, the ___ that were considered were differentiable (that is, they had a high degree of regularity). The concept of ___ was formalized at the end of the 19th century in terms of set theory, and this greatly enlarged the domains of application of the concept. A ___ is a process or a relation that associates each element x of a set X, the domain of the ___, to a single element y of another set Y (possibly the same set), the codomain of the ___. If the ___ is called f, this relation is denoted y = f (x) (which is spoken aloud as f of x), the element x is the argument or input of the ___, and y is the value of the ___, the output, or the image of x by f. The symbol that is used for representing the input is the variable of the ___ (one often says that f is a ___ of the variable x).

Function

In mathematics, a ___ is a value of a continuous quantity that can represent a distance along a line. The adjective ___ in this context was introduced in the 17th century by René Descartes, who distinguished between ___ and imaginary roots of polynomials. The ___ include all the rational numbers, such as the integer −5 and the fraction 4/3, and all the irrational numbers, such as √2 (1.41421356..., the square root of 2, an irrational algebraic number). Included within the irrationals are the transcendental numbers, such as π (3.14159265...). In addition to measuring distance, ___ can be used to measure quantities such as time, mass, energy, velocity, and many more.

Real number

In mathematics, the ___ are all the real numbers which are not rational numbers, the latter being the numbers constructed from ratios (or fractions) of integers. When the ratio of lengths of two line segments is an ___ number, the line segments are also described as being incommensurable, meaning that they share no "measure" in common, that is, there is no length ("the measure"), no matter how short, that could be used to express the lengths of both of the two given segments as integer multiples of itself.

Irrational numbers

In mathematics, a ___ is a complex number that is not an algebraic number—that is, not a root (i.e., solution) of a nonzero polynomial equation with integer coefficients. The best-known ___ are π and e.

Transcendental number

___ is a type of bias that occurs in ___ academic research. It occurs when the outcome of an experiment or research study influences the decision whether to ___ or otherwise distribute it. ___ only results that show a significant finding disturbs the balance of findings, and inserts bias in favor of positive results.

Publication or significance bias

___ are non-numerical variables. Their values are represented with words or labels rather than numbers.

Categorical variables

___ are numerical variables.

Quantitative variables

The lowercase letter ___ (μ) is used as a special symbol in many academic fields.

Mu

A measure of dispersion. In probability theory and statistics, ___ is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers is spread out from their average value. It can be measured as the population ___ (sigma squared) or the sample ___ (capital S squared), which can be a biased sample ___ or an unbiased sample ___. The population ___ is defined as the sum of the squared distances of each term in the distribution from the mean (μ), divided by the number of terms in the distribution (N).

Variance

A measure of dispersion. In statistics, the ___ of a set of data is the difference between the largest and smallest values, that is, the result of subtracting the smallest value from the largest value. It gives a rough idea of how spread out the data set is before you examine it in detail.

Range

In statistics, an ___ is a data point that differs significantly from other observations. An ___ may be due to variability in the measurement or it may indicate experimental error; the latter are sometimes excluded from the data set. An ___ can cause serious problems in statistical analyses. The interquartile range is often used to find ___ in data. ___ here are defined as observations that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR.

Outlier
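
The 1.5 IQR rule can be sketched with `statistics.quantiles`. The function name and data are made up for illustration. Quartile conventions differ between tools, so borderline points may be classified differently elsewhere.

```python
import statistics

def iqr_outliers(values):
    """Return the values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # made-up measurements
outliers = iqr_outliers(data)
```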

In statistics, a ___ is a set of similar items or events which is of interest for some question or experiment. A statistical ___ can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker). A common aim of statistical analysis is to produce information about some chosen ___.

Population (N)

In probability theory and statistics, ___ describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, ___ allows the risk to an individual of a known age to be assessed more accurately than simply assuming that the individual is typical of the population as a whole. ___ is stated mathematically as the following equation: P(A|B) = P(B|A) * P(A) / P(B), or posterior probability = likelihood * prior probability / marginal likelihood.

Bayes' theorem (alternatively Bayes's law or Bayes's rule)
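
A minimal Python sketch of P(A|B) = P(B|A) * P(A) / P(B), where P(B) is expanded over "A" and "not A". The function name, parameter names, and the screening-test numbers are all hypothetical.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(A|B) given P(A), P(B|A), and P(B|not A)."""
    # P(B) by the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Made-up numbers: 1% base rate, 95% sensitivity, 5% false positive rate
p_condition_given_positive = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
```

With these assumed numbers the posterior is only about 16%, because the low prior outweighs the high likelihood.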

___ is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available.

Bayesian inference

In probability theory and statistics, ___ is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The ___ value can be positive, zero, negative, or undefined. For a unimodal distribution, negative ___ commonly indicates that the tail is on the left side of the distribution, and positive ___ indicates that the tail is on the right. In cases where one tail is long but the other tail is fat, ___ does not obey a simple rule. For example, a zero value means that the tails on both sides of the mean balance out overall; this is the case for a symmetric distribution, but can also be true for an asymmetric distribution where one tail is long and thin, and the other is short but fat.

Skewness

In descriptive statistics, the ___, is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles, ___ = Q3 − Q1. In other words, the ___ is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot on the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.

Interquartile range (IQR), also called the midspread, middle 50%, or H‑spread

In statistics, the ___, is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of two, four and six standard deviations, respectively; ___ of the values lie within one, two and three standard deviations of the mean, respectively.

68–95–99.7 rule, also known as the empirical rule (more precisely, 68.27%, 95.45% and 99.73%)
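
The three percentages can be reproduced from the standard normal distribution, since the probability of falling within k standard deviations of the mean is erf(k / √2). The function name `prob_within` below is only for illustration.

```python
import math


def prob_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


coverage = [round(100 * prob_within(k), 2) for k in (1, 2, 3)]
print(coverage)  # percentages for 1, 2 and 3 standard deviations
```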

___ is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. In applying ___ to a scientific, industrial, or social problem, it is conventional to begin with a ___ population or a ___ model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". ___ deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

Statistics

In probability theory, the ___ is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value (theoretical/classical probability) and will tend to become closer to the expected value as more trials are performed.

Law of large numbers (LLN)
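
A quick simulation sketch of the law, assuming a fair six-sided die whose expected value is 3.5. The helper name `mean_of_die_rolls` and the fixed seed are illustrative choices; the averages drift toward 3.5 as the number of rolls grows, though any single run is still random.

```python
import random


def mean_of_die_rolls(num_rolls, seed=0):
    """Average of num_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    total = sum(rng.randint(1, 6) for _ in range(num_rolls))
    return total / num_rolls


means = {n: mean_of_die_rolls(n) for n in (10, 1_000, 100_000)}
for n, avg in means.items():
    print(f"{n:>7} rolls: average = {avg:.4f}")
```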

The ___ of an event is the ratio of the number of outcomes in which a specified event occurs to the total number of trials, not in a theoretical sample space but in an actual experiment. In a more general sense, ___ estimates probabilities from experience and observation.

Empirical probability, relative frequency, or experimental probability

The ___ is the ratio of the number of cases favorable to it, to the number of all cases possible when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible.

Classical definition or interpretation of probability

In statistics, ___ is the number of standard deviations by which the value of a raw score (i.e., an observed value or data point) is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive ___, while those below the mean have negative ___. It is calculated by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This process of converting a raw score into a ___ is called standardizing or normalizing (however, "normalizing" can refer to many types of ratios).

The standard score or the z-score
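
The calculation described above, (raw score − population mean) / population standard deviation, as a short sketch. The example numbers are made up.

```python
def z_score(raw, mean, std_dev):
    """Number of standard deviations by which raw lies above or below mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / std_dev


# Illustrative values: population mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # above the mean, so positive
print(z_score(85, 100, 15))   # below the mean, so negative
```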

In combinatorics, the ___ is a basic counting principle. Stated simply, it is the idea that if we have A ways of doing something and B ways of doing another thing and we cannot do both at the same time, then there are A + B ways to choose one of the actions.

Rule of sum, addition rule or addition principle

In logic and probability theory, two events (or propositions) are ___ if they cannot both occur at the same time. A clear example is the set of outcomes of a single coin toss, which can result in either heads or tails, but not both.

Mutually exclusive or disjoint

In probability theory, the ___ of a random variable X, denoted E(X) or E[X], is a generalization of the weighted average, and is intuitively the arithmetic mean of a large number of independent realizations of X. ___ is also a key concept in economics, finance, and many other subjects.

Expected value also known as the expectation, mathematical expectation, mean, average, or first moment
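
For a discrete random variable, the weighted average can be written as the sum of each value times its probability. A minimal sketch, using a fair die as an assumed example and exact fractions to avoid rounding:

```python
from fractions import Fraction


def expected_value(distribution):
    """E[X] for a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in distribution.items())


fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(fair_die))  # 7/2
```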

___ is a fundamental notion in probability theory, as in statistics and the theory of stochastic processes. Two events are ___ if the occurrence of one does not affect the probability of occurrence of the other (equivalently, does not affect the odds). Similarly, two random variables are ___ if the realization of one does not affect the probability distribution of the other.

Independence, independent, statistically independent, or stochastically independent

In probability theory and statistics, the ___ with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p). A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the ___ is a Bernoulli distribution. The ___ is the basis for the popular binomial test of statistical significance.

Binomial distribution
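
The probability of exactly k successes in n trials is C(n, k) * p^k * (1 − p)^(n − k). The sketch below uses that standard formula; the function name `binomial_pmf` is just a label for this example.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 fair coin tosses.
print(binomial_pmf(5, 10, 0.5))
# With n = 1 this reduces to a Bernoulli distribution.
print(binomial_pmf(1, 1, 0.3))
```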

In mathematics, a ___ of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "___" also refers to the act or process of changing the linear order of an ordered set. ___ differ from combinations, which are selections of some members of a set regardless of order. For example, the number of ___ of three items is 3! = 3 x 2 x 1 = 6. The number gets extremely large as the number of items (n) goes up. In a similar manner, an arrangement of r items from n objects is considered a partial ___. The number of such arrangements is written as nPr (which reads "n ___ r"), and is equal to n(n-1) ... (n-r+1), also written as n!/(n-r)!.

Permutation, or permute
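
A short sketch that enumerates the orderings of three items and evaluates n!/(n−r)!. The item labels and the helper name `n_p_r` are illustrative.

```python
import itertools
import math

items = ["a", "b", "c"]
all_orders = list(itertools.permutations(items))
print(len(all_orders))  # 3! orderings of three items


def n_p_r(n, r):
    """Number of ordered arrangements of r items chosen from n."""
    return math.factorial(n) // math.factorial(n - r)


print(n_p_r(5, 2))  # 5 * 4
```

The standard library also offers `math.perm(n, r)` for the same count.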

In mathematics, a ___ is a selection of items from a collection, such that the order of selection does not matter (unlike permutations). For example, given three fruits, say an apple, an orange and a pear, there are three ___ of two that can be drawn from this set: an apple and a pear; an apple and an orange; or a pear and an orange. More formally, a k-___ of a set S is a subset of k distinct elements of S. If the set has n elements, the number of k-___ is equal to the binomial coefficient, which can be written using factorials as n!/(k!(n-k)!).

Combination
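
The fruit example can be checked directly, along with the factorial formula n!/(k!(n−k)!). The helper name `n_choose_k` is only for this sketch.

```python
import itertools
import math

fruits = ["apple", "orange", "pear"]
pairs = list(itertools.combinations(fruits, 2))
print(pairs)  # three unordered pairs


def n_choose_k(n, k):
    """Binomial coefficient n! / (k! * (n - k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))


print(n_choose_k(3, 2))
```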

___ is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, ___ will appear as a bell curve. For a ___, 68% of the observations are within +/- one standard deviation of the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations.

Normal distribution, also known as the Gaussian distribution

In probability theory, the ___ establishes that, in many situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.

Central limit theorem (CLT)
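
A simulation sketch of the theorem: means of samples drawn from a uniform (clearly non-normal) distribution are collected, and the share falling within one standard deviation of their center is compared with the roughly 68% expected for a bell curve. Sample sizes, the seed, and the name `sample_means` are assumptions made for illustration.

```python
import random
import statistics


def sample_means(num_samples, sample_size, seed=0):
    """Means of num_samples samples, each of sample_size uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


means = sample_means(5_000, 30)
center = statistics.fmean(means)
spread = statistics.stdev(means)
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)
print(f"center = {center:.3f}, share within 1 sd = {within_one_sd:.3f}")
```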

The ___, is a classical problem in probability theory. One of the famous problems that motivated the beginnings of modern probability theory in the 17th century, it led Blaise Pascal to the first explicit reasoning about what today is known as an expected value. The problem concerns a game of chance with two players who have equal chances of winning each round. The players contribute equally to a prize pot, and agree in advance that the first player to have won a certain number of rounds will collect the entire prize. Now suppose that the game is interrupted by external circumstances before either player has achieved victory. How does one then divide the pot fairly? It is tacitly understood that the division should depend somehow on the number of rounds won by each player, such that a player who is close to winning will get a larger part of the pot. But the problem is not merely one of calculation; it also involves deciding what a "fair" division actually is.

Problem of points, also called the problem of division of the stakes
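
One classical resolution, attributed to Pascal and Fermat, divides the pot in proportion to each player's probability of winning had play continued. The sketch below assumes fair rounds: if player A needs `a_needs` more wins and player B needs `b_needs`, the game is decided within `a_needs + b_needs - 1` further rounds, and A wins if A takes at least `a_needs` of them. The function and parameter names are hypothetical.

```python
import math
from fractions import Fraction


def share_for_a(a_needs, b_needs):
    """Fraction of the pot for player A, assuming fair, independent rounds."""
    remaining = a_needs + b_needs - 1  # rounds that settle the game for sure
    favorable = sum(
        math.comb(remaining, wins) for wins in range(a_needs, remaining + 1)
    )
    return Fraction(favorable, 2**remaining)


print(share_for_a(1, 2))  # A is one win away, B is two wins away
print(share_for_a(2, 3))
```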

The number ___ is a mathematical constant approximately equal to 2.71828, and can be characterized in many ways. It is the base of the natural logarithm. It is the limit of (1 + 1/n)^n as n approaches infinity, an expression that arises in the study of compound interest. It can also be calculated as the sum of the infinite series 1/0! + 1/1! + 1/2! + 1/3! + ...

e, known as Euler's number
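
Both characterizations can be checked numerically. This is a small sketch; the function names are illustrative, and the series converges far faster than the limit.

```python
import math


def e_from_limit(n):
    """Approximate e as (1 + 1/n)^n."""
    return (1 + 1 / n) ** n


def e_from_series(terms):
    """Approximate e as the partial sum 1/0! + 1/1! + ... over `terms` terms."""
    return sum(1 / math.factorial(k) for k in range(terms))


print(e_from_limit(1_000_000))
print(e_from_series(20))
print(math.e)
```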

___ is the bias introduced by the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed. The phrase ___ most often refers to the distortion of a statistical analysis, resulting from the method of collecting samples. If the ___ is not taken into account, then some conclusions of the study may be false.

Selection bias, sometimes referred to as the selection effect.

___ is a type of non-sampling error that occurs when there is not a one-to-one correspondence between the target population and the sampling frame from which a sample is drawn. This can bias estimates calculated using survey data. For example, a researcher may wish to study the opinions of registered voters (target population) by calling residences listed in a telephone directory (sampling frame). Under___ may occur if not all voters are listed in the phone directory. Over___ could occur if some voters have more than one listed phone number. Bias could also occur if some phone numbers listed in the directory do not belong to registered voters. In this example, under___, over___, and bias due to inclusion of unregistered voters in the sampling frame are examples of ___.

Coverage error

___ is a general term for a wide range of tendencies for participants to respond inaccurately or falsely to questions. These biases are prevalent in research involving participant self-report, such as structured interviews or surveys. ___ can have a large impact on the validity of questionnaires or surveys.

Response bias

___ is a phenomenon in which the results of elections, studies, polls, etc. become non-representative because the participants disproportionately possess certain traits which affect the outcome. These traits mean the sample is systematically different from the target population, potentially resulting in biased estimates. For instance, a study found that those who refused to answer a survey on AIDS tended to be "older, attend church more often, are less likely to believe in the confidentiality of surveys, and have lower sexual self disclosure." ___ can be a problem in longitudinal research due to attrition during the study.

Participation bias or non-response bias

In statistics, ___ is the phenomenon that if a sample point of a random variable is extreme (nearly an outlier), a future point is likely to be closer to the mean or average on further measurements. To avoid making incorrect inferences, ___ must be considered when designing scientific experiments and interpreting data.

Regression toward the mean (or regression to the mean). Historically, it was also called reversion to the mean and reversion to mediocrity.

In probability and statistics, a ___ is described informally as a variable whose values depend on outcomes of a random phenomenon. The formal mathematical treatment of ___ is a topic in probability theory. In that context, a ___ is understood as a measurable function defined on a probability space that maps from the sample space to the real numbers.

Random variable, random quantity, aleatory variable, or stochastic variable

When the range of X is countable (distinct/separate values), the random variable is called a ___ random variable and its distribution is a ___ probability distribution, i.e. can be described by a probability mass function that assigns a probability to each value in the range of X.

Discrete random variable

In probability theory and statistics, the ___, named after Swiss mathematician Jacob ___, is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q=1-p. Less formally, it can be thought of as a model for the set of possible outcomes of any single experiment that asks a yes–no question. Such questions lead to outcomes that are boolean-valued: a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q. It can be used to represent a (possibly biased) coin toss where 1 and 0 would represent "heads" and "tails" (or vice versa), respectively, and p would be the probability of the coin landing on heads or tails, respectively. In particular, unfair coins would have p != 1/2.

Bernoulli distribution

In probability theory and statistics, the ___, named after French mathematician Siméon Denis ___, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. The ___ can also be used for the number of events in other specified intervals such as distance, area or volume. For instance, an individual keeping track of the amount of mail they receive each day may notice that they receive an average number of 4 letters per day. If receiving any particular piece of mail does not affect the arrival times of future pieces of mail, i.e., if pieces of mail from a wide range of sources arrive independently of one another, then a reasonable assumption is that the number of pieces of mail received in a day obeys a ___. Other examples that may follow a ___ include the number of phone calls received by a call center per hour and the number of decay events per second from a radioactive source.

Poisson distribution
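
For a mean rate λ, the probability of exactly k events is e^(−λ) λ^k / k!. The sketch below applies that standard formula to the mail example, assuming an average of 4 letters per day; the function name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Probability of receiving exactly 0..8 letters on a day averaging 4.
for k in range(9):
    print(k, round(poisson_pmf(k, 4), 4))
```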

The ___ mean or ___ mean and the ___ covariance are statistics computed from a collection (the ___) of data on one or more random variables. The ___ mean and ___ covariance are estimators of the population mean and population covariance, where the term population refers to the set from which the ___ was taken.

Sample mean or empirical mean and the sample covariance

In fields such as epidemiology, social sciences, psychology and statistics, an ___ draws inferences from a sample to a population where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints.

Observational study

An ___ is a procedure carried out to support, refute, or validate a hypothesis. ___ provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. ___ vary greatly in goal and scale, but always rely on repeatable procedure and logical analysis of the results.

Experimental study

In statistics, a ___ is the probability distribution of a given random-sample-based statistic. If an arbitrarily large number of samples, each involving multiple observations (data points), were separately used in order to compute one value of a statistic (such as, for example, the sample mean or sample variance) for each sample, then the ___ is the probability distribution of the values that the statistic takes on. In many contexts, only one sample is observed, but the ___ can be found theoretically. ___ are important in statistics because they provide a major simplification en route to statistical inference. More specifically, they allow analytical considerations to be based on the probability distribution of a statistic, rather than on the joint probability distribution of all the individual sample values.

Sampling distribution or finite-sample distribution

___ is a sampling plan used when mutually homogeneous yet internally heterogeneous groupings are evident in a statistical population. It is often used in marketing research. In this sampling plan, the total population is divided into these groups (known as ___) and a simple random sample of the groups is selected. The elements in each ___ are then sampled. If all elements in each sampled ___ are sampled, then this is referred to as a "one-stage" ___ plan. If a simple random subsample of elements is selected within each of these groups, this is referred to as a "two-stage" ___ plan. A common motivation for ___ is to reduce the total number of interviews and costs given the desired accuracy. For a fixed sample size, the expected random error is smaller when most of the variation in the population is present internally within the groups, and not between the groups.

Cluster sampling

In statistics, a ___ is a subset of individuals (a sample) chosen from a larger set (a population). Each individual is chosen randomly and entirely by chance, such that each individual has the same probability of being chosen at any stage during the sampling process, and each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. A ___ is an unbiased surveying technique.

Simple random sample

___ is a statistical method involving the selection of elements from an ordered sampling frame. The most common form of ___ is an equiprobability method. In this approach, progression through the list is treated circularly, with a return to the top once the end of the list is passed. The sampling starts by selecting an element from the list at random and then every kth element in the frame is selected, where k, is the sampling interval (sometimes known as the skip).

Systematic sampling

___ is the process of using data analysis to deduce properties of an underlying distribution of probability. ___ analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. ___ can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population. In machine learning, the term ___ is sometimes used instead to mean "make a prediction, by evaluating an already trained model"; in this context deducing properties of the model is referred to as training or learning (rather than ___), and using a model for prediction is referred to as ___ (instead of prediction).

Statistical inference or inferential statistics

In statistics, a ___ is a type of estimate computed from the statistics of the observed data. This proposes a range of plausible values for an unknown parameter (for example, the mean). The interval has an associated ___ level, which expresses how sure one can be that the true parameter is in the proposed range. Given observations x1...xn and a ___ level y, a valid ___ has a probability y of containing the true underlying parameter. The level of ___ can be chosen by the investigator. In general terms, a ___ for an unknown parameter is based on the sampling distribution of a corresponding estimator.

Confidence interval (CI)
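
A minimal sketch of an interval for a population mean, assuming the normal approximation (a z-based interval). For small samples a t-based interval is usually more appropriate; the data and the function name `mean_confidence_interval` are made up for illustration.

```python
import math
import statistics


def mean_confidence_interval(data, confidence=0.95):
    """Approximate z-based confidence interval for the population mean."""
    mean = statistics.fmean(data)
    std_err = statistics.stdev(data) / math.sqrt(len(data))
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


data = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
low, high = mean_confidence_interval(data)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```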

In statistics, ___ is the use of sample data to calculate an interval of possible values of an unknown population parameter; this is in contrast to point estimation, which gives a single value. Jerzy Neyman (1937) identified ___ as distinct from point estimation ("estimation by unique estimate"). In doing so, he recognized that then-recent work quoting results in the form of an estimate plus-or-minus a standard deviation indicated that ___ was actually the problem statisticians really had in mind.

Interval estimation ("estimation by interval")

In statistics, ___ involves the use of sample data to calculate a single value (known as a ___ since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown population parameter (for example, the population mean). ___ can be contrasted with interval estimation: such interval estimates are typically either confidence intervals, in the case of frequentist inference, or credible intervals, in the case of Bayesian inference.

Point estimation

A ___, is a parameter that describes a percentage value associated with a population. For example, the 2010 United States Census showed that 83.7% of the American Population was identified as not being Hispanic or Latino. The value of .837 is a ___. In general, the ___ and other population parameters are unknown. A census can be conducted in order to determine the actual value of a population parameter, but often a census is not practical due to its costs and time consumption.

Population proportion, generally denoted by P or the Greek letter pi

In statistical hypothesis testing, a result has ___ when it is very unlikely to have occurred given the null hypothesis. More precisely, a study's defined ___, is the probability of the study rejecting the null hypothesis, given that the null hypothesis was assumed to be true; and the p-value of a result, is the probability of obtaining a result at least as extreme, given that the null hypothesis is true. The result is statistically ___, by the standards of the study, when p is less than or equal to alpha. The ___ for a study is chosen before data collection, and is typically set to 5% or much lower—depending on the field of study.

Significance level, denoted by the Greek letter alpha, also known as the alpha value

In inferential statistics, the ___ is a general statement or default position that there is no difference between two measured phenomena or that two samples derive from the same general population. Testing (rejecting or failing to reject) the ___ —and thus concluding that there are (or there are not) grounds for believing that there is a relationship between two phenomena (e.g., that a potential treatment has a measurable effect)—is a central task in the modern practice of science; the field of statistics, more specifically hypothesis testing, gives precise criteria for rejecting or not rejecting a ___ within a confidence level.

Null hypothesis (often denoted H0)

The ___ is a statistic expressing the amount of random sampling error in the results of a survey. The larger the ___, the less confidence one should have that a poll result would reflect the result of a survey of the entire population. The ___ will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, the measure varies.

Margin of error

In statistics, a ___ is a variable that influences both the dependent variable and independent variable, causing a spurious association. ___ is a causal concept, and as such, cannot be described in terms of correlations or associations.

Confounder (also confounding variable, confounding factor, or lurking variable)

A ___ is a character that is set slightly below the normal line of type. ___ are often used to refer to members of a mathematical sequence or set or elements of a vector.

Subscripts

In probability theory and statistics, a ___ is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events (subsets of the sample space). The sample space, often denoted by Ω (Omega), is the set of all possible outcomes of a random phenomenon being observed; it may be any set: a set of real numbers, a set of vectors, a set of arbitrary non-numerical values, etc. For example, the sample space of a coin flip would be Ω = {heads, tails}. To define ___ for the specific case of random variables (so the sample space can be seen as a numeric set), it is common to distinguish between discrete and continuous random variables.

Probability distribution

In probability theory and statistics, the ___ is either one of two discrete probability distributions: (1) the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set { 1, 2, 3, ... }; or (2) the probability distribution of the number Y = X − 1 of failures before the first success, supported on the set { 0, 1, 2, 3, ... }. Which of these one calls "the" ___ is a matter of convention and convenience. The ___ gives the probability that the first occurrence of success requires k independent trials, each with success probability p.

Geometric distribution
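
The two conventions differ only by a shift of one. A short sketch of both probability mass functions, using the standard formulas (1 − p)^(k − 1) p and (1 − p)^y p; the function names are illustrative.

```python
def geometric_pmf_trials(k, p):
    """P(X = k): first success occurs on trial k, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p


def geometric_pmf_failures(y, p):
    """P(Y = y): y failures before the first success, for y = 0, 1, 2, ..."""
    return (1 - p) ** y * p


# First head on the third toss of a fair coin, in both conventions.
print(geometric_pmf_trials(3, 0.5))
print(geometric_pmf_failures(2, 0.5))
```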

___ is a type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand. This type of sampling is most useful for pilot testing.

Convenience sampling (also known as grab sampling, accidental sampling, or opportunity sampling)

___ receive this name because, in an experiment, their values are studied under the supposition or hypothesis that they depend, by some law or rule (e.g., by a mathematical function), on the values of other variables.

Dependent variables

___ are not seen as depending on any other variable in the scope of the experiment in question; thus, even if the existing dependency is invertible (e.g., by finding the inverse function when it exists), the nomenclature is kept if the inverse dependency is not the object of study in the experiment. Explanatory variable is preferred by some authors over ___ when the quantities treated as ___ may not be statistically ___ or ___ manipulable by the researcher. If the ___ is referred to as an "explanatory variable" then the term "response variable" is preferred by some authors for the dependent variable.

Independent variables

___ is satisfied by making sure that np is greater than or equal to 10 and n(1-p) is greater than or equal to 10.

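Large Counts Condition (also called the success/failure condition). Note that the 10% Condition is a separate check: the sample size should be no more than 10% of the population.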
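
The check that np and n(1 − p) are both at least 10 is simple to automate. A minimal sketch, with the function name `large_counts_ok` and the sample numbers chosen for illustration:

```python
def large_counts_ok(n, p, threshold=10):
    """True if both expected successes and expected failures reach threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


print(large_counts_ok(100, 0.5))  # 50 expected successes and 50 failures
print(large_counts_ok(50, 0.1))   # only 5 expected successes
```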

In probability theory, two random events A and B are ___ given a third event C precisely if the occurrence of A and the occurrence of B are independent events in their conditional probability distribution given C. In other words, A and B are ___ given C if and only if, given knowledge that C occurs, knowledge of whether A occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides no information on the likelihood of A occurring.

Conditionally independent

___ is the set of values of the test statistic for which we fail to reject the null hypothesis.

Region of acceptance

___ is the set of values of the test statistic for which the null hypothesis is rejected.

Region of rejection / Critical region

___ is a hypothesis associated with a contradiction to a theory one would like to prove.

Null hypothesis (H0)

___ is a hypothesis (often composite) associated with a theory one would like to prove.

Alternative hypothesis (H1)

In statistical testing, the ___ is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. (In the case of a composite null hypothesis, the largest such probability allowed under the null hypothesis is taken.) A very small ___ means that such an extreme observed outcome would be very unlikely under the null hypothesis.

p-value
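
As one concrete case, assume the test statistic is a z-score that follows a standard normal distribution under the null hypothesis. The two-sided value is then the probability of a result at least as far from zero as the one observed. The function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(round(two_sided_p_value(1.96), 4))  # close to 0.05
print(round(two_sided_p_value(3.0), 4))   # much smaller
```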

___ is the upper bound imposed on the size of a test. Its value is chosen by the statistician prior to looking at the data or choosing any particular test to be used. It is the maximum exposure to erroneously rejecting H0 he/she is ready to accept. Testing H0 at ___ means testing H0 with a test whose size does not exceed ___. In most cases, one uses tests whose size is equal to the ___.

Significance level (α)

In probability and statistics, ___ is any member of a family of continuous probability distributions that arise when estimating the mean of a normally-distributed population in situations where the sample size is small and the population's standard deviation is unknown. It was developed by English statistician William Sealy Gosset under the pseudonym "Student".

Student's t-distribution (or simply the t-distribution)

In statistics, the number of ___ is the number of values in the final calculation of a statistic that are free to vary. The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it, is called number of ___. In other words, the number of ___ can be defined as the minimum number of independent coordinates that can specify the position of the system completely.

Degrees of freedom

In statistical significance testing, a ___ is an alternative way of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A ___ is appropriate if the estimated value may depart from the reference value in only one direction, left or right, but not both. An example can be whether a machine produces more than one-percent defective products. In this situation, if the estimated value exists in one of the ___ critical areas, depending on the direction of interest (greater than or less than), the alternative hypothesis is accepted over the null hypothesis.

One-tailed or one-sided test

In statistical significance testing, a ___ is an alternative way of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A ___ is appropriate if the estimated value is greater or less than a certain range of values, for example, whether a test taker may score above or below a specific range of scores. This method is used for null hypothesis testing and if the estimated value exists in the critical areas, the alternative hypothesis is accepted over the null hypothesis.

Two-tailed or two-sided test

In statistical hypothesis testing, a result has ___ when it is very unlikely to have occurred given the null hypothesis. More precisely, a study's defined ___ level, denoted by α (alpha), is the probability of the study rejecting the null hypothesis, given that the null hypothesis was assumed to be true; and the p-value of a result, p, is the probability of obtaining a result at least as extreme, given that the null hypothesis is true. The result is ___, by the standards of the study, when p ≤ α. The ___ level for a study is chosen before data collection, and is typically set to 5% or much lower, depending on the field of study.

Statistical significance

A ___ is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

Scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram)

In statistics, ___ is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple ___; for more than one, the process is called multiple ___. This term is distinct from multivariate ___, where multiple correlated dependent variables are predicted, rather than a single scalar variable. ___ models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.

Linear regression
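
For the simple case of one explanatory variable, the least squares line has slope Sxy / Sxx and passes through the point of means. A minimal sketch with made-up data; the function name `fit_line` is illustrative.

```python
import statistics


def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope * x + intercept."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```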

In statistical modeling, ___ is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). The most common form of ___ is linear regression, in which a researcher finds the line (or a more complex linear combination) that most closely fits the data according to a specific mathematical criterion.

Regression analysis

A ___ is a numerical measure of some type of correlation, meaning a statistical relationship between two variables. The variables may be two columns of a given data set of observations, often called a sample, or two components of a multivariate random variable with a known distribution. Several types of ___ exist, each with its own definition and its own range of usability and characteristics. They all assume values in the range from −1 to +1, where +1 indicates the strongest possible positive relationship, −1 the strongest possible negative relationship, and 0 no relationship of the kind being measured. The Pearson product-moment ___, also known as r, R, or Pearson's r, is a measure of the strength and direction of the linear relationship between two variables that is defined as the covariance of the variables divided by the product of their standard deviations. This is the best-known and most commonly used type of ___.

Correlation coefficient
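
As a rough illustration of the definition on this card (covariance divided by the product of the standard deviations), here is a minimal sketch. The function name `pearson_r` and the sample data are assumptions made for the example, and population (divide-by-n) formulas are used throughout.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

# Hypothetical data: a perfectly linear, increasing relationship.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson_r(hours, scores))  # very close to 1.0
```

Reversing one of the lists gives a value very close to -1, the strongest negative linear relationship.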

In statistics and optimization, ___ is a measure of the deviation of an observed value of an element of a statistical sample from its "theoretical value". The ___ of an observed value is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean).

Residual (e)

In statistics, the ___ is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses, on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

Coefficient of determination, denoted R^2 or r^2 and pronounced "R squared"
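
One common way to compute this quantity is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that definition; the name `r_squared` and the numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, predicted))  # unexplained variation
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)              # total variation
    return 1 - ss_res / ss_tot

# Hypothetical observations and model predictions.
observed = [1, 2, 3, 4]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted))  # about 0.98
```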

The ___ is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed. The ___ represents the square root of the second sample moment of the differences between predicted values and observed values or the quadratic mean of these differences. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample. The ___ serves to aggregate the magnitudes of the errors in predictions for various data points into a single measure of predictive power. ___ is a measure of accuracy, to compare forecasting errors of different models for a particular dataset and not between datasets, as it is scale-dependent.

Root-mean-square deviation (RMSD) or root-mean-square error (RMSE)
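
The "square root of the mean of the squared differences" can be sketched in a few lines. The function name `rmse` and the inputs are illustrative assumptions.

```python
import math

def rmse(predicted, observed):
    """Square root of the mean squared difference between predictions and observations."""
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical example: every prediction is off by exactly 3 units.
print(rmse([3, 3], [0, 0]))  # 3.0
```

Because the result is in the units of the data, it is only comparable across models fitted to the same dataset, as the card notes.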

A ___ is a statistical hypothesis test that is valid to perform when the test statistic is ___ distributed under the null hypothesis, specifically Pearson's ___ and variants thereof. Pearson's ___ is used to determine whether there is a statistically significant difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table.

chi-squared test, also written as χ^2 test
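
Pearson's statistic compares observed and expected frequencies category by category. The sketch below computes only the statistic (the sum of (O − E)² / E), not a p-value; the die-roll counts are invented for the example.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 60 rolls of a die, so a fair die expects 10 of each face.
observed = [5, 8, 9, 8, 10, 20]
expected = [10] * 6
print(chi_squared_statistic(observed, expected))  # about 13.4
```

The statistic would then be compared against a chi-squared distribution with the appropriate degrees of freedom (5 here) to judge significance.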

___ are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. ___ are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.

Monte Carlo methods, Monte Carlo experiments, or Monte Carlo simulations
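
A classic toy example of numerical integration by random sampling is estimating π from the fraction of random points in the unit square that fall inside the quarter circle. This example is not from the card; the function name and the fixed seed are assumptions chosen to make the run repeatable.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points (x, y) in [0, 1)^2 with x^2 + y^2 <= 1."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same estimate
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4, so scale the hit rate by 4.
    return 4 * inside / n_samples

print(estimate_pi(100_000))  # roughly 3.14; the exact digits depend on the seed
```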

In statistics, the use of ___ is a Bayesian alternative to classical hypothesis testing. Bayesian model comparison is a method of model selection based on ___. The models under consideration are statistical models. The aim of the ___ is to quantify the support for a model over another, regardless of whether these models are correct. The technical definition of "support" in the context of Bayesian inference is described as a likelihood ratio of the marginal likelihood of two competing hypotheses, usually a null and an alternative. The posterior probability Pr(M|D) of a model M given data D is given by Bayes' theorem.

Bayes factor

In statistics, the ___ of a set of statistical data values is the arithmetic mean of the maximum and minimum values in a data set, defined as: M = (max x + min x) / 2. The ___ is the midpoint of the range; as such, it is a measure of central tendency. The ___ is rarely used in practical statistical analysis, as it lacks efficiency as an estimator for most distributions of interest, because it ignores all intermediate points, and lacks robustness, as outliers change it significantly.

Mid-range or mid-extreme
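
The formula M = (max x + min x) / 2 translates directly into code. The data below is made up to show how a single outlier moves the result.

```python
def mid_range(values):
    """Arithmetic mean of the largest and smallest value."""
    return (max(values) + min(values)) / 2

print(mid_range([1, 2, 3, 4, 10]))    # 5.5
print(mid_range([1, 2, 3, 4, 1000]))  # 500.5, one outlier dominates the result
```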

In statistics, a ___ is a list, table or graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

Frequency distribution

A ___ is an approximate representation of the distribution of numerical data. It was first introduced by Karl Pearson. To construct a ___, the first step is to "bin" (or "bucket") the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval.

Histogram
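
The binning step can be sketched as follows, assuming equal-width bins and at least two distinct values. The helper name `histogram_counts` is hypothetical.

```python
def histogram_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count the values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land at index n_bins, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Bins are [1, 4), [4, 7) and [7, 10].
print(histogram_counts([1, 2, 2, 3, 5, 8, 9, 10], 3))  # [4, 1, 3]
```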

A ___ is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. Unlike histograms, ___ retain the original data to at least two significant digits, and put the data in order, thereby easing the move to order-based inference and non-parametric statistics.

Stem-and-leaf display or stem-and-leaf plot (also called a stemplot)
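
A minimal sketch of building such a display, assuming non-negative integers where the tens digit(s) form the stem and the units digit the leaf. The function name is an assumption.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted integers by stem (value // 10), keeping the units digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

for stem, leaves in stem_and_leaf([27, 12, 34, 21, 15, 27]).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
# 1 | 2 5
# 2 | 1 7 7
# 3 | 4
```

Unlike a histogram, every original value can be read back from the display.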

In mathematics, the term ___ describes something that pertains to squares, to the operation of squaring, to terms of the second degree, or equations or formulas that involve such terms. ___ is Latin for square.

Quadratic

Given random variables X,Y that are defined on a probability space, the ___ for X,Y is a probability distribution that gives the probability that each of X,Y falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables, this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

Joint probability distribution

In probability theory, ___ is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. If the event of interest is A and the event B is known or assumed to have occurred, "the ___ of A given B", or "the probability of A under the condition B", is usually written as P(A|B), or sometimes P_B(A) or P(A/B). Conditional probabilities can be reversed using Bayes' theorem.

Conditional probability
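
For equally likely outcomes, P(A|B) can be computed as P(A and B) / P(B). The sketch below uses that standard formula with a fair die; the events chosen are illustrative assumptions.

```python
from fractions import Fraction

def conditional_probability(event_a, event_b, sample_space):
    """P(A|B) = P(A and B) / P(B), assuming every outcome is equally likely."""
    p_b = Fraction(len(event_b & sample_space), len(sample_space))
    p_a_and_b = Fraction(len(event_a & event_b & sample_space), len(sample_space))
    return p_a_and_b / p_b

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}          # event A: the roll is even
greater_than_3 = {4, 5, 6}  # event B: the roll is greater than 3
print(conditional_probability(even, greater_than_3, die))  # 2/3
```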

___ provide a measure of the likelihood of a particular outcome. They are calculated as the ratio of the number of events that produce that outcome to the number that do not. ___ are commonly used in gambling and statistics. ___ can be demonstrated by examining rolling a six-sided die. The ___ of rolling a 6 is 1:5. This is because there is 1 event (rolling a 6) that produces the specified outcome of "rolling a 6," and 5 events that do not (rolling a 1,2,3,4 or 5). The ___ of rolling either a 5 or 6 is 2:4. This is because there are 2 events (rolling a 5 or 6) that produce the specified outcome of "rolling either a 5 or 6," and 4 events that do not (rolling a 1,2,3, or 4).

Odds
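
The die examples on this card can be reproduced with a short sketch. The helper names are hypothetical, and the conversion to a probability (favourable / total) is added for comparison.

```python
from fractions import Fraction

def odds(favourable, total):
    """Odds as (outcomes that produce the result, outcomes that do not)."""
    return (favourable, total - favourable)

def odds_to_probability(in_favour, against):
    """Convert odds in_favour:against to a probability."""
    return Fraction(in_favour, in_favour + against)

print(odds(1, 6))                 # (1, 5): odds of rolling a 6
print(odds(2, 6))                 # (2, 4): odds of rolling a 5 or a 6
print(odds_to_probability(1, 5))  # 1/6
```

Note that odds of 1:5 correspond to a probability of 1/6, not 1/5.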

In statistics, ___ is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the ___. The logic of ___ is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

Maximum likelihood estimation (MLE)
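
To make "the parameter value under which the observed data is most probable" concrete, the sketch below searches a grid of candidate probabilities for a coin-flip (Bernoulli) model. The model, the grid search, and the function names are assumptions for illustration; in practice this case has the closed-form answer successes / trials.

```python
import math

def bernoulli_log_likelihood(p, successes, trials):
    """Log-likelihood of seeing `successes` in `trials` independent flips with P(success) = p."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def bernoulli_mle_grid(successes, trials, steps=100):
    """Pick the candidate p on a grid in (0, 1) that maximizes the log-likelihood."""
    candidates = [i / steps for i in range(1, steps)]
    return max(candidates, key=lambda p: bernoulli_log_likelihood(p, successes, trials))

# Hypothetical data: 7 heads in 10 flips.
print(bernoulli_mle_grid(7, 10))  # 0.7, matching 7 / 10
```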

In Bayesian statistical inference, a ___ of an uncertain quantity is the probability distribution that would express one's beliefs about this quantity before some evidence is taken into account. For example, the ___ could be the probability distribution representing the relative proportions of voters who will vote for a particular politician in a future election. The unknown quantity may be a parameter of the model or a latent variable rather than an observable variable. Bayes' theorem calculates the renormalized pointwise product of the ___ and the likelihood function, to produce the posterior probability distribution, which is the conditional distribution of the uncertain quantity given the data.

Prior probability distribution, often simply called the prior

In simple terms, ___ is the possibility of something bad happening. ___ involves uncertainty about the effects/implications of an activity with respect to something that humans value (such as health, well-being, wealth, property or the environment), often focusing on negative, undesirable consequences. Many different definitions have been proposed. The international standard definition of ___ for common understanding in different applications is “effect of uncertainty on objectives”.

Risk

___ refers to epistemic situations involving imperfect or unknown information. It applies to predictions of future events, to physical measurements that are already made, or to the unknown. ___ arises in partially observable and/or stochastic environments, as well as due to ignorance, indolence, or both.

Uncertainty

A forecast for an observation that was part of the data sample is an ___ forecast. A forecast for an observation that was not part of the data sample is an ___ forecast.

In-sample & Out-of-sample

___ is a process that occurs in a feedback loop which exacerbates the effects of a small disturbance. That is, the effects of a perturbation on a system include an increase in the magnitude of the perturbation. That is, A produces more of B which in turn produces more of A.

Positive feedback (exacerbating feedback, self-reinforcing feedback)

___ occurs when some function of the output of a system, process, or mechanism is fed back in a manner that tends to reduce the fluctuations in the output, whether caused by changes in the input or by other disturbances.

Negative feedback (or balancing feedback)

A ___ is a definitive and specific statement about when and where an earthquake will strike: a major earthquake will hit Kyoto, Japan, on June 28.

Prediction

A ___ is a probabilistic statement, usually over a longer time scale: there is a 60 percent chance of an earthquake in Southern California over the next thirty years.

Forecast

In physics, the ___ is the disturbance of an observed system by the act of observation. This is often the result of instruments that, by necessity, alter the state of what they measure in some manner. A common example is checking the pressure in an automobile tire; this is difficult to do without letting out some of the air, thus changing the pressure. Similarly, it is not possible to see any object without light hitting the object, and causing it to reflect that light.

Observer effect

In mathematics, a ___ is the rapid growth of the complexity of a problem due to how the combinatorics of the problem is affected by the input, constraints, and bounds of the problem. ___ is sometimes used to justify the intractability of certain problems. Examples of such problems include certain mathematical functions, the analysis of some puzzles and games, and some pathological examples which can be modelled as the Ackermann function.

Combinatorial explosion

A ___ (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features from a collection of information, while ___ (in the mass noun sense) is the process of using and analysing those statistics. ___ is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.

Descriptive statistic

The ___ of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, mode, or the result of any other measure of central tendency or any random data point related to the given data set. The absolute values of the differences between the data points and their central tendency are totaled and divided by the number of data points.

Mean absolute deviation (MAD)
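
A minimal sketch of the calculation, defaulting to the mean as the central point but allowing any other centre to be passed in. The function and parameter names are assumptions.

```python
def mean_absolute_deviation(values, center=None):
    """Average absolute distance of the values from a central point (the mean by default)."""
    if center is None:
        center = sum(values) / len(values)
    return sum(abs(v - center) for v in values) / len(values)

print(mean_absolute_deviation([2, 4, 6, 8]))            # 2.0 (around the mean, 5)
print(mean_absolute_deviation([2, 4, 6, 8], center=4))  # 2.0 (around a chosen point)
```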

In statistics, a ___ is a score below which a given percentage of scores in its frequency distribution fall (exclusive definition) or a score at or below which a given percentage fall (inclusive definition). For example, the 50th ___ (the median) is the score below which 50% (exclusive) or at or below which (inclusive) 50% of the scores in the distribution may be found.

Percentile (or a centile)
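
The exclusive and inclusive definitions differ only in whether scores equal to the given score are counted. The sketch below computes the percentage for a given score under both; the name `percentile_rank` and the scores are illustrative.

```python
def percentile_rank(scores, score, inclusive=False):
    """Percentage of scores below `score` (exclusive) or at or below it (inclusive)."""
    if inclusive:
        count = sum(1 for s in scores if s <= score)
    else:
        count = sum(1 for s in scores if s < score)
    return 100 * count / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 80))                  # 50.0 (exclusive)
print(percentile_rank(scores, 80, inclusive=True))  # 60.0 (inclusive)
```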

In probability theory, a ___ of a continuous random variable is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample. In other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the ___ at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would equal one sample compared to the other sample. In a more precise sense, the ___ is used to specify the probability of the random variable falling within a particular range of values, as opposed to taking on any one value.

Probability density function (PDF), or density

The ___ is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles:
- the sample minimum (smallest observation)
- the lower quartile or first quartile
- the median (the middle value)
- the upper quartile or third quartile
- the sample maximum (largest observation)

Five-number summary
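
The five values can be collected with Python's `statistics.quantiles`. Quartile conventions vary between textbooks and software; this sketch assumes the "inclusive" interpolation method, and the function name is hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    ordered = sorted(data)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

print(five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5]))  # (1, 3.0, 5.0, 7.0, 9)
```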

In mathematics, a ___ is a function between the elements of two sets, where each element of one set is paired with exactly one element of the other set, and each element of the other set is paired with exactly one element of the first set. There are no unpaired elements.

Bijection, bijective function, one-to-one correspondence, or invertible function

A ___ is a very small line chart, typically drawn without axes or coordinates. It presents the general shape of the variation (typically over time) in some measurement, such as temperature or stock market price, in a simple and highly condensed way. ___ are small enough to be embedded in text, or several ___ may be grouped together as elements of a small multiple.

Sparkline

___, as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the ___ problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q.

Nearest neighbor search (NNS)
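
The formal problem (given S and a query q, find the closest point in S) has a straightforward brute-force solution: measure the distance to every point and keep the smallest. The sketch assumes Euclidean distance as the dissimilarity function; real systems usually use index structures to avoid scanning every point.

```python
import math

def nearest_neighbor(points, query):
    """Linear scan: return the point in `points` with the smallest Euclidean distance to `query`."""
    return min(points, key=lambda p: math.dist(p, query))

# Hypothetical point set S and query point q.
S = [(0, 0), (3, 4), (1, 1)]
q = (2, 2)
print(nearest_neighbor(S, q))  # (1, 1)
```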

In statistics, ___ is a type of linear least squares method for estimating the unknown parameters in a linear regression model. ___ chooses the parameters of a linear function of a set of explanatory variables by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being observed) in the given dataset and those predicted by the linear function of the independent variable.

Ordinary least squares (OLS)
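
For a single explanatory variable, minimizing the sum of squared differences has a well-known closed-form solution: slope = Sxy / Sxx and intercept = mean(y) − slope · mean(x). The sketch below assumes that simple case; `fit_line` and the data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a single explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```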

In mathematics, ___ is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but ___ is subject to greater uncertainty and a higher risk of producing meaningless results. ___ may also mean extension of a method, assuming similar methods will be applicable. ___ may also apply to human experience to project, extend, or expand known experience into an area not known or previously experienced so as to arrive at a (usually conjectural) knowledge of the unknown (e.g. a driver ___ road conditions beyond his sight while driving).

Extrapolation

A ___ is the sociopsychological phenomenon of someone "predicting" or expecting something, and this "prediction" or expectation coming true simply because the person believes it will and the person's resulting behaviors align to fulfill the belief. This suggests that people's beliefs influence their actions. The principle behind this phenomenon is that people create consequences regarding people or events, based on previous knowledge of the subject.

Self-fulfilling prophecy/prediction

A ___ is a prediction that prevents what it predicts from happening. A ___ can be the result of rebellion to the prediction. If the audience of a prediction has an interest in seeing it falsified, and its fulfillment depends on their actions or inaction, their actions upon hearing it will make the prediction less plausible. If a prediction is made with this outcome specifically in mind, it is commonly referred to as reverse psychology or warning. Also, when working to make a premonition come true, one can inadvertently change the circumstances so much that the prophecy cannot come true.

Self-defeating prophecy/prediction (self-destroying or self-denying in some sources), also known as the prophet's dilemma.

The simplest case of a normal distribution is known as the ___. This is a special case when the mean (mu) is 0 and the standard deviation (sigma) is 1.

Standard normal distribution

A ___ is a mathematical table for the values of Φ (phi), which are the values of the cumulative distribution function of the normal distribution. It is used to find the probability that a statistic is observed below, above, or between values on the standard normal distribution, and by extension, any normal distribution. Since probability tables cannot be printed for every normal distribution, as there are an infinite variety of normal distributions, it is common practice to convert a normal to a standard normal and then use the ___ to find probabilities.

Standard normal table, also called the unit normal table or Z table
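
The values such a table lists can be computed from the error function, using Φ(z) = ½ (1 + erf(z / √2)). The sketch also shows the "convert to standard normal, then look up" step; the function names and the example numbers are assumptions.

```python
import math

def standard_normal_cdf(z):
    """Phi(z): probability that a standard normal variable is at most z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Convert x to a z-score, then evaluate the standard normal CDF."""
    return standard_normal_cdf((x - mu) / sigma)

print(round(standard_normal_cdf(1.96), 4))  # 0.975
print(round(normal_cdf(115, 100, 15), 4))   # 0.8413, the same as Phi(1.0)
```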

In probability theory and statistics, the ___ of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x. In the case of a scalar continuous distribution, it gives the area under the probability density function from minus infinity to x. ___ are also used to specify the distribution of multivariate random variables.

Cumulative distribution function (CDF)

In mathematics, ___ is an approximation for factorials. It is a good approximation, leading to accurate results even for small values of n. It is named after James ___, though it was first stated by Abraham de Moivre.

Stirling's approximation (or Stirling's formula)
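
The usual form of the approximation is n! ≈ √(2πn) · (n/e)^n. A quick numerical comparison against exact factorials, with a hypothetical function name, might look like this:

```python
import math

def stirling(n):
    """Stirling's approximation to n!: sqrt(2*pi*n) * (n / e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (1, 5, 10):
    exact = math.factorial(n)
    # The ratio approaches 1 from below as n grows.
    print(n, exact, stirling(n) / exact)
```

At n = 10 the approximation is already within 1% of the exact value.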

The method of ___ is a standard approach in regression analysis to approximate the solution of overdetermined systems (sets of equations in which there are more equations than unknowns) by minimizing the sum of the squares of the residuals made in the results of every single equation. The most important application is in data fitting. The best fit in the ___ sense minimizes the sum of squared residuals (a residual being: the difference between an observed value, and the fitted value provided by a model).

Least squares

In mathematics, the ___ of a line is a number that describes both the direction and the steepness of the line. ___ is often denoted by the letter m; there is no clear answer to the question why the letter m is used for ___, but its earliest use in English appears in O'Brien (1844) who wrote the equation of a straight line as "y = mx + b" and it can also be found in Todhunter (1888) who wrote it as "y = mx + c". ___ is calculated by finding the ratio of the "vertical change" to the "horizontal change" between (any) two distinct points on a line. Sometimes the ratio is expressed as a quotient ("rise over run"), giving the same number for every two distinct points on the same line.

Slope or gradient

In analytic geometry, using the common convention that the horizontal axis represents a variable x and the vertical axis represents a variable y, a ___ is a point where the graph of a function or relation intersects the y-axis of the coordinate system. As such, these points satisfy x = 0.

y-intercept or vertical intercept

A ___ is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a ___ are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model is more appropriate.

Residual plot

In statistics, a ___ is a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables. They are heavily used in survey research, business intelligence, engineering, and scientific research. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

Contingency table (also known as a cross tabulation or crosstab)

___ have been used in statistics for tasks such as selecting random samples. This was much more effective than manually selecting the random samples (with dice, cards, etc.). Nowadays, ___ have been replaced by computational random number generators.

Random number tables or random digit tables

An ___ is a class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole. It combines elements of game theory, complex systems, emergence, computational sociology, multi-agent systems, and evolutionary programming. Monte Carlo methods are used to introduce randomness.

Agent-based model (ABM)

In statistics, ___ is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known. Examples are assigning a given email to the "spam" or "non-spam" class, and assigning a diagnosis to a given patient based on observed characteristics of the patient (sex, blood pressure, presence or absence of certain symptoms, etc.). ___ is an example of pattern recognition.

Classification

In statistics, a sequence (or a vector) of random variables is ___ if all its random variables have the same finite variance.

Homoscedastic or homoskedastic from Ancient Greek homo "same" and skedasis "dispersion". Also known as homogeneity of variance.

In statistics, a vector of random variables is ___ if the variability of the random disturbance is different across elements of the vector. Here, variability could be quantified by the variance or any other measure of statistical dispersion.

Heteroscedastic or heteroskedastic from Ancient Greek hetero "different" and skedasis "dispersion"

In probability theory and statistics, the ___ is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The ___ is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

Multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution

In statistics, ___ is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation, the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. ___ does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multivariate regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

Multicollinearity (also collinearity)

In mathematics and physics, a ___ is an element of a ___ space. Historically, ___ were introduced in geometry and physics (typically in mechanics) before the formalization of the concept of ___ space. Therefore, one often talks about ___ without specifying the ___ space to which they belong. Specifically, in a Euclidean space, one considers spatial ___, also called Euclidean ___ which are used to represent quantities that have both magnitude and direction, and may be added, subtracted and scaled (i.e. multiplied by a real number) for forming a ___ space.

Vector

The ___ can be expressed as follows: "What is the probability that the sun will rise tomorrow?" The ___ illustrates the difficulty of using probability theory when evaluating the plausibility of statements or beliefs. According to the Bayesian interpretation of probability, probability theory can be used to evaluate the plausibility of the statement, "The sun will rise tomorrow." We just need a hypothetical random process that determines whether the sun will rise tomorrow or not. Based on past observations, we can infer the parameters of this random process, and from there evaluate the probability that the sun will rise tomorrow.

Sunrise problem

In Bayesian statistics, the ___ of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account. "___", in this context, means after taking into account the relevant evidence related to the particular case being examined. The ___ distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey.

Posterior probability

The ___ is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let's Make a Deal and named after its original host, ___. The problem was originally posed (and solved) in a letter by Steve Selvin to the American Statistician in 1975. It became famous as a question from reader Craig F. Whitaker's letter quoted in Marilyn vos Savant's "Ask Marilyn" column in Parade magazine in 1990: "Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, 'Do you want to pick door No. 2?' Is it to your advantage to switch your choice?" Vos Savant's response was that the contestant should switch to the other door. Under the standard assumptions, contestants who switch have a 2/3 chance of winning the car, while contestants who stick to their initial choice have only a 1/3 chance.

Monty Hall problem
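
The 2/3 versus 1/3 result can be checked by simulation. The sketch below assumes the standard rules (the host always opens a goat door that the contestant did not pick); the function names and the seed are illustrative choices.

```python
import random

def play(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(win_rate(switch=True))   # close to 2/3
print(win_rate(switch=False))  # close to 1/3
```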

In mathematics, a ___ is a plane curve which is mirror-symmetrical and is approximately U-shaped. It fits several superficially different mathematical descriptions, which can all be proved to define exactly the same curves. One description of a ___ involves a point (the focus) and a line (the directrix). The focus does not lie on the directrix. The ___ is the locus of points in that plane that are equidistant from both the directrix and the focus. Another description of a ___ is as a conic section, created from the intersection of a right circular conical surface and a plane parallel to another plane that is tangential to the conical surface.

Parabola. Parabolic usually refers to something in a shape of a parabola.

___ is a modified version of R-squared that has been adjusted for the number of predictors in the model. The ___ increases when a new term improves the model more than would be expected by chance. It decreases when a predictor improves the model by less than expected. Typically, the ___ is positive, not negative. It is never higher than the R-squared.

Adjusted R-squared
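
The usual textbook formula is 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p the number of predictors. The card does not state a formula, so treat this sketch and its parameter names as assumptions.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n, p = n_observations, n_predictors
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Hypothetical model: R^2 of 0.9 from 20 observations.
print(adjusted_r_squared(0.9, 20, 3))   # about 0.881
print(adjusted_r_squared(0.9, 20, 10))  # about 0.789, more predictors means a larger penalty
```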

In mathematics, a set A is a ___ of a set B if all elements of A are also elements of B. It is possible for A and B to be equal; if they are unequal, then A is a proper or strict ___ of B.

Subset, may also be expressed as B includes (or contains) A or A is included (or contained) in B.

In mathematics, a set B is a ___ of a set A if all elements of A are also elements of B. It is possible for A and B to be equal; if they are unequal, then B is a proper or strict ___ of A.

Superset, may also be expressed as B includes (or contains) A or A is included (or contained) in B.

The ___ is denoted by the symbol "∈". Writing x ∈ A means that "x ___ A".

Relation "is an element of", also called set membership

___ is a typeface style that is often used for certain symbols in mathematical texts, in which certain lines of the symbol (usually vertical or near-vertical lines) are doubled. The symbols usually denote number sets. For example, "𝔼" represents the expected value of a random variable, "ℚ" represents the set of rational numbers, "ℝ" represents the set of real numbers, "𝕍" represents a vector space and "ℤ" represents the set of integers. (The Z is for Zahlen, German for "numbers", and zählen, German for "to count".)

Blackboard bold

The ___ is the erroneous belief that if a particular event has occurred more frequently than normal in the past, it is less likely to happen in the future (or vice versa), when it has otherwise been established that the probability of such events does not depend on what has happened in the past. Such events, having the quality of historical independence, are referred to as statistically independent. The fallacy is commonly associated with gambling, where it may be believed, for example, that the next dice roll is more than usually likely to be six because there have recently been fewer than the usual number of sixes. Perhaps the most famous example of the ___ occurred in a game of roulette at the Monte Carlo Casino on August 18, 1913, when the ball fell in black 26 times in a row. This was an extremely uncommon occurrence: the probability of a sequence of either red or black occurring 26 times in a row is around 1 in 66.6 million, assuming the mechanism is unbiased. Gamblers lost millions of francs betting against black, reasoning incorrectly that the streak was causing an imbalance in the randomness of the wheel, and that it had to be followed by a long streak of red.

Gambler's fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances
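
The "1 in 66.6 million" figure can be reproduced under one set of assumptions: a single-zero wheel with 18 red, 18 black and 1 green pocket, where the first spin sets the colour and each of the next 25 spins must independently match it. These assumptions are not stated on the card.

```python
from fractions import Fraction

def streak_probability(length, p_colour=Fraction(18, 37)):
    """Probability that, after the first spin sets the colour, the next length - 1 spins match it."""
    return p_colour ** (length - 1)

p = streak_probability(26)
print(round(1 / float(p)))  # roughly 66.6 million
```

Independence is what defeats the fallacy: after 26 blacks, the chance of black on the next spin is still 18/37.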

A ___ is an event with a single outcome (only one "answer"). The probability of an event is the number of times it can occur divided by the total number of possible outcomes. In a ___, the numerator ("number of times it can occur") will be 1.

Simple (or single) event

A ___ is the combination of two or more simple events (with two or more outcomes). When all outcomes are equally likely, the probability of an event is the number of ways it can occur divided by the total number of possible outcomes. In a ___, the numerator ("number of ways it can occur") is greater than 1.

Compound event
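
As a rough illustration of the counting rule used on the simple-event and compound-event cards, the sketch below treats events as sets of outcomes of a fair six-sided die. The equally-likely assumption and the helper name `event_probability` are not from the cards.

```python
from fractions import Fraction


def event_probability(event: set, sample_space: set) -> Fraction:
    """Favourable outcomes over total outcomes, assuming all are equally likely."""
    favourable = event & sample_space  # ignore outcomes outside the sample space
    return Fraction(len(favourable), len(sample_space))


die = {1, 2, 3, 4, 5, 6}

# Simple event: exactly one favourable outcome, so the count on top is 1.
p_simple = event_probability({4}, die)

# Compound event: several favourable outcomes, so the count on top exceeds 1.
p_compound = event_probability({2, 4, 6}, die)

print(p_simple, p_compound)
```

`Fraction` reduces 3/6 to 1/2, so the "greater than 1" numerator refers to the count of favourable outcomes before simplification.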

A ___ is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

Decision tree
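
The remark about conditional control statements can be made concrete with a toy example. The decision problem, thresholds, and function name below are invented for illustration; each `if` corresponds to an internal node of the tree and each `return` to a leaf.

```python
def choose_transport(raining: bool, distance_km: float) -> str:
    """A hypothetical decision tree written as nested conditionals."""
    if raining:                      # root node: weather
        if distance_km < 1.0:        # left subtree: short trip in the rain
            return "walk with umbrella"
        return "take the bus"
    if distance_km < 5.0:            # right subtree: dry weather
        return "cycle"
    return "take the bus"


print(choose_transport(raining=True, distance_km=0.5))
print(choose_transport(raining=False, distance_km=3.0))
```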

In mathematics, an ___ is the result of multiplying no factors. It is by convention equal to 1, the multiplicative identity (assuming there is an identity for the multiplication operation in question), just as the empty sum, the result of adding no numbers, is by convention 0, the additive identity.

Empty product, or nullary product or vacuous product
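
Python's standard library follows the same conventions, which gives a quick way to see them in action. The `factorial` helper is only an illustration of why the convention is convenient: 0! comes out as 1 without a special case.

```python
import math

empty_product = math.prod([])  # multiplying no factors gives the identity 1
empty_sum = sum([])            # adding no numbers gives the identity 0


def factorial(n: int) -> int:
    """n! as a product over 1..n; for n = 0 the range is empty."""
    return math.prod(range(1, n + 1))


print(empty_product, empty_sum, factorial(0), factorial(5))
```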

Formally, a ___ is a random variable whose cumulative distribution function is continuous everywhere. There are no "gaps", which would correspond to individual numbers that have a nonzero probability of occurring. Instead, a ___ almost never takes an exact prescribed value, but there is a positive probability that its value will lie in particular intervals, which can be arbitrarily small.

Continuous random variable
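
A small sketch can make the point about exact values versus intervals tangible. It uses a standard normal variable purely as an example (the card does not name a distribution) and computes interval probabilities as differences of the cumulative distribution function.

```python
from statistics import NormalDist

# Assumption: a standard normal variable serves as the example distribution.
X = NormalDist(mu=0.0, sigma=1.0)


def interval_probability(dist: NormalDist, a: float, b: float) -> float:
    """P(a < X <= b) = F(b) - F(a), where F is the CDF."""
    return dist.cdf(b) - dist.cdf(a)


p_point = interval_probability(X, 1.0, 1.0)       # a single exact value
p_narrow = interval_probability(X, 0.999, 1.001)  # a very small interval
p_wide = interval_probability(X, 0.0, 1.0)        # a wider interval

print(p_point, p_narrow, p_wide)
```

The single point gets probability zero, while even the narrow interval gets a small positive probability.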

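The ___ states that, when sampling without replacement, sample sizes should be no more than ___ of the population, so that the individual draws can be treated as approximately independent.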

10% condition
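
The check itself is a one-line comparison. The function name and the input validation below are illustrative choices, not part of any standard library.

```python
def meets_ten_percent_condition(sample_size: int, population_size: int) -> bool:
    """True when the sample is at most 10% of the population."""
    if population_size <= 0 or sample_size < 0:
        raise ValueError("population must be positive and sample non-negative")
    # Integer arithmetic avoids floating-point rounding at the 10% boundary.
    return 10 * sample_size <= population_size


print(meets_ten_percent_condition(50, 1_000))   # 5% of the population
print(meets_ten_percent_condition(200, 1_000))  # 20% of the population
```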

In mathematics, the ___ are those used for counting (as in "there are six coins on the table") and ordering (as in "this is the third largest city in the country"). In common mathematical terminology, words colloquially used for counting are "cardinal numbers", and words used for ordering are "ordinal numbers". The ___ can, at times, appear as a convenient set of codes (labels or "names"), that is, as what linguists call nominal numbers, forgoing many or all of the properties of being a number in a mathematical sense. Some definitions, including the standard ISO 80000-2, begin the ___ with 0, corresponding to the non-negative integers 0, 1, 2, 3, ..., whereas others start with 1, corresponding to the positive integers 1, 2, 3, ....

Natural numbers

In probability theory and statistics, the ___ is a family of continuous probability distributions defined on the interval [0, 1] parameterized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution. The generalization to multiple variables is called a Dirichlet distribution. The ___ has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. In Bayesian inference, the ___ is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The ___ is a suitable model for the random behavior of percentages and proportions.

Beta distribution
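
The conjugate-prior property mentioned on this card can be sketched for the Bernoulli/binomial case: a Beta(α, β) prior combined with observed successes and failures gives a Beta(α + successes, β + failures) posterior. The function names and the example counts below are made up for illustration.

```python
def update_beta(alpha: float, beta: float, successes: int, failures: int) -> tuple:
    """Posterior shape parameters after observing Bernoulli trials."""
    return alpha + successes, beta + failures


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


# Assumption: a uniform Beta(1, 1) prior and hypothetical data of 7 successes, 3 failures.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=7, failures=3)

print(posterior, beta_mean(*posterior))
```

Because the posterior stays in the same family, the update is just addition on the two shape parameters, which is what makes the distribution convenient for modelling proportions.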

Statistics & Probability Flashcards

(200 cards)