Practical Statistics Flashcards
Deviations
The difference between the observed values and the estimate of location
errors, residuals
Variance
The sum of squared deviations from the mean divided by n-1 where n is the number of data values
mean-squared-error
Standard Deviation
The square root of the variance
Mean absolute deviation
The mean of the absolute values of the deviations from the mean
l1-norm, Manhattan norm
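These variability metrics in a minimal NumPy sketch (toy data, for illustration):

```python
import numpy as np

data = np.array([1.0, 4.0, 4.0, 7.0, 9.0])   # toy data values

deviations = data - data.mean()               # deviations from the mean
variance = (deviations ** 2).sum() / (len(data) - 1)  # sum of squares over n - 1
std_dev = np.sqrt(variance)                   # square root of the variance
mad = np.abs(deviations).mean()               # mean absolute deviation

# NumPy's var/std divide by n unless ddof=1 is passed
assert np.isclose(variance, data.var(ddof=1))
```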
Sample statistic
A metric calculated for a sample of data drawn from a larger population
Data Distribution
The frequency distribution of individual values in a data set
Sampling distribution
The frequency distribution of a sample statistic over many samples or resamples
Central limit theorem
The tendency of the sampling distribution to take on a normal shape as sample size rises
Standard error
The variability (standard deviation) of a sample statistic over many samples (not to be confused with standard deviation, which by itself, refers to variability of individual data values)
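A small simulation sketch of the central limit theorem and the standard error (the population here is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)  # a skewed population

n = 50
sample_means = [rng.choice(population, size=n).mean() for _ in range(2_000)]

# The sampling distribution of the mean is roughly normal (CLT);
# its standard deviation (the standard error) is about population std / sqrt(n)
print(np.std(sample_means), population.std() / np.sqrt(n))
```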
Bootstrap sample
A sample taken with replacement from an observed data set
powerful tool for assessing the variability of a sample statistic
Resampling
The process of taking repeated samples from observed data; includes both bootstrap and permutation procedures
Confidence level
The percentage of confidence intervals, constructed in the same way from the same population, that are expected to contain the statistic of interest
Interval endpoints
The top and bottom of the confidence interval
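A bootstrap sketch (hypothetical data) estimating the standard error of the median and a 90% percentile confidence interval:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.lognormal(size=200)   # an observed data set (hypothetical)

boot_medians = np.array([
    np.median(rng.choice(sample, size=len(sample), replace=True))  # sample with replacement
    for _ in range(5_000)
])

std_error = boot_medians.std()                           # variability of the statistic
ci_low, ci_high = np.percentile(boot_medians, [5, 95])   # 90% interval endpoints
```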
Error
The difference between a data point and a predicted or average value
Standardize
Subtract the mean and divide by the standard deviation
z-score
The result of standardizing an individual data point
Standard normal
A normal distribution with mean = 0 and standard deviation = 1
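Standardizing in one line (a sketch; dividing by n vs. n - 1 in the standard deviation is a convention choice):

```python
import numpy as np

data = np.array([2.0, 4.0, 6.0, 8.0])
z_scores = (data - data.mean()) / data.std(ddof=1)  # subtract mean, divide by std dev
# z_scores are on a standard normal scale: mean 0, standard deviation 1
```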
Tail
The long narrow portion of a frequency distribution, where relatively extreme values occur at low frequency
Skew
Where one tail of a distribution is longer than the other
Trial
An event with a discrete outcome (e.g. a coin flip)
Success
The outcome of interest for a trial
“1” (as opposed to “0”)
Binomial
Having two outcomes
yes/no, 0/1, binary
Binomial Trial
A trial with two outcomes
Bernoulli trial
Binomial distribution
Distribution of the number of successes in n trials, parameterized by p. Can be approximated by a normal distribution for large n, provided p is not too close to 0 or 1
Bernoulli distribution
Lambda
The rate (per unit of time or space) at which events occur
Poisson distribution
The frequency distribution of the number of events in sampled units of time or space
Exponential distribution
The frequency distribution of the time or distance from one event to the next event
Weibull distribution
A generalized version of the exponential distribution in which the event rate is allowed to shift over time
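How these distributions map onto scipy.stats (parameter values are made up):

```python
from scipy import stats

n, p = 20, 0.1   # binomial: n trials, success probability p
lam = 2.0        # lambda: events per unit of time

stats.binom.pmf(2, n, p)           # P(exactly 2 successes in 20 trials)
stats.poisson.pmf(3, mu=lam)       # P(3 events in one unit of time)
stats.expon.cdf(1.0, scale=1/lam)  # P(next event arrives within 1 unit)
stats.weibull_min.rvs(c=1.5, scale=1.0, size=5)  # shape c != 1 lets the rate shift
```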
Treatment
Something (drug, price, web headline) to which a subject is exposed
Treatment group
A group of subjects exposed to a specific treatment
Control group
A group of subjects exposed to no (or standard) treatment
Subjects
The items (web visitors, patients, etc.) that are exposed to treatments
Test statistic
The metric used to measure the effect of the treatment
Null hypothesis
The hypothesis that chance is to blame
Alternative hypothesis
Counterpoint to the null (what you hope to prove)
One-way test
Hypothesis test that counts chance results only in one direction (e.g. B is better than A)
Two-way test
Hypothesis test that counts chance results in two directions (e.g. A is different from B; could be bigger or smaller)
Permutation test
The procedure of combining two or more samples together and randomly (or exhaustively) reallocating the observations to resamples
Randomization test, random permutation test, exact test
Resampling
Drawing additional samples (“resamples”) from an observed data set
p-value
Given a chance model that embodies the null hypothesis, the p-value is the probability of obtaining results as unusual or extreme as the observed results
not “What is the probability that this happened by chance?”
Alpha
The probability threshold of “unusualness” that chance results must surpass for actual outcomes to be deemed statistically significant
typically 5% and 1%
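A permutation-test sketch tying these cards together (group values are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([23.0, 21.0, 19.0, 24.0, 35.0])   # group A
b = np.array([31.0, 28.0, 35.0, 41.0, 27.0])   # group B

observed = b.mean() - a.mean()
combined = np.concatenate([a, b])

diffs = []
for _ in range(10_000):
    perm = rng.permutation(combined)            # reallocate observations to resamples
    diffs.append(perm[len(a):].mean() - perm[:len(a)].mean())

# p-value: share of chance results at least as extreme as observed (two-way test)
p_value = np.mean(np.abs(diffs) >= abs(observed))
print(p_value, p_value <= 0.05)                 # compare against alpha
```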
Type I error
Mistakenly concluding an effect is real (when it is due to chance)
Type II error
Mistakenly concluding an effect is due to chance (when it is real)
Multi-arm bandit
An imaginary slot machine with multiple arms for the customer to choose from, each with different payoffs, here taken to be an analogy for a multitreatment experiment
Alters traditional sampling process to incorporate information learned during the experiment and reduce the frequency of the inferior treatment
epsilon-greedy
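An epsilon-greedy sketch for a three-arm web test (click rates are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
true_rates = np.array([0.04, 0.05, 0.08])  # unknown click rate per arm
wins = np.zeros(3)
pulls = np.zeros(3)
epsilon = 0.1

for _ in range(10_000):
    if rng.random() < epsilon:
        arm = rng.integers(3)                         # explore: random arm
    else:
        arm = np.argmax(wins / np.maximum(pulls, 1))  # exploit: best rate so far
    pulls[arm] += 1
    wins[arm] += rng.random() < true_rates[arm]       # a "win", e.g. a click

print(pulls)  # inferior arms are pulled less often as evidence accumulates
```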
Arm
A treatment in an experiment (e.g. “headline A in a web test”)
Win
The experimental analog of a win at the slot machine (e.g. “customer clicks on the link”)
Effect size
The minimum size of the effect that you hope to be able to detect in a statistical test, such as “a 20% improvement to click rates”
The bigger the effect size, the fewer samples you generally need to detect it
Power
The probability of detecting a given effect size with a given sample size
Significance level
The statistical significance level at which the test will be conducted
alpha
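A sample-size sketch with statsmodels, assuming the effect size is expressed as Cohen's d:

```python
from statsmodels.stats.power import TTestIndPower

# Samples per group to detect d = 0.2 at alpha = 0.05 with 80% power
n_per_group = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(n_per_group)  # a larger effect size would need fewer samples
```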
Response
The variable we are trying to predict
dependent variable, Y variable, target, outcome
Independent variable
The variable used to predict the response
X variable, feature, attribute, predictor
Record
The vector of predictor and outcome values for a specific individual or case
row, case, instance, example
Intercept
The intercept of the regression line – that is, the predicted value when X = 0
b_0, B_0
Regression coefficient
The slope of the regression line
slope, b_1, B_1, parameter estimates, weights
Fitted values
The estimates Y_hat_i obtained from the regression line
predicted values
Residuals
The difference between the observed values and the fitted values
errors
Least squares
The method of fitting a regression by minimizing the sum of squared residuals
ordinary least squares, OLS
Root mean squared error
The square root of the average squared error of the regression (this is the most widely used metric to compare regression models)
RMSE
Residual standard error
The same as the root mean squared error, but adjusted for degrees of freedom
RSE
R-squared
The proportion of variance explained by the model, from 0 to 1
coefficient of determination, R^2
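These regression cards in one NumPy sketch (toy x and y):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, deg=1)      # least squares slope and intercept
fitted = b0 + b1 * x                  # fitted (predicted) values
residuals = y - fitted                # observed minus fitted

rmse = np.sqrt(np.mean(residuals ** 2))                # root mean squared error
rse = np.sqrt((residuals ** 2).sum() / (len(x) - 2))   # adjusted for 2 df (b0, b1)
r2 = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()  # R-squared
```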
t-statistic
The coefficient for a predictor, divided by the standard error of the coefficient, giving a metric to compare the importance of variables in the model
Weighted regression
Regression with the records having different weights
Correlated variables
When the predictor variables are highly correlated, it is difficult to interpret the individual coefficients
Multicollinearity
When the predictor variables have perfect, or near-perfect, correlation, the regression can be unstable or impossible to compute
collinearity
Confounding variables
An important predictor that, when omitted, leads to spurious relationships in a regression equation
Main effects
The relationship between a predictor and the outcome variable, independent of other variables
Interactions
An interdependent relationship between two or more predictors and the response
Conditional probability
The probability of observing some event (say, X = i) given some other event (say, Y = i), written as P(X_i | Y_i)
Posterior probability
The probability of an outcome after the predictor information has been incorporated (in contrast to the prior probability of outcomes, not taking predictor information into account)
Covariance
A measure of the extent to which one variable varies in concert with another (i.e., similar magnitude and direction)
Discriminant function
The function that, when applied to the predictor variables, maximizes the separation of the classes
Fisher’s Linear Discriminant maximizes the “between” sum of squares relative to the “within” sum of squares
Discriminant weights
The scores that result from the application of the discriminant function and are used to estimate probabilities of belonging to one class or another
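A minimal linear discriminant sketch with scikit-learn (toy data; coef_ plays the role of the discriminant weights here):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])  # predictors
y = np.array([0, 0, 1, 1])                                      # class labels

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)             # weights on the predictors
print(lda.predict_proba(X))  # estimated class-membership probabilities
```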
Logit
The function that maps class membership probability to a range from negative to positive infinity
log odds
Odds
The ratio of “success” (1) to “not success” (0)
The probability of an event divided by the probability that the event will not occur
Log odds
The response in the transformed model (now linear), which gets mapped back to a probability
Logistic Regression
Analogous to multiple linear regression, but with a binary outcome. A special instance of a “generalized linear model” (GLM), fit with maximum likelihood estimation
Maximum Likelihood Estimation (MLE)
A process that tries to find the model that is most likely to have produced the data we see. Involves quasi-Newton optimization that iterates between a scoring step, based on the current parameters, and an update to the parameters to improve the fit
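A logistic regression sketch showing the probability/odds/logit chain (data is made up):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)   # fit by maximum likelihood
p = model.predict_proba(X)[:, 1]         # probability of "1"

odds = p / (1 - p)                       # odds of success
log_odds = np.log(odds)                  # the logit, linear in X
# log_odds equals model.decision_function(X), the linear predictor
```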
Recall
tp / (tp + fn)
Sensitivity, TPR, hit-rate
Precision
tp / (tp + fp)
Specificity
tn / (tn + fp)
True negative rate
F1 Score
Harmonic mean of the precision and recall
2 * Recall * Precision / (Recall + Precision)
ROC curve
The plot of the true positive rate (TPR, recall, y-axis) against the false positive rate (FPR, x-axis), at various threshold settings
Some definitions use specificity (TNR) for the x-axis
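The classification metrics from raw confusion-matrix counts (numbers are invented):

```python
tp, fp, tn, fn = 80, 20, 90, 10   # hypothetical counts

recall = tp / (tp + fn)           # sensitivity, true positive rate
precision = tp / (tp + fp)
specificity = tn / (tn + fp)      # true negative rate
f1 = 2 * recall * precision / (recall + precision)

fpr = fp / (fp + tn)              # false positive rate, the ROC x-axis
# An ROC curve plots (fpr, recall) across many score thresholds,
# e.g. with sklearn.metrics.roc_curve(y_true, y_score)
```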
Bias error
Error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting)
More tunable parameters -> lower bias -> higher variance
Variance error
Error from sensitivity to small fluctuations in the training set. High variance may result from an algorithm modeling the random noise in the training data (overfitting)
More tunable parameters -> lower bias -> higher variance
Convex and non-convex functions
convex: one minimum
- important: an optimization algorithm (like gradient descent) won't get stuck in a local minimum
non-convex: has valleys (local minima) that aren't as low as the overall lowest point (the global minimum)
- optimization algorithms can get stuck in a local minimum, and it can be hard to tell when this happens
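A gradient-descent sketch of the difference (functions chosen for illustration):

```python
def gradient_descent(grad, x, lr=0.01, steps=2_000):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex: f(x) = x^2 has one minimum; any start converges to it
print(gradient_descent(lambda x: 2 * x, x=5.0))   # ~0.0

# Non-convex: f(x) = x^4 - 3x^2 + x has two valleys;
# where descent ends up depends on where it starts
grad = lambda x: 4 * x**3 - 6 * x + 1
print(gradient_descent(grad, x=-2.0))   # one local minimum (~ -1.30)
print(gradient_descent(grad, x=2.0))    # a different one (~ 1.13)
```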
Kullback-Leibler divergence
A measure of how one probability distribution diverges from a second, expected probability distribution
KL-divergence
Kolmogorov-Smirnov test
A nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test)
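Both compare distributions; a quick scipy sketch (toy inputs):

```python
import numpy as np
from scipy import stats

# KL divergence between two discrete distributions (asymmetric; 0 iff equal)
p = np.array([0.4, 0.4, 0.2])
q = np.array([0.3, 0.3, 0.4])
kl = stats.entropy(p, q)   # sum of p * log(p / q)

# Two-sample K-S test: could these samples share a distribution?
rng = np.random.default_rng(4)
a = rng.normal(size=200)
b = rng.normal(loc=0.5, size=200)
stat, p_value = stats.ks_2samp(a, b)
```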
ANOVA
Analysis of variance is a statistical method used to test for differences among two or more group means
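A one-way ANOVA sketch via scipy (group values invented):

```python
from scipy import stats

g1 = [20.1, 21.3, 19.8, 22.0]
g2 = [24.5, 23.9, 25.1, 24.0]
g3 = [20.5, 21.0, 19.9, 20.7]

f_stat, p_value = stats.f_oneway(g1, g2, g3)  # tests equality of group means
```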
PCA
Principal Component Analysis
- orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
- transformation is defined in such a way that the first principal component has the largest possible variance and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components
- resulting vectors are an uncorrelated orthogonal basis set.
- PCA is sensitive to the relative scaling of the original variables.
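PCA via the SVD in NumPy (synthetic correlated data; scaling first because of the sensitivity noted above):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)   # make two columns correlated

Xs = (X - X.mean(axis=0)) / X.std(axis=0)        # center and scale
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)

components = Vt                       # orthogonal directions, decreasing variance
scores = Xs @ Vt.T                    # data in the principal-component basis
explained_var = S**2 / (len(X) - 1)   # variance captured by each component
```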
p-value principles
- p-values can indicate how incompatible the data are with a specified statistical model
- p-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
- Proper inference requires full reporting and transparency
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis