Intro Statistics Flashcards
central limit theorem
If x_bar is the mean of a random sample X1, X2, ..., Xn of size n from a distribution with a finite mean mu and a finite positive variance sigma^2, then the distribution of W = (x_bar - mu) / (sigma/sqrt(n)) converges to N(0,1) as n approaches infinity.
This means that for large n, x_bar is approximately distributed N(mu, sigma^2/n), i.e. with standard deviation sigma/sqrt(n).
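A quick simulation can illustrate the theorem. This sketch (distribution, sample size, and replication count are arbitrary choices) draws many samples from a skewed exponential distribution and checks that the standardized sample mean W looks standard normal:

```python
import numpy as np

# Illustrative sketch: sample means from a skewed (exponential)
# distribution are approximately N(mu, sigma^2/n) for moderate n.
rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 50              # exponential(1) has mean 1 and sd 1
sample_means = rng.exponential(mu, size=(10_000, n)).mean(axis=1)

# Standardize each sample mean: W = (x_bar - mu) / (sigma / sqrt(n))
w = (sample_means - mu) / (sigma / np.sqrt(n))
print(round(w.mean(), 2), round(w.std(), 2))   # close to 0 and 1
```

Even though the underlying exponential distribution is strongly skewed, the standardized means cluster around 0 with spread near 1.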
binomial distribution
with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question
P(X = k) = C(n,k) * p^k * (1 - p)^(n-k)
C(n,k) = n! / (k! * (n - k)!)
mu = n*p, sigma^2 = n*p*(1-p)
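The PMF above can be computed directly with the standard library (the coin-flip numbers are just an example):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: probability of exactly 5 heads in 10 fair-coin flips
print(binom_pmf(5, 10, 0.5))       # 0.24609375
# Mean n*p and variance n*p*(1-p) for n=10, p=0.5
print(10 * 0.5, 10 * 0.5 * 0.5)    # 5.0 2.5
```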
Accuracy
the proportion of true results (both true positives and true negatives) among the total number of cases examined.
accuracy = (tp + tn) / (tp + tn + fp + fn)
Precision
precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances
precision = tp / (tp + fp)
Recall
recall (also known as sensitivity) is the fraction of relevant instances that have been retrieved over the total amount of relevant instances
recall = tp / (tp + fn)
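The three metrics above can be computed together from the four confusion-matrix counts (the counts here are made up for illustration):

```python
def metrics(tp, tn, fp, fn):
    # accuracy: correct predictions over all cases
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # precision: how many retrieved/positive calls were right
    precision = tp / (tp + fp)
    # recall: how many actual positives were found
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical classifier: 40 TP, 45 TN, 10 FP, 5 FN
print(metrics(tp=40, tn=45, fp=10, fn=5))
# accuracy 0.85, precision 0.8, recall = 40/45
```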
type I error
a type I error is the rejection of a true null hypothesis (also known as a “false positive” finding)
a type I error is to falsely infer the existence of something that is not there
type II error
type II error is retaining a false null hypothesis (also known as a “false negative” finding)
a type II error is to falsely infer the absence of something that is present
Kullback-Leibler divergence
a measure of how one probability distribution diverges from a second, expected probability distribution
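For discrete distributions P and Q, D_KL(P || Q) = sum_i p_i * log(p_i / q_i). A minimal sketch (the two example distributions are arbitrary):

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q); assumes q_i > 0 wherever p_i > 0,
    # and uses the convention 0 * log(0/q) = 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))   # positive; note D_KL is not symmetric
print(kl_divergence(p, p))   # 0.0 for identical distributions
```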
Kolmogorov-Smirnov test
is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution (one-sample K–S test), or to compare two samples (two-sample K–S test)
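The two-sample K-S statistic is D = sup over t of |F1(t) - F2(t)|, the largest gap between the two empirical CDFs. A sketch of computing just the D statistic (in practice a library routine such as scipy.stats.ks_2samp also supplies the p-value):

```python
import numpy as np

def ks_two_sample_stat(x, y):
    # D = max |F_x(t) - F_y(t)|, evaluated at the pooled sample points
    x, y = np.sort(x), np.sort(y)
    pooled = np.concatenate([x, y])
    fx = np.searchsorted(x, pooled, side="right") / len(x)   # empirical CDF of x
    fy = np.searchsorted(y, pooled, side="right") / len(y)   # empirical CDF of y
    return np.max(np.abs(fx - fy))

# Identical samples give D = 0; completely separated samples give D = 1
print(ks_two_sample_stat([1, 2, 3], [1, 2, 3]))      # 0.0
print(ks_two_sample_stat([1, 2, 3], [10, 11, 12]))   # 1.0
```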
Bootstrap
statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter like a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient.
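A sketch of a percentile bootstrap confidence interval for a mean (the data values and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.array([2.1, 3.5, 2.9, 4.0, 3.3, 2.7, 3.8, 3.1])

# Resample with replacement from the original sample,
# recomputing the estimator (here the mean) each time
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(5_000)
])

# Percentile 95% confidence interval for the population mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))
```

The same recipe works for medians, correlations, or regression coefficients: swap in the estimator of interest.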
Jackknife
The jackknife estimator of a parameter is found by systematically leaving out each observation from the dataset, recalculating the estimate each time, and then averaging these recalculations.
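A sketch of the jackknife standard error (the toy data are arbitrary; for the mean, the jackknife SE coincides with the usual s/sqrt(n), which makes a handy sanity check):

```python
import numpy as np

def jackknife_se(data, estimator=np.mean):
    n = len(data)
    # Leave one observation out at a time and re-estimate
    leave_one_out = np.array([
        estimator(np.delete(data, i)) for i in range(n)
    ])
    theta_bar = leave_one_out.mean()
    # Jackknife standard error formula
    return np.sqrt((n - 1) / n * np.sum((leave_one_out - theta_bar) ** 2))

data = np.array([2.0, 4.0, 6.0, 8.0])
print(jackknife_se(data), data.std(ddof=1) / np.sqrt(len(data)))
```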
Permutation test
the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points
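Rather than enumerating every rearrangement, a Monte Carlo version shuffles the labels many times. A sketch for a two-sided test of a difference in group means (the two groups below are made-up data):

```python
import numpy as np

def permutation_test(a, b, n_perm=10_000, seed=0):
    # Null hypothesis: group labels are exchangeable
    rng = np.random.default_rng(seed)
    observed = np.mean(a) - np.mean(b)
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # rearrange the labels
        stat = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(stat) >= abs(observed):
            count += 1
    return count / n_perm                       # two-sided p-value

a = np.array([8.5, 9.1, 8.8, 9.4, 9.0])
b = np.array([7.2, 7.9, 7.5, 8.0, 7.6])
print(permutation_test(a, b))   # small: the groups barely overlap
```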
Two tailed test
appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average
One tailed test
appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products
Assessing normality
Subtract the mean and divide by the standard deviation, then compare the standardized values to standard normal quantiles (normal scores, as in a normal Q-Q plot)
Box plot
Quantitative variable (often split by a categorical variable); shows shape of distribution, central value, and variability
Median: black center line
Box top and bottom: third and first quartiles
Whiskers: lines extending to the most extreme points within 1.5 times the IQR of the quartiles
Points beyond the whiskers are plotted individually as potential outliers
IQR
Inter quartile range
Distance between first and third quartiles
Two way table
two-way table presents categorical data by counting the number of observations that fall into each combination of categories of the two variables (one cell per combination)
Correlation coefficient
r = 1/(n-1) * Sum( ((x_i - x_mean)/std_x) * ((y_i - y_mean)/std_y) )
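The formula is the average product of z-scores. A sketch checking it against numpy's built-in (the x, y data are arbitrary, roughly linear points):

```python
import numpy as np

def corr(x, y):
    # r = 1/(n-1) * sum( (x_i - x_bar)/s_x * (y_i - y_bar)/s_y )
    n = len(x)
    zx = (x - x.mean()) / x.std(ddof=1)   # z-scores of x
    zy = (y - y.mean()) / y.std(ddof=1)   # z-scores of y
    return np.sum(zx * zy) / (n - 1)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
print(corr(x, y), np.corrcoef(x, y)[0, 1])   # both near 1 (strong linear)
```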
ANOVA
Analysis of variance is a statistical method used to test differences between two or more group means by comparing the variance between groups to the variance within groups
Parameter
parameter is a number describing a population, such as a percentage or proportion.
true proportion of defective items in the entire population
Statistic
is a number which may be computed from the data observed in a random sample without requiring the use of any unknown parameters, such as a sample mean.
e.g., an inspector takes a sample of 300 items and observes that 15 of these are defective; the computed statistic p_hat = 15/300 = 0.05 is an estimate of the parameter p
Biased estimator
if a statistic is systematically skewed away from the true value of the parameter, it is considered to be a biased estimator of that parameter
Unbiased estimator
unbiased estimator will have a sampling distribution whose mean is equal to the true value of the parameter.