Statistics Flashcards
What is the Central Limit Theorem? How is it useful?
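The CLT states that if we draw independent random samples of size n from a population with a finite mean and a finite variance, then as n grows the distribution of the sample mean approaches a normal distribution with the same mean as the population and a standard deviation of population sd / sqrt(n), whatever the shape of the population distribution. This is useful because it lets us build confidence intervals and run hypothesis tests on means with normal-based methods even when the population itself is not normal.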
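A quick simulation makes the theorem concrete. This is a minimal sketch that assumes a deliberately skewed population (an exponential distribution with mean 1 and sd 1, chosen only for illustration). It draws many samples of size n, takes each sample's mean, and compares the spread of those means with sd / sqrt(n).

```python
import random
import statistics


def sample_means(draw, n, num_samples, seed=0):
    """Return the means of `num_samples` independent samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]


def exponential_draw(rng):
    # Hypothetical skewed population: exponential with mean 1 and sd 1.
    return rng.expovariate(1.0)


n = 50
means = sample_means(exponential_draw, n, num_samples=5000)
print(statistics.fmean(means))   # close to the population mean, 1
print(statistics.stdev(means))   # close to sd / sqrt(n) = 1 / sqrt(50), about 0.141
```

Even though the population is far from normal, a histogram of `means` would look close to a bell curve centered at 1.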
What is sampling? How many sampling methods do you know?
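Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined. Common methods include simple random sampling, systematic sampling, stratified sampling and cluster sampling.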
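Three of these methods can be sketched with the standard library alone. The helper names below are hypothetical, and the "population" is just the integers 0 to 999, with strata defined by `x % 4` purely for illustration.

```python
import random


def simple_random_sample(population, k, rng):
    # Every subset of size k is equally likely to be chosen.
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    # Pick a random start, then take every step-th element.
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]


def stratified_sample(population, stratum_of, frac, rng):
    # Group items into strata, then sample the same fraction within each one.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(len(members) * frac)
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)
population = list(range(1000))
srs = simple_random_sample(population, 100, rng)
sys_s = systematic_sample(population, 100, rng)
strat = stratified_sample(population, lambda x: x % 4, 0.1, rng)
print(len(srs), len(sys_s), len(strat))
```

Stratified sampling guarantees each group is represented in proportion to its size, which a simple random sample only achieves on average.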
What is the difference between Type I and Type II error?
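A Type I error occurs when we reject the null hypothesis even though it is true (a false positive). Its probability, alpha (the significance level), is chosen ahead of time.
A Type II error occurs when we fail to reject a null hypothesis that is false (a false negative). Its probability is usually written beta.
1 - beta = 1 - P(Type II error) is called the power of the test.
For a fixed sample size, lowering the probability of one type of error raises the probability of the other.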
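The trade-off can be computed directly for a simple case. This sketch assumes a one-sided z-test of H0: mu = 0 against H1: mu = effect > 0 with a known sigma; the effect size, sigma and n are made-up values. It shows beta rising as alpha is lowered.

```python
from statistics import NormalDist


def type2_error(alpha, effect, sigma, n):
    """Probability of a Type II error (beta) for a one-sided z-test.

    H0: mu = 0 versus H1: mu = effect > 0, with known sigma and sample size n.
    H0 is rejected when the z statistic exceeds the (1 - alpha) quantile.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    shift = effect * n ** 0.5 / sigma   # mean of the z statistic under H1
    return z.cdf(z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    beta = type2_error(alpha, effect=0.5, sigma=1.0, n=20)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

Increasing n raises `shift`, which is the usual way to lower both error probabilities at once.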
What is linear regression? What do the terms p-value, coefficient and R-Squared value mean?
Linear regression is a simple model used for both prediction and inference. It models a response (dependent) variable Y as a linear combination of features (independent variables) plus an error term. Ordinary least squares (OLS) chooses the coefficients that minimize the sum of squared differences between the model's predictions and the observed values. Each coefficient is the expected change in Y for a one-unit change in the associated variable, holding the other variables constant. A coefficient's p-value is the probability of obtaining an estimate at least as extreme as the one observed if the true coefficient were zero; a small p-value is evidence that the variable has a non-zero effect. The R-squared value is the proportion of the variance in Y that is explained by the model.
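The three quantities can be computed by hand. This is a minimal sketch assuming numpy and scipy are available; `ols_summary` is a hypothetical helper, and the data are simulated so that only the first regressor truly affects y.

```python
import numpy as np
from scipy import stats


def ols_summary(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    X: (n, p) array of regressors without an intercept column; y: (n,) response.
    Returns coefficients (intercept first), two-sided p-values, and R-squared.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the sum of squared residuals
    resid = y - A @ beta
    dof = n - (p + 1)                              # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # estimated error variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # covariance of the coefficient estimates
    t_stats = beta / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)  # H0: coefficient = 0
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, p_values, r2


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Simulated data: true intercept 1, true slope 2 on x1, no effect of x2.
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=200)
beta, p_values, r2 = ols_summary(X, y)
print(beta, p_values, r2)
```

The estimate for x1 lands near 2 with a tiny p-value, while x2's coefficient stays near 0 with a p-value that is typically large.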
What are the assumptions for linear regression?
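The four major assumptions are:
- linearity: the expected value of the dependent variable is a linear function of the regressors (linear in the coefficients),
- no (perfect) multicollinearity between the regressors,
- independence of the errors (observations), and
- homoscedasticity: the errors have constant variance across all values of the regressors.
Normality of the errors is often added as a fifth assumption when p-values and confidence intervals are needed.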
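One common way to check the multicollinearity assumption is the variance inflation factor (VIF): regress each regressor on all the others and compute 1 / (1 - R-squared). This sketch assumes numpy; the helper names and the simulated regressors are hypothetical. A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign.

```python
import numpy as np


def r_squared(A, y):
    # R-squared of an OLS fit of y on the columns of A.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)          # every regressor except column j
        A = np.column_stack([np.ones(n), others])
        out.append(1.0 / (1.0 - r_squared(A, X[:, j])))
    return np.array(out)


rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly a copy of x1
x3 = rng.normal(size=500)                   # unrelated regressor
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
```

The near-duplicate pair x1 and x2 gets very large VIFs, while x3 stays close to 1.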
What is a statistical interaction?
An interaction is when the effect of one factor on the dependent variable differs among levels of another factor.
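In a regression, an interaction is usually modeled by adding a product term. This sketch assumes numpy and simulated data in which the slope of x is 1.0 in group 0 and 3.0 in group 1; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
group = rng.integers(0, 2, size=n)   # binary factor: 0 or 1
# Simulated data: the effect of x depends on the level of group.
y = 0.5 + 1.0 * x + 0.2 * group + 2.0 * x * group + rng.normal(scale=0.5, size=n)

# Design matrix with an explicit interaction column x * group.
A = np.column_stack([np.ones(n), x, group, x * group])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
slope_group0 = b[1]          # effect of x when group == 0
slope_group1 = b[1] + b[3]   # effect of x when group == 1
print(round(slope_group0, 2), round(slope_group1, 2))
```

The interaction coefficient `b[3]` is exactly the difference between the two slopes, so testing whether it is zero tests whether an interaction exists.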
What is selection bias?
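Selection bias occurs when the data analyzed are not representative of the population of interest because some units are systematically excluded from (or favored for) the sample or the analysis, so estimates computed from the data are biased.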
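A small simulation shows the effect. This sketch assumes a hypothetical normal population with mean 50 and sd 10, and a made-up selection rule that keeps only values above 45; the mean of the selected data overstates the population mean.

```python
import random
import statistics

random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]

# Hypothetical selection rule: only values above 45 reach the analysis.
selected = [v for v in population if v > 45]

print(statistics.fmean(population))   # close to 50
print(statistics.fmean(selected))     # noticeably higher: a biased estimate
```

Collecting more data does not fix this: the bias comes from the selection rule, not from the sample size.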
What is the binomial probability formula?
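The binomial distribution gives the probability of k successes in n independent trials, each of which succeeds with probability theta: P(X = k) = C(n, k) * theta^k * (1 - theta)^(n - k), for k = 0, 1, ..., n, where C(n, k) = n! / (k! (n - k)!).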
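The formula translates directly into code. A minimal sketch using the standard library's `math.comb` for C(n, k); `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, theta):
    """P(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)


# 2 heads in 4 fair coin flips: C(4, 2) * 0.5**4 = 6 / 16
print(binom_pmf(2, 4, 0.5))  # 0.375
```

Summing `binom_pmf(k, n, theta)` over k = 0..n gives 1, as it must for a probability distribution.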
What is an exact test?
“In statistics, an exact (significance) test is a test where all assumptions, upon which the derivation of the distribution of the test statistic is based, are met as opposed to an approximate test (in which the approximation may be made as close as desired by making the sample size big enough). This will result in a significance test that will have a false rejection rate always equal to the significance level of the test. For example an exact test at significance level 5% will in the long run reject true null hypotheses exactly 5% of the time.”
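An exact binomial test is one concrete example: its p-value is computed from the binomial distribution itself rather than from a large-sample normal approximation. This sketch assumes a one-sided alternative (theta > theta0) and hypothetical data of 9 successes in 10 trials.

```python
from math import comb


def exact_binomial_test_greater(k, n, theta0=0.5):
    """One-sided exact p-value P(X >= k) for X ~ Binomial(n, theta0).

    The p-value comes straight from the binomial distribution, with no
    normal approximation.
    """
    return sum(comb(n, i) * theta0**i * (1 - theta0)**(n - i)
               for i in range(k, n + 1))


# Hypothetical data: 9 successes in 10 trials, H0: theta = 0.5.
p = exact_binomial_test_greater(9, 10)
print(round(p, 4))   # (10 + 1) / 1024, about 0.0107
```

Because the binomial distribution is discrete, only certain rejection rates are attainable: with n = 10 and alpha = 0.05 this test rejects only when X >= 9 (P(X >= 8) is about 0.055), so its actual size is about 0.011. For discrete statistics, an exact test therefore keeps the false rejection rate at or below alpha.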