Lecture 7 Flashcards
What are Computer Intensive Techniques (resampling/CIT)?
The use of computers to generate thousands of new samples, statistics, or other values of interest in order to analyze and validate models.
What is a Parametric assumption?
Parametric tests are based on assumptions about the population and the parameter of interest; typically a normal distribution is assumed. If these assumptions are not met, we switch to non-parametric statistics.
What is a non-parametric assumption?
Non-parametric tests make no strong assumptions about the population distribution; they are sometimes called “distribution-free”.
What are the 3 types of CIT?
- Bootstrap sampling distributions and confidence intervals.
- Monte Carlo methods.
- Randomization tests, permutation tests, exact tests.
Are CIT approaches essentially parametric or non-parametric?
Non-parametric.
What is Bootstrapping?
It is “resampling with replacement” from one sample with size n.
What can we do after Bootstrapping?
Compute any statistic (mean, median, standard deviation, etc.) on the resample and store this value. This process is repeated many times, for example 1000 times.
What can we do after computing the statistics (Bootstrapping)?
The stored values form a distribution, which is compared to the statistic computed from the original sample. This establishes the statistical significance of the original statistic.
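The bootstrap steps above (resample with replacement, compute a statistic, repeat, build a distribution) can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function names `bootstrap_statistics` and `percentile_interval`, the made-up `sample` data, and the choice of a simple percentile confidence interval are all assumptions made for this example.

```python
import random
import statistics

def bootstrap_statistics(sample, stat=statistics.mean, n_resamples=1000, seed=0):
    """Resample with replacement n_resamples times and collect the statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # same size n, with replacement
        results.append(stat(resample))
    return results

def percentile_interval(values, confidence=0.95):
    """Simple percentile interval taken from the sorted bootstrap values."""
    ordered = sorted(values)
    alpha = 1 - confidence
    lo_idx = int((alpha / 2) * len(ordered))
    hi_idx = int((1 - alpha / 2) * len(ordered)) - 1
    return ordered[lo_idx], ordered[hi_idx]

# Hypothetical sample, invented purely for illustration.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9]
boot_means = bootstrap_statistics(sample)
low, high = percentile_interval(boot_means)
print("original mean:", statistics.mean(sample))
print("bootstrap 95% interval for the mean:", (low, high))
```

Passing `stat=statistics.median` or `stat=statistics.stdev` instead would bootstrap a different statistic with the same procedure.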
What are 4 advantages of Bootstrapping?
- Simple test, especially with a small sample size.
- It allows simulating the sampling distribution of many different statistics.
- Useful when the variables are not normally distributed.
- Useful when the relationship between variables is not linear.
What are 2 disadvantages of Bootstrapping?
- The bootstrap sampling distribution of the mean is only an approximation; it is NOT equal to the real sampling distribution of the mean.
- The sample must be representative of the population.
What is the Monte Carlo method?
A method that uses random sampling to estimate a quantity, for example the area of a (usually) complex shape.
How does the Monte Carlo method work?
Randomly throw 1000 arrows at a known area positioned around the area you want to calculate. The chance of being “hit” is the same for every spot. If 50% of the arrows hit the area you want to calculate, you know that its surface area is approximately 50% of the known area.
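The arrow-throwing idea can be sketched in Python. As an assumed example, the unknown area is a circle of radius 1 and the known area is the 2 x 2 square around it, so the estimate should land near pi. The names `estimate_area` and `in_unit_circle` are hypothetical helpers written for this sketch.

```python
import random

def estimate_area(is_inside, x_range, y_range, n_arrows=1000, seed=0):
    """Estimate an area as (fraction of arrows that hit) * (known area)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    known_area = (x_max - x_min) * (y_max - y_min)
    hits = 0
    for _ in range(n_arrows):
        # Every spot in the known rectangle is equally likely to be hit.
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        if is_inside(x, y):
            hits += 1
    return known_area * hits / n_arrows

def in_unit_circle(x, y):
    """Assumed target shape: a circle of radius 1 centred at the origin."""
    return x * x + y * y <= 1.0

area = estimate_area(in_unit_circle, (-1.0, 1.0), (-1.0, 1.0), n_arrows=1000)
print("estimated area of the circle:", area)
```

Throwing more arrows (a larger `n_arrows`) generally gives an estimate closer to the true area.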
What are 4 advantages/characteristics of the Monte Carlo method?
- It can solve problems that have a probabilistic interpretation.
- When the data set does not meet the requirements for parametric or asymptotic methods, MC can be used.
- It involves repeated sampling in order to obtain a good estimate or approximation of the exact p-value.
- Computing an exact p-value is possible via exact tests and randomization tests, but only for small data sets.
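A hedged sketch of how repeated sampling can approximate a p-value: instead of enumerating every possible relabelling of the data (the exact test), the group labels are shuffled at random many times. The two-group difference in means, the function name `monte_carlo_p_value`, and the data below are assumptions chosen for illustration only.

```python
import random
import statistics

def monte_carlo_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Approximate a two-sided randomization-test p-value by random shuffling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # random reassignment of the group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself,
    # which keeps the estimate strictly above zero.
    return (extreme + 1) / (n_shuffles + 1)

# Hypothetical measurements, invented purely for illustration.
treatment = [12, 15, 14, 16, 13]
control = [10, 11, 9, 12, 10]
print("approximate p-value:", monte_carlo_p_value(treatment, control))
```

For very small data sets all relabellings could be enumerated to get the exact p-value; the random shuffling shown here is the approximation that stays feasible as the data set grows.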
What is the order for statistical testing (MC method)? (7 steps)
- State null hypothesis and alternative hypothesis.
- State significance level: alpha.
- Compute degrees of freedom.
- Compute decision rule, critical region, critical values.
- Compute test statistic and confidence interval.
- State the results.
- State the conclusion.
What is given in asymptotic methods?
A sufficiently large sample size.
What is the one-sample T-test formula?
t = (Sample mean - Proposed constant for the population mean) / (Sample standard deviation / square root of sample size)
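As a small worked sketch, the t formula can be computed directly in Python. The function name `one_sample_t` and the data are made up for illustration; `statistics.stdev` is used because it returns the sample standard deviation (n - 1 in the denominator).

```python
import math
import statistics

def one_sample_t(sample, proposed_mean):
    """t = (sample mean - proposed mean) / (sample sd / sqrt(n))."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    return (sample_mean - proposed_mean) / (sample_sd / math.sqrt(n))

# Hypothetical data, invented purely for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
t_value = one_sample_t(data, proposed_mean=4)
degrees_of_freedom = len(data) - 1
print("t =", t_value, "with", degrees_of_freedom, "degrees of freedom")
```

The resulting t value would then be compared with the critical value for the chosen alpha and these degrees of freedom.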
What is the Z-test formula?
Z = (Sample mean - population mean) / (population standard deviation / square root of sample size)
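The Z formula can be sketched the same way. Unlike the t-test, the population standard deviation is assumed to be known and is passed in rather than estimated from the sample. The names `z_statistic` and `two_sided_p_value` and the numbers are assumptions for this example; the two-sided p-value from the standard normal distribution is an added illustration, not part of the formula above.

```python
import math
import statistics

def z_statistic(sample, population_mean, population_sd):
    """Z = (sample mean - population mean) / (population sd / sqrt(n))."""
    n = len(sample)
    return (statistics.mean(sample) - population_mean) / (population_sd / math.sqrt(n))

def two_sided_p_value(z):
    """Two-sided p-value of z under the standard normal distribution."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

# Hypothetical data and population values, invented purely for illustration.
scores = [4, 5, 6, 5]
z = z_statistic(scores, population_mean=4, population_sd=2)
print("Z =", z, "two-sided p =", two_sided_p_value(z))
```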