Introduction Flashcards
Data Analysis Cycle
1) Acquisition
2) Cleaning - verification (are the values reasonable?), manipulation (transforming, etc.)
3) Organization - data frames, databases, XML
4) Analysis
5) Simulation - extension of analysis with simulations
6) Reporting
We’ll focus on steps 1-3.
A statistic is often just…
a function of a random sample, for example the sample mean, the 95th percentile (0.95 quantile), or the sample proportion.
Statistics are often used as…
estimators of quantities of interest about the distribution, called parameters.
Statistics are _____; parameters are _____
random variables (since they depend on the sample); fixed, non-random quantities.
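A minimal sketch of this distinction, assuming an illustrative Normal population with mean 10 and standard deviation 2 (these values and the variable names are made up for the example): the parameter stays fixed while the statistic changes from sample to sample.

```r
# Assumed population: Normal(mean = 10, sd = 2); values chosen only for illustration.
mu <- 10                             # parameter: fixed, not random
set.seed(1)                          # only so the sketch is reproducible
x1 <- rnorm(25, mean = mu, sd = 2)   # first random sample
x2 <- rnorm(25, mean = mu, sd = 2)   # second random sample
xbar1 <- mean(x1)                    # statistic computed from sample 1
xbar2 <- mean(x2)                    # statistic computed from sample 2
c(xbar1, xbar2)                      # two different estimates of the same mu
```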
The main idea in a simulation study is…
to replace the mathematical expression for the distribution with a sample from that distribution.
Steps in carrying out a simulation study:
1) Specify what makes up an individual experiment: sample size, distributions, parameters, statistic of interest.
2) Write an expression or function to carry out an individual experiment and return the statistic.
3) Determine what inputs, if any, to vary (e.g., different sample sizes or parameters).
4) For each combination of inputs, repeat the experiment B times, providing B samples of the statistic.
5) For each combination of inputs, summarize the empirical distribution of the statistic of interest.
6) State and/or plot the results. (Sometimes go back to step 3.)
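The steps above can be sketched in R as follows. This is only an illustration under assumed choices: the distribution (exponential with rate 1), the statistic (sample mean), the sample sizes, B = 1000, and the names one_experiment, sample_sizes, results, and summary_table are all hypothetical.

```r
# Specify and code one experiment: draw n values from Exp(rate), return the sample mean.
one_experiment <- function(n, rate = 1) {
  mean(rexp(n, rate = rate))
}

# Input to vary: the sample size.
sample_sizes <- c(10, 50, 200)
B <- 1000  # number of repetitions per input setting

set.seed(42)
# Repeat the experiment B times for each n (one column per sample size).
results <- sapply(sample_sizes, function(n) replicate(B, one_experiment(n)))
colnames(results) <- paste0("n=", sample_sizes)

# Summarize the empirical distribution of the statistic for each n.
summary_table <- rbind(mean = colMeans(results),
                       sd   = apply(results, 2, sd))

# State the results.
print(round(summary_table, 3))
```

The spread of the sample mean should shrink as n grows; a histogram of each column of results (for example with hist()) would show the same thing graphically.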
What does sample() do?
Draws a random sample from a specified population (a vector of values), with or without replacement, optionally using specified probabilities for each value.
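A few hedged usage sketches of sample(); the populations and probabilities below are made up for illustration.

```r
set.seed(123)  # only so the sketch is reproducible

# Ten rolls of a fair die: sampling with replacement from 1:6.
rolls <- sample(1:6, size = 10, replace = TRUE)

# A random permutation: sampling without replacement is the default.
perm <- sample(1:5)

# Unequal probabilities via the prob argument (a biased coin, 70% heads).
flips <- sample(c("H", "T"), size = 20, replace = TRUE, prob = c(0.7, 0.3))
```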
What are some useful random number generators?
sample() (draws from whatever population is specified), runif() (uniform), rnorm() (normal), rbinom() (binomial), rexp() (exponential), rpois() (Poisson), rt() (Student's t), rf() (F)
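As a quick reference, each generator can be called roughly as below; the parameter values are arbitrary choices for the sketch.

```r
set.seed(2024)  # only so the sketch is reproducible
n <- 5
u  <- runif(n, min = 0, max = 1)        # Uniform(0, 1)
z  <- rnorm(n, mean = 0, sd = 1)        # Normal(0, 1)
k  <- rbinom(n, size = 10, prob = 0.3)  # Binomial(10, 0.3)
w  <- rexp(n, rate = 2)                 # Exponential with rate 2
p  <- rpois(n, lambda = 4)              # Poisson with mean 4
tt <- rt(n, df = 8)                     # Student's t with 8 degrees of freedom
f  <- rf(n, df1 = 3, df2 = 12)          # F with (3, 12) degrees of freedom
```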
What is a simple congruential generator?
Uses modular arithmetic to generate pseudo-random (“random”) numbers:
x_1 = (a * x_0) mod b
The rest are calculated recursively:
x_{n+1} = (a * x_n) mod b
x_0 is the seed!
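A minimal sketch of this recursion in R. The function name congruential and the constants a = 7, b = 31, seed = 3 are hypothetical, chosen small so the arithmetic can be checked by hand; they are not recommended values for real use.

```r
# Multiplicative congruential generator: x_{n+1} = (a * x_n) mod b.
# a, b, and the seed are small illustrative values, not recommended constants.
congruential <- function(n, seed, a = 7, b = 31) {
  x <- numeric(n)
  current <- seed                    # x_0
  for (i in seq_len(n)) {
    current <- (a * current) %% b    # next value from the previous one
    x[i] <- current
  }
  x
}

x <- congruential(5, seed = 3)
x             # 21 23 6 11 15
u <- x / 31   # rescale to (0, 1) to mimic uniform draws
```

Calling congruential() again with the same seed reproduces the same sequence, which is why these numbers are only pseudo-random.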
For homework, do set.seed() at…
the top of the homework document
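A short sketch of why this matters: resetting the same seed restarts the same stream of random numbers, so results can be reproduced. The seed value 2023 is arbitrary.

```r
set.seed(2023)   # the seed value is arbitrary; any fixed integer works
first <- rnorm(3)

set.seed(2023)   # resetting the same seed restarts the same stream
second <- rnorm(3)

identical(first, second)  # TRUE
```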