Topic 4: Normal Model Flashcards
What is the normal model?
A model for fairly symmetric, bell-shaped data that captures the general trend of the distribution.
Why do we use the normal model?
- It represents many natural phenomena
- It models data that result from the combined effect of many different factors (Central Limit Theorem)
- It is fully defined by its mean and SD (easy to use)
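The Central Limit Theorem point can be illustrated with a small simulation. This is only a sketch: the choice of 50 uniform factors per observation and the name `sums` are assumptions made for illustration.

```r
# Hypothetical simulation: each observation is the sum of 50 small, non-normal factors.
set.seed(1)
n_factors <- 50
sums <- replicate(10000, sum(runif(n_factors)))  # uniform factors, chosen for illustration

# The histogram of the sums is close to bell-shaped even though each factor is uniform.
hist(sums, breaks = 40, freq = FALSE, main = "Sum of 50 uniform factors")

# Overlay the normal curve with the same mean and SD as the simulated sums.
curve(dnorm(x, mean = mean(sums), sd = sd(sums)), add = TRUE)
```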
How many types of normal models are there? Describe them.
- Standard normal model: N(0,1), with mean 0 and SD 1
- General normal model: N(mean, SD^2), with any mean and SD
When is the normal model used?
When the histogram of the data is roughly bell-shaped.
Diagnostics:
- Does the histogram look normal? (no long tails or many outliers)
- Do the proportions look right? (Do they fit the 68/95/99.7% rule?)
- Does the quantile-quantile (QQ) plot look linear?
- Does the Shapiro-Wilk test give a low p-value? (p < 0.05: the normal model is not a good fit)
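The four diagnostics above can be sketched in R as follows. The data here are simulated (200 made-up values with mean 170 and SD 10), so the variable names and numbers are assumptions, not part of the course material.

```r
set.seed(42)
x <- rnorm(200, mean = 170, sd = 10)  # made-up data for illustration

# 1. Histogram: look for a bell shape without long tails or many outliers.
hist(x, breaks = 20)

# 2. Proportions within 1, 2 and 3 SDs of the mean (compare with 0.68, 0.95, 0.997).
z <- (x - mean(x)) / sd(x)
props <- sapply(1:3, function(k) mean(abs(z) <= k))
print(props)

# 3. QQ plot: the points should lie close to the reference line.
qqnorm(x)
qqline(x)

# 4. Shapiro-Wilk test: a p-value below 0.05 suggests the normal model does not fit.
sw <- shapiro.test(x)
print(sw$p.value)
```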
Which R commands give the area under the curve, or the threshold for a given area?
pnorm(x) (cumulative distribution function): calculates the area (as a proportion between 0 and 1) under the curve to the left of x (lower tail)
pnorm(x); pnorm(x, lower.tail=FALSE) for the upper tail; pnorm(x, mean, SD) for a general normal model
x is the threshold
qnorm(y) (quantile function): determines the threshold below which the area is y
qnorm(y, mean, SD) for a general normal model
y is the area (the percentile, written as a proportion between 0 and 1)
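A few example calls may make the two commands concrete. The general model with mean 100 and SD 15 is an assumed example; the rounded values in the comments are approximate.

```r
# Standard normal model N(0, 1)
p_lower <- pnorm(1.96)                      # area to the left of 1.96, about 0.975
p_upper <- pnorm(1.96, lower.tail = FALSE)  # area to the right of 1.96, about 0.025
q_std   <- qnorm(0.975)                     # threshold with 97.5% of the area below it, about 1.96

# General normal model with an assumed mean of 100 and SD of 15
p_gen <- pnorm(130, mean = 100, sd = 15)    # area below 130, about 0.977
q_gen <- qnorm(0.9, mean = 100, sd = 15)    # 90th percentile, about 119.2

# qnorm undoes pnorm: this recovers the threshold 1.2
back <- qnorm(pnorm(1.2))

print(c(p_lower, p_upper, q_std, p_gen, q_gen, back))
```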
How do we rescale a general normal model to the standard normal model?
Convert each data point to standard units: Z = (data point - mean)/SD
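As a quick check of the rescaling, the area below a threshold is the same whether it is computed on the original scale or in standard units. The mean of 170, SD of 10 and threshold of 185 are assumed values for illustration.

```r
mean_h <- 170   # assumed mean
sd_h   <- 10    # assumed SD
x      <- 185   # assumed data point

# Standard units: how many SDs the data point lies above the mean.
z <- (x - mean_h) / sd_h
print(z)  # 1.5

# The same area, computed with the general model and with the standard model.
p_general  <- pnorm(x, mean = mean_h, sd = sd_h)
p_standard <- pnorm(z)
print(c(p_general, p_standard))
```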
How is an individual measurement different from the exact value?
Individual measurement = exact value + chance error + bias
How can we estimate chance error?
Chance error arises because the measured value turns out slightly different each time the measurement is repeated.
Its size can be estimated by replicating the measurement and calculating the SD of the replicates.
What is bias?
Bias is a systematic error: a constant amount added to or subtracted from every measurement.
Bias cannot be estimated by replicating the measurement, because it shifts every replicate by the same amount.
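A small simulation of the measurement model (individual measurement = exact value + chance error + bias) shows why replication reveals chance error but not bias. The exact value, bias and chance-error SD below are all hypothetical numbers chosen for illustration.

```r
set.seed(7)
exact_value <- 10    # hypothetical exact value
bias        <- 0.3   # hypothetical constant bias
chance_sd   <- 0.5   # hypothetical size of the chance error

# 1000 replicated measurements of the same quantity.
measurements <- exact_value + bias + rnorm(1000, mean = 0, sd = chance_sd)

# The SD of the replicates estimates the size of the chance error (close to 0.5).
print(sd(measurements))

# The replicates centre on exact_value + bias, so the bias only shows up
# if the exact value is known from some other source.
print(mean(measurements) - exact_value)
```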
What does the area under the normal curve approximate?
The area under the normal curve over an interval approximates the area of the histogram over that interval, i.e. the proportion of the data that falls in it.
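This approximation can be checked by comparing an observed proportion with the matching area under the fitted normal curve. The data are simulated and the interval from 45 to 60 is an arbitrary choice, so treat this as a sketch only.

```r
set.seed(3)
x <- rnorm(5000, mean = 50, sd = 8)  # made-up data for illustration
lower <- 45
upper <- 60

# Proportion of the data in the interval (the area of that part of the histogram).
observed <- mean(x > lower & x <= upper)

# Area under the normal curve over the same interval, using the data's mean and SD.
model <- pnorm(upper, mean = mean(x), sd = sd(x)) -
  pnorm(lower, mean = mean(x), sd = sd(x))

print(c(observed = observed, model = model))
```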
What is reproducible research and why is it important?
Reproducible research makes the data sets and software used available, so that others can verify published findings and carry out alternative analyses.
Without reproducible research, data versions and graphical summaries can change, which makes published results hard to verify.