Week 2: Normal Distribution, Inference, and Confidence Intervals Flashcards
How do you create a histogram with a normal curve?
Use <histogram varname, frequency normal> to overlay a normal curve on the histogram.
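Example (a minimal sketch, assuming Stata's built-in auto dataset and its variable price, neither of which appears in these notes):

```stata
* Load Stata's bundled example dataset (an assumption for illustration)
sysuse auto, clear

* Histogram of price with counts on the y-axis and a normal curve overlaid
histogram price, frequency normal
```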
What is skewness?
Skewness measures the asymmetry of a distribution.
- Negative skew: Long tail to the left
- Positive skew: Long tail to the right
- Normal distribution: Skewness = 0
What is kurtosis, and how is it interpreted?
- Kurtosis describes the heaviness of a distribution's tails, often loosely called its “peakedness”.
- Kurtosis = 3: Normal (mesokurtic)
- Kurtosis > 3: Leptokurtic (peaked with heavy tails)
- Kurtosis < 3: Platykurtic (flat with light tails)
“Excess” kurtosis (kurtosis minus 3) compares a distribution's kurtosis with that of a normal distribution, so a normal distribution has an excess kurtosis of 0.
Note: Kurtosis is often not reported in publications.
What command shows detailed statistics, including skewness and kurtosis?
Use <sum varname, detail>
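Example (a sketch, assuming the built-in auto dataset and its variable price; the scalar name excess is made up for illustration). After the detailed summary, skewness and kurtosis are also available as stored results:

```stata
sysuse auto, clear

* Detailed summary: percentiles, variance, skewness and kurtosis
summarize price, detail

* The same figures are kept in r() after the command runs
display "Skewness: " r(skewness)
display "Kurtosis: " r(kurtosis)

* Excess kurtosis is kurtosis minus 3 (0 for a normal distribution)
scalar excess = r(kurtosis) - 3
display "Excess kurtosis: " excess
```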
What is the purpose of transforming variables?
To make non-normal data more normally distributed, enabling the use of parametric tests.
How do you create a logarithmic transformation?
Use <generate newvar=log(varname)>
Note: You can also use the function <ln> instead of <log>; in Stata both return the natural logarithm. The log of zero or a negative number is undefined, so any such values are set to system-missing in the transformed variable.
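Example (a sketch, assuming the built-in auto dataset; lnprice is a made-up name for the new variable). Counting non-positive values first shows how many observations the transformation would set to missing:

```stata
sysuse auto, clear

* How many values are zero or negative (these would become missing)?
count if price <= 0

* Natural-log transformation; log(price) would give the same result
generate lnprice = ln(price)

* Compare skewness and kurtosis before and after the transformation
summarize price lnprice, detail
```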
What are Z-scores, and why are they useful?
Z-scores standardise data, converting it to a distribution with a mean of 0 and SD of 1, useful for comparison across datasets.
How do you create Z-scores?
Use <egen zvarname = std(varname)>
Z-scores are most useful in more advanced analyses, such as regression.
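Example (a sketch, assuming the built-in auto dataset; zprice and zmanual are made-up variable names). The manual version spells out what std() computes: (value - mean) / SD.

```stata
sysuse auto, clear

* Standardise price with egen
egen zprice = std(price)

* Check: the mean should be (approximately) 0 and the SD 1
summarize zprice

* The same calculation by hand, using the stored mean and SD
summarize price
generate zmanual = (price - r(mean)) / r(sd)
summarize zprice zmanual
```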
How do you perform analysis on specific sub-groups?
Use the <if> condition, e.g., <sum varname if groupvar == value>
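Example (a sketch, assuming the built-in auto dataset, where foreign is coded 0 = domestic and 1 = foreign):

```stata
sysuse auto, clear

* Summarise price for foreign cars only
summarize price if foreign == 1

* Conditions can be combined with & (and) or | (or)
summarize price if foreign == 0 & mpg > 20
```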
What is the difference between = and ==?
“=” assigns values, e.g., with <generate>
“==” tests for equality, e.g., <if var == 10>
What is routing in survey data?
Routing directs respondents to specific questions based on prior responses, often resulting in linked variables.
How do you check for routing?
Use <tab> to examine linked variables.
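Example (a sketch using a small made-up dataset; smoker and cigs_day are hypothetical variable names, and real surveys may code routed-out respondents with a special value rather than system-missing). Respondents routed past the follow-up question should all appear in the missing column:

```stata
clear

* Hypothetical data: cigs_day is only asked when smoker == 1
input smoker cigs_day
0 .
0 .
1 10
1 20
1 .
0 .
end

* Cross-tabulate the linked variables, showing missing values
tab smoker cigs_day, missing

* Non-smokers with an answer would indicate a routing error
count if smoker == 0 & !missing(cigs_day)
```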
What does the command <tab1> do?
It generates one-way frequency tables for several variables at once. Example: <tab1 varname1 varname2>
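Example (a sketch, assuming the built-in auto dataset and its variables rep78 and foreign):

```stata
sysuse auto, clear

* One frequency table per variable listed
tab1 rep78 foreign

* Add the missing option to include missing values in each table
tab1 rep78 foreign, missing
```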
How do you calculate CIs for the mean?
Use <ci means varname> (Stata 14 and later; earlier versions use <ci varname>).
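Example (a sketch, assuming the built-in auto dataset and Stata 14 or later; older versions use ci price without the means keyword):

```stata
sysuse auto, clear

* 95% confidence interval for the mean of price (95% is the default level)
ci means price

* The same for one sub-group only
ci means price if foreign == 1
```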
What affects the width of a CI?
- Higher confidence levels widen the interval
- Larger sample sizes narrow the interval
- Greater variability in the data (a larger SD, and so a larger SE) widens the interval
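Example (a sketch, assuming the built-in auto dataset and Stata 14 or later; halfwidth is a made-up scalar name). The interval is mean ± t × SD / √n, so the confidence level enters through t, the variability through SD, and the sample size through n:

```stata
sysuse auto, clear

* Same data, different confidence levels: the interval widens as the level rises
ci means price, level(90)
ci means price, level(95)
ci means price, level(99)

* Half-width of the 95% interval by hand: t * SD / sqrt(n)
summarize price
scalar halfwidth = invttail(r(N) - 1, 0.025) * r(sd) / sqrt(r(N))
display "95% CI half-width: " halfwidth
```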