Week 2: Normal Distribution, Inference, and Confidence Intervals Flashcards

1
Q

How do you create a histogram with a normal curve?

A

Use <histogram varname, frequency normal> to overlay a normal curve on the histogram.
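
As a minimal sketch of this command, the lines below use Stata's bundled example dataset auto and its variable mpg. Both are assumptions chosen for illustration and are not part of the course data.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Histogram of mpg with frequencies on the y-axis and a normal curve overlaid
histogram mpg, frequency normal
```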

2
Q

What is skewness?

A

Skewness measures the asymmetry of a distribution.
- Negative skew: Long tail to the left
- Positive skew: Long tail to the right
- Normal distribution: Skewness = 0

3
Q

What is kurtosis, and how is it interpreted?

A
  • Kurtosis describes how heavy the tails of a distribution are; it is often loosely described as “peakedness”.
  • Kurtosis = 3: Normal (mesokurtic)
  • Kurtosis > 3: Leptokurtic (peaked with heavy tails)
  • Kurtosis < 3: Platykurtic (flat with light tails)
    “Excess” kurtosis is the kurtosis coefficient minus 3, so a normal distribution has an excess kurtosis of 0.
    Note: Kurtosis is often not reported in publications.
4
Q

What command shows detailed statistics, including skewness and kurtosis?

A

Use <sum varname, detail>
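
A short sketch of how this might look in practice, assuming Stata's bundled auto dataset and its variable price (illustrative choices, not the course data). After the detailed summary, Stata keeps the skewness and kurtosis in r(skewness) and r(kurtosis).

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Detailed summary: percentiles, variance, skewness and kurtosis
summarize price, detail

* The stored results can be displayed or reused afterwards
display "Skewness = " r(skewness)
display "Kurtosis = " r(kurtosis)
```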

5
Q

What is the purpose of transforming variables?

A

To make non-normal data more normally distributed, enabling the use of parametric tests.

6
Q

How do you create a logarithmic transformation?

A

Use <generate newvar=log(varname)>
Note: You can also use the function <ln()> instead of <log()>; in Stata both return the natural logarithm. The log of zero or a negative number is undefined, so any such values will be set to system-missing in the transformed variable.
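
The following sketch shows the transformation and a check for values lost to missing. It assumes Stata's bundled auto dataset; the new variable name ln_price is hypothetical.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Natural-log transform; zero or negative values would become missing
generate ln_price = ln(price)

* Count how many observations became missing after the transformation
* (all prices in this dataset are positive, so this should report 0)
count if missing(ln_price)
```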

7
Q

What are Z-scores, and why are they useful?

A

Z-scores standardise data by subtracting the mean and dividing by the SD, giving a distribution with a mean of 0 and an SD of 1. This makes values comparable across variables or datasets measured on different scales.

8
Q

How do you create Z-scores?

A

Use <egen zvarname = std(varname)>
Z-scores are mainly useful in more advanced analyses, such as regression.
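
A minimal sketch, assuming Stata's bundled auto dataset; the name z_mpg is a hypothetical choice. Summarising the new variable is a quick way to confirm the standardisation worked.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Create a standardised version of mpg
egen z_mpg = std(mpg)

* The mean should be approximately 0 and the SD 1
summarize z_mpg
```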

9
Q

How do you perform analysis on specific sub-groups?

A

Use the <if> condition, e.g., <sum varname if groupvar == value>
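
For illustration, the sketch below restricts a summary to sub-groups of Stata's bundled auto dataset (an assumption, not the course data). In that dataset foreign is coded 0 for domestic and 1 for foreign cars.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Summarise mpg for foreign cars only
summarize mpg if foreign == 1

* Conditions can be combined with & (and) or | (or)
summarize mpg if foreign == 0 & price > 5000
```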

10
Q

Differentiate between = and ==

A

"=" assigns values, e.g., in <generate>
"==" tests for equality, e.g., <if var == 10>

11
Q

What is routing in survey data?

A

Routing directs respondents to specific questions based on prior responses, often resulting in linked variables.

12
Q

How do you check for routing?

A

Use <tab> to cross-tabulate the linked variables.

13
Q

What does the command <tab1> do?

A

It generates a separate one-way frequency table for each of several variables at once. Example: <tab1 var1 var2 var3>
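
A brief sketch, assuming Stata's bundled auto dataset and its variables foreign and rep78 (illustrative choices only). Each variable listed gets its own one-way table.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* One-way frequency tables for foreign and rep78 in a single command
tab1 foreign rep78
```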

14
Q

How do you calculate CIs for the mean?

A

Use <ci means varname> (Stata 14 and later; older versions use <ci varname>)
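
A minimal sketch using the Stata 14 and later syntax, assuming the bundled auto dataset (illustrative only). The level() option changes the confidence level from the default of 95%.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Mean of mpg with its standard error and 95% confidence interval
ci means mpg

* Same, with a 99% confidence interval
ci means mpg, level(99)
```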

15
Q

What affects the width of a CI?

A
  • Higher confidence levels widen the interval
  • Larger sample sizes narrow the interval
  • Increased variability (a larger SD, and therefore a larger SE) widens the interval
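
These three effects can be read off the usual formula for a confidence interval for a mean. The symbols are assumptions introduced here for illustration: \bar{x} is the sample mean, s the sample SD, n the sample size, and t the critical value of the t distribution with n - 1 degrees of freedom at confidence level 1 - \alpha.

```latex
\bar{x} \;\pm\; t_{n-1,\,1-\alpha/2} \times \frac{s}{\sqrt{n}}
```

A higher confidence level increases t, a larger s increases the standard error s/\sqrt{n}, and a larger n decreases it.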
16
Q

What is the significance of overlapping CIs?

A

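If the 95% CIs of two groups do not overlap, the difference between the groups is statistically significant. If the intervals overlap, the difference may or may not be significant, so a formal test is needed to decide.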

17
Q

How do you explore relationships between multiple categories?

A

Use <bysort categoryvar: sum varname> or <bysort categoryvar: tab var1 var2>
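
A short sketch of both forms, assuming Stata's bundled auto dataset with foreign as the grouping variable (illustrative choices only). bysort sorts the data by the grouping variable and then repeats the command within each group.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Summary statistics for mpg within each category of foreign
bysort foreign: summarize mpg

* Frequency table of rep78 within each category of foreign
bysort foreign: tabulate rep78
```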

18
Q

How do transformations like logarithm and square root affect skewed data?

A

They reduce skewness and bring the distribution closer to normality, especially for positively skewed data.
Square root transformations are often used for count data (e.g., counting the number of butterflies seen in a garden each day), when low counts might be more common than higher counts.
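
One way to see the effect is to compare skewness before and after transforming. This sketch assumes Stata's bundled auto dataset, whose price variable is right-skewed; the new variable names are hypothetical.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Create square-root and natural-log versions of price
generate sqrt_price = sqrt(price)
generate ln_price = ln(price)

* Compare skewness and kurtosis across the original and transformed variables
tabstat price sqrt_price ln_price, statistics(mean skewness kurtosis)
```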

19
Q

What are examples of linked variables in the ELSA dataset?

A
  • heill (long-term illness) and helim (activity limitation)
  • spcaa (caring responsibilities) and spcac (hours spent caring)
20
Q

How do you check skewness and kurtosis for a variable?

A

Use <sum varname, detail>

21
Q

How do you interpret a mean with 95% CIs?

A

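The interval gives a range of plausible values for the true population mean. "95% confidence" means that if the sampling were repeated many times, about 95% of the intervals calculated in this way would contain the true mean.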

22
Q

What are graphical ways of assessing normality?

A

Histogram, boxplot, a probability-probability (P-P) plot, and a quantile-quantile (Q-Q) plot.
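
Each of these plots has a corresponding Stata command. The sketch below assumes Stata's bundled auto dataset and its variable mpg, chosen only for illustration.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Histogram with a normal curve overlaid
histogram mpg, normal

* Boxplot
graph box mpg

* Standardised normal probability (P-P) plot
pnorm mpg

* Quantile-quantile (Q-Q) plot against the normal distribution
qnorm mpg
```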

23
Q

What are numerical ways of assessing normality?

A

Kolmogorov-Smirnov test, Shapiro-Wilk test, and the Skewness/Kurtosis tests.
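
A sketch of how these tests might be run in Stata, assuming the bundled auto dataset (illustrative only). The Kolmogorov-Smirnov line compares mpg with a normal distribution whose mean and SD are estimated from the same data, which tends to make the test conservative.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Shapiro-Wilk test
swilk mpg

* Skewness/kurtosis tests
sktest mpg

* Kolmogorov-Smirnov test against a normal with the sample mean and SD
summarize mpg
ksmirnov mpg = normal((mpg - r(mean)) / r(sd))
```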

24
Q

What is a way to show labels and values aside from using <tab>?

A

The command <fre>
It is user-written, so it has to be installed first using <ssc install fre>
To view the labels and values of a variable, simply use <fre varname>
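
A minimal sketch of the install-then-use sequence, assuming an internet connection for the installation and Stata's bundled auto dataset for the example (both assumptions for illustration).

```stata
* Install the user-written fre command from SSC (needed once per machine)
ssc install fre, replace

* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Frequency table showing both the numeric values and their labels
fre foreign
```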

25
Q

What is a smoother alternative to using histograms to assess normality?

A

Kernel density plots, which approximate the probability density of a variable.
Unlike histograms, kernel density plots are not sensitive to the number and positioning of bins used in the display.
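
A brief sketch, assuming Stata's bundled auto dataset and its variable mpg (illustrative only). The normal option overlays a normal density for comparison.

```stata
* Load Stata's built-in example dataset (illustrative only)
sysuse auto, clear

* Kernel density estimate of mpg with a normal density overlaid
kdensity mpg, normal
```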

26
Q

How do you do simple calculations?

A

Use the command <display> followed by the calculation
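
A few illustrative calculations; the numbers are made up for the example and no dataset is needed.

```stata
* Basic arithmetic
display 2 + 2

* Built-in functions can be used inside the expression
display sqrt(16)

* For example, a margin of error from a critical value, an SD and a sample size
display 1.96 * 2.5 / sqrt(100)
```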