Data description, populations and the normal distribution Flashcards

Question 1

Q

Two measures of “spread” in a sample?

Answer

A

IQR and range (maximum-minimum)

Question 2

Q

Two ways of using histograms?

Answer

A

Show raw frequency or as a proportion of total number of observations
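A minimal sketch of the two presentations, using made-up heights and an assumed bin width of 10 (neither is from the original card). The same bins are reported once as raw counts and once as proportions of the total.

```python
from collections import Counter

heights = [151, 158, 162, 163, 167, 168, 171, 174, 179, 186]  # made-up values, cm
bin_width = 10  # assumed bin width

# Label each bin by its lower edge, e.g. 160 covers 160-169.
counts = Counter((h // bin_width) * bin_width for h in heights)

# Same bins, expressed as a proportion of the total number of observations.
proportions = {edge: n / len(heights) for edge, n in counts.items()}

for edge in sorted(counts):
    print(f"{edge}-{edge + bin_width - 1}: "
          f"frequency={counts[edge]}, proportion={proportions[edge]:.2f}")
```

The proportions always add to 1, which makes histograms from samples of different sizes comparable.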

Question 3

Q

The only parameters of a normal distribution?

Answer

A

Mean (μ) and standard deviation (σ)

Question 4

Q

Calculating SD for a sample?

Answer

A

Subtract the sample mean (m) from each value and square the result; then add these squares, divide by N-1, and take the square root. Without squaring, the positive and negative deviations would cancel and the sum would be 0. Dividing by N-1 stops the SD simply increasing as the sample gets bigger.
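A short sketch of the calculation, using the usual sample formula with an N-1 divisor. The data are made up, and the function name `sample_sd` is only illustrative; the result is checked against `statistics.stdev` from the standard library.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation with the N - 1 divisor."""
    n = len(values)
    m = sum(values) / n                                  # sample mean
    squared_deviations = [(x - m) ** 2 for x in values]  # squaring removes the signs
    return math.sqrt(sum(squared_deviations) / (n - 1))

heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # made-up data
m = sum(heights) / len(heights)

# Without squaring, the deviations cancel out.
raw_deviation_sum = sum(x - m for x in heights)

sd = sample_sd(heights)
print(raw_deviation_sum, sd, statistics.stdev(heights))
```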

Question 5

Q

Why is median better than mean for skewed data?

Answer

A

Median will not be unduly affected by a select few very large values; mean will be.
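To illustrate with made-up numbers: adding a single very large value moves the mean a long way but barely shifts the median.

```python
import statistics

values = [20, 22, 25, 27, 30]    # made-up data
with_outlier = values + [1000]   # one very large value added

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```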

Question 6

Q

AUC up to X?

Answer

A

= P; the probability that an individual will have a value (e.g. height) below X. The total AUC is therefore 1. The value of P corresponding to a given X is the “cumulative probability of the distribution at X”. The probability that a value is above X is 1-P.
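A small sketch using `statistics.NormalDist` from the Python standard library. The population mean and SD (175 and 7) and the cut-off X (180) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

heights = NormalDist(mu=175.0, sigma=7.0)  # hypothetical population
x = 180.0

p_below = heights.cdf(x)   # area under the curve up to X
p_above = 1 - p_below      # remaining area, since the total is 1

print(f"P(height < {x}) = {p_below:.3f}")
print(f"P(height > {x}) = {p_above:.3f}")
```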

Question 7

Q

Symmetry of the curve for AUC?

Answer

A

If Y is X units below the mean, then P at Y = 0.24 (for example). Because the normal distribution is symmetrical, the point Z that is X units above the mean has the same probability, 0.24, of being ABOVE it. The probability of being between Y and Z is therefore 1 minus the two tail probabilities: 1-(0.24+0.24) = 0.52.
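The symmetry argument can be checked numerically. This is a sketch with a hypothetical population (mean 100, SD 15) and an arbitrary distance of 10 units either side of the mean.

```python
from statistics import NormalDist

pop = NormalDist(mu=100.0, sigma=15.0)  # hypothetical population
distance = 10.0                         # arbitrary distance from the mean

lower = pop.mean - distance
upper = pop.mean + distance

p_below_lower = pop.cdf(lower)       # lower tail
p_above_upper = 1 - pop.cdf(upper)   # upper tail, equal by symmetry

# Probability of lying between the two points: 1 minus both tails.
p_between = 1 - (p_below_lower + p_above_upper)

print(p_below_lower, p_above_upper, p_between)
```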

Question 8

Q

What is the inverse cumulative probability?

Answer

A

Works backwards from P to X: uses the AUC of the normal distribution to find the value with a given proportion below it, e.g. the height below which 3% of boys fall.
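A sketch of going from P back to X with `NormalDist.inv_cdf`. The mean and SD for boys' heights (110 and 5) are hypothetical, not taken from the card.

```python
from statistics import NormalDist

boys = NormalDist(mu=110.0, sigma=5.0)  # hypothetical mean and SD

x_3pct = boys.inv_cdf(0.03)   # height with 3% of the population below it
check = boys.cdf(x_3pct)      # feeding X back in recovers P

print(f"3% of boys are shorter than {x_3pct:.1f}")
print(f"cumulative probability at that height = {check:.3f}")
```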

Question 9

Q

How does P depend on the equation for X, where X= μ+Zσ?

Answer

A

Only through Z: if the population changes (different μ and σ), P for a value the same number of SDs (Z) away from μ stays the same, even though X itself will be different.
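As an illustration, two hypothetical populations with different μ and σ give the same P when X is placed the same number of SDs above the mean. The value Z = 1.2 is arbitrary.

```python
from statistics import NormalDist

z = 1.2  # arbitrary number of SDs above the mean

populations = {
    "A": NormalDist(mu=100.0, sigma=15.0),  # hypothetical
    "B": NormalDist(mu=175.0, sigma=7.0),   # hypothetical
}

results = {}
for name, pop in populations.items():
    x = pop.mean + z * pop.stdev      # X = mu + Z * sigma
    results[name] = (x, pop.cdf(x))   # X differs, P does not
    print(f"{name}: X = {x:.1f}, P = {results[name][1]:.4f}")
```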

Question 10

Q

Significance of P only depending on μ and σ through Z?

Answer

A

Means that for Z of 1.96, P=0.975 i.e. 97.5%, whatever μ and σ are. So 2.5% of people have height above X = μ+1.96σ, and by symmetry 2.5% have height below μ-1.96σ. Therefore 95% of people have height within μ±1.96σ. This is the basis of Z scores!
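The 1.96 figure can be reproduced from the standard normal distribution (μ = 0, σ = 1), which is what `NormalDist()` gives with no arguments.

```python
from statistics import NormalDist

std = NormalDist()            # standard normal: mean 0, SD 1

p = std.cdf(1.96)             # about 0.975
within = 2 * p - 1            # about 0.95, the central area
z_back = std.inv_cdf(0.975)   # about 1.96

print(f"P(Z < 1.96) = {p:.4f}")
print(f"P(-1.96 < Z < 1.96) = {within:.4f}")
print(f"Z for P = 0.975: {z_back:.3f}")
```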

Question 11

Q

Use of Z and μ and σ for quartiles?

Answer

A

For Z of 0.675, get P of 0.75, so the proportion within Z SDs of μ is (2*0.75)-1=0.5. The quartiles are therefore μ-0.675σ and μ+0.675σ, and the IQR is (μ+0.675σ)-(μ-0.675σ) = 1.35σ.
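A quick numerical check of the 0.675 and 1.35 figures. The population parameters (mean 50, SD 8) are hypothetical; the ratio IQR/σ should come out the same for any normal population.

```python
from statistics import NormalDist

z75 = NormalDist().inv_cdf(0.75)   # about 0.675

pop = NormalDist(mu=50.0, sigma=8.0)  # hypothetical population
q1 = pop.inv_cdf(0.25)                # lower quartile
q3 = pop.inv_cdf(0.75)                # upper quartile
iqr_over_sigma = (q3 - q1) / pop.stdev

print(f"Z at P = 0.75: {z75:.4f}")
print(f"IQR / sigma:   {iqr_over_sigma:.3f}")
```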

Question 12

Q

Why is IQR better than range for estimating spread?

Answer

A

As IQR = 1.35σ for any normal population, can estimate σ by dividing the sample IQR by 1.35. For the range (maximum-minimum), however, the equivalent multiplier is not a constant but depends on sample size (the expected range will be bigger from a sample of 1000 than from a sample of 10). This means that the range reflects not only the spread of a sample but also its size, and so is not good for estimating population spread.
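A small simulation sketch of this point, drawing samples of 10 and 1000 from a standard normal population (σ = 1). The seed, number of trials and helper name are arbitrary choices for illustration; the sample quartiles come from `statistics.quantiles` with its default method.

```python
import random
import statistics

def average_range_and_iqr(n, trials, rng, sigma=1.0):
    """Average range and average IQR over repeated normal samples of size n."""
    ranges, iqrs = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ranges.append(max(sample) - min(sample))
        q1, _median, q3 = statistics.quantiles(sample, n=4)
        iqrs.append(q3 - q1)
    return statistics.mean(ranges), statistics.mean(iqrs)

rng = random.Random(1)  # fixed seed so the run is repeatable
range_10, iqr_10 = average_range_and_iqr(10, 200, rng)
range_1000, iqr_1000 = average_range_and_iqr(1000, 200, rng)

print(f"n=10:   range = {range_10:.2f}, IQR = {iqr_10:.2f}")
print(f"n=1000: range = {range_1000:.2f}, IQR = {iqr_1000:.2f}")
```

The average range should roughly double between the two sample sizes, while the average IQR stays in the region of 1.35σ (small samples give a somewhat noisier quartile estimate).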

Question 13

Q

Sample means, normality and sample size?

Answer

A

Even for skewed data, sample means are often approximately normally distributed, and they become closer to normal as sample size increases.
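A simulation sketch of this behaviour. It assumes an exponential population as the skewed example and uses skewness (0 for a symmetric distribution) as a rough measure of how far the sample means are from normal; the seed, sample sizes and helper names are illustrative only.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardised deviation; 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, trials, rng):
    """Means of `trials` samples of size n from a skewed (exponential) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

rng = random.Random(2)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(2, 2000, rng))
skew_large = skewness(sample_means(50, 2000, rng))

print(f"skewness of means, n=2:  {skew_small:.2f}")
print(f"skewness of means, n=50: {skew_large:.2f}")
```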

Question 14

Q

Using Z score to estimate third centile of a population?

Answer

A

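Calculate m-1.88s; this gives the value below which 3% of the population lie. This is because Z of 1.88 gives P of 0.97, so (P*2)-1=0.94 lie within 1.88 SDs of the mean, leaving 3% in each tail. Only works if the population is normal! But it gives the precision that would otherwise need counting in a much larger sample.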
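A sketch of the estimate from a small sample. The eight values are made up, and the approach assumes the population is normal.

```python
from statistics import NormalDist, mean, stdev

z = NormalDist().inv_cdf(0.03)   # about -1.88

sample = [104.0, 108.5, 110.0, 111.5, 113.0, 115.5, 117.0, 120.5]  # made-up data
m = mean(sample)    # estimated mean
s = stdev(sample)   # estimated SD (N - 1 divisor)

third_centile = m + z * s   # same as m - 1.88s

print(f"Z for the 3rd centile: {z:.3f}")
print(f"m = {m:.1f}, s = {s:.2f}, estimated 3rd centile = {third_centile:.1f}")
```

With only eight observations the estimate falls below the smallest value in the sample, so it could not have been found by counting; the normal assumption is what supplies it.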

Question 15

Q

Assuming normality for data that can only be positive?

Answer

A

Can be a problem; want negative values in this situation to have low probability (1-5%). As 16% of a population falls below μ-σ, but only 2.5% below μ-2σ, the normal is probably inappropriate if the estimated mean (m) is below the estimated SD (s). Once m is about 2*s, it becomes acceptable, as the probability of negative values is low.
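The rule of thumb can be checked by computing the probability a fitted normal assigns to values below zero. The values of m and s here are hypothetical, and `prob_negative` is just an illustrative helper.

```python
from statistics import NormalDist

def prob_negative(m, s):
    """Probability that a normal with mean m and SD s gives a value below 0."""
    return NormalDist(mu=m, sigma=s).cdf(0.0)

p_equal = prob_negative(10.0, 10.0)    # m = s: about 16%
p_double = prob_negative(20.0, 10.0)   # m = 2s: about 2.3%

print(f"m = s:  P(negative) = {p_equal:.3f}")
print(f"m = 2s: P(negative) = {p_double:.3f}")
```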

Question 16

Q

Assessing normality using histogram (and alternative)?

Answer

A

Good in principle, but small samples may not look normal even when drawn from a normal population. Alternative is to use a normal probability plot. Works on the premise that when values drawn from a normal population are arranged in order, they cluster near μ and thin out towards the tails. The plot sets the ordered observations against the values expected under normality; if the data are normal, it gives an approximately straight line.
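A sketch of the idea without any plotting library. The expected values are taken as standard normal quantiles at plotting positions (i - 0.5)/n, which is one common convention and an assumption here; the straightness of the plot is summarised by the correlation between ordered data and expected values. Samples, seed and helper names are illustrative.

```python
import math
import random
from statistics import NormalDist

def normal_scores(n):
    """Expected standard normal quantiles for ranks 1..n."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def straightness(sample):
    """Correlation between the ordered sample and its normal scores."""
    ys = sorted(sample)
    xs = normal_scores(len(ys))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(50.0, 10.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

r_normal = straightness(normal_sample)   # close to 1: near-straight line
r_skewed = straightness(skewed_sample)   # lower: the plot curves

print(f"normal sample: r = {r_normal:.3f}")
print(f"skewed sample: r = {r_skewed:.3f}")
```

Plotting `sorted(sample)` against `normal_scores(len(sample))` gives the normal probability plot itself.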

Question 17

Q

Median for an even sample?

Answer

A

Take as halfway between the two middle values.