Data description, populations and the normal distribution Flashcards

1
Q

Two measures of “spread” in a sample?

A

IQR (interquartile range) and range (maximum-minimum)

2
Q

Two ways of using histograms?

A

Show raw frequencies (counts), or show frequencies as a proportion of the total number of observations
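
A minimal sketch of the two options. The values and the bin width of 1.0 are made up for illustration and are not from the cards.

```python
from collections import Counter

# Made-up observations (illustrative only).
values = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6]
bin_width = 1.0  # assumed bin width

# Raw frequency: how many observations fall in each bin.
counts = Counter(int(v // bin_width) for v in values)

# Proportion: the same counts divided by the total number of observations.
proportions = {b: c / len(values) for b, c in counts.items()}

for b in sorted(counts):
    print(f"[{b * bin_width}, {(b + 1) * bin_width}): "
          f"count={counts[b]}, proportion={proportions[b]:.3f}")
```

The shape of the histogram is the same either way; only the vertical scale changes.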

3
Q

The only parameters of a normal distribution?

A

Mean (μ) and standard deviation (σ)

4
Q

Calculating SD for a sample?

A

Subtract the sample mean (m) from each value and square the result; then add these squared deviations, divide by n-1, and take the square root. Without squaring, the positive and negative deviations would cancel and sum to 0. Dividing by n-1 stops the SD simply increasing as the sample gets bigger.
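
The steps can be sketched as follows. The data are made up, and `sample_sd` is a hypothetical helper name; the standard library's `statistics.stdev` uses the same n-1 divisor and is shown only as a cross-check.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation using the n - 1 divisor."""
    n = len(values)
    m = sum(values) / n                                   # sample mean
    squared_deviations = [(x - m) ** 2 for x in values]   # squaring removes the signs
    return math.sqrt(sum(squared_deviations) / (n - 1))

heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # made-up data
m = sum(heights) / len(heights)

# Without squaring, the deviations from the mean cancel out.
raw_deviation_sum = sum(x - m for x in heights)

print(raw_deviation_sum)
print(sample_sd(heights), statistics.stdev(heights))
```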

5
Q

Why is median better than mean for skewed data?

A

Median will not be unduly affected by a select few very large values; mean will be.
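
A small illustration with made-up values, where one very large value drags the mean but not the median:

```python
import statistics

# Made-up, right-skewed data: five modest values and one very large one.
incomes = [20, 22, 25, 27, 30, 500]

mean_income = statistics.mean(incomes)      # pulled up by the single large value
median_income = statistics.median(incomes)  # halfway between the two middle values

print(mean_income, median_income)  # 104 and 26.0
```

With six values the median is halfway between the third and fourth ordered values (25 and 27).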

6
Q

AUC up to X?

A

= P, the probability that an individual will have a value (e.g. height) below X. The total AUC is therefore 1. The value of P corresponding to a given X is the “cumulative probability of the distribution at X”. The probability that a value is above X is 1-P.
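
As a sketch, the cumulative probability can be computed with `statistics.NormalDist` from the Python standard library. The population mean (170) and SD (10) below are assumed values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical population of heights in cm (mean and SD are assumptions).
heights = NormalDist(mu=170.0, sigma=10.0)

x = 160.0
p_below = heights.cdf(x)   # AUC up to x: probability of a height below x
p_above = 1.0 - p_below    # the rest of the area lies above x

print(round(p_below, 4), round(p_above, 4))
```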

7
Q

Symmetry of the curve for AUC?

A

Suppose Y is some distance d below the mean and the cumulative probability at Y is P = 0.24 (for example). Because the normal distribution is symmetrical, the value Z that is d above the mean has the same probability, 0.24, of being exceeded. The probability of lying between Y and Z is therefore 1 minus the two tail probabilities: 1-(0.24+0.24) = 0.52.
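
A quick numerical check of the symmetry argument. The population (mean 100, SD 15) and the distance of 10 units are assumptions for illustration, so the tail probability here is not the 0.24 used in the card.

```python
from statistics import NormalDist

pop = NormalDist(mu=100.0, sigma=15.0)  # hypothetical population
d = 10.0                                # distance either side of the mean

lower_tail = pop.cdf(pop.mean - d)        # probability of being below mean - d
upper_tail = 1.0 - pop.cdf(pop.mean + d)  # probability of being above mean + d
between = 1.0 - (lower_tail + upper_tail) # what is left in the middle

print(round(lower_tail, 4), round(upper_tail, 4), round(between, 4))
```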

8
Q

What is the inverse cumulative probability?

A

It works backwards from a cumulative probability P (an AUC of the normal distribution) to the value X below which that proportion lies, e.g. the height below which 3% of boys fall.
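
A minimal sketch using `NormalDist.inv_cdf` from the Python standard library. The mean (110 cm) and SD (5 cm) for boys' heights are invented for the example.

```python
from statistics import NormalDist

# Hypothetical distribution of boys' heights in cm (assumed parameters).
boys = NormalDist(mu=110.0, sigma=5.0)

# Inverse cumulative probability: the height below which 3% of boys fall.
third_centile = boys.inv_cdf(0.03)

# Going forwards again with the cdf recovers the 3%.
check = boys.cdf(third_centile)

print(round(third_centile, 2), round(check, 4))
```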

9
Q

How does P depend on μ and σ when X = μ+Zσ?

A

Only through Z: if the population changes (different μ and σ), P at the same number Z of standard deviations from μ stays the same, even though X itself will be different.

10
Q

Significance of P only depending on μ and σ through Z?

A

For Z = 1.96, P = 0.975, i.e. 97.5% of people have height below X = μ+1.96σ. So 2.5% of people have height above μ+1.96σ and, by symmetry, 2.5% have height below μ-1.96σ. Therefore 95% of people have height within μ±1.96σ, whatever the values of μ and σ. This is the basis of Z scores!
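
To see that P depends on μ and σ only through Z, the sketch below evaluates the same Z = 1.96 in three populations. All three sets of parameters are arbitrary assumptions.

```python
from statistics import NormalDist

z = 1.96
# Three hypothetical populations with very different means and SDs.
populations = [NormalDist(0.0, 1.0), NormalDist(170.0, 10.0), NormalDist(50.0, 2.5)]

results = []
for pop in populations:
    upper = pop.mean + z * pop.stdev       # X = mu + Z * sigma
    lower = pop.mean - z * pop.stdev
    p = pop.cdf(upper)                     # cumulative probability at X
    within = pop.cdf(upper) - pop.cdf(lower)
    results.append((upper, p, within))
    print(f"X={upper:.2f}  P={p:.4f}  within={within:.4f}")
```

X differs from population to population, but P and the proportion within μ±1.96σ do not.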

11
Q

Use of Z and μ and σ for quartiles?

A

For Z = 0.675, P = 0.75, so the proportion within Z SDs of μ is (2*0.75)-1 = 0.5. The IQR is therefore (μ+0.675σ)-(μ-0.675σ) = 1.35σ.
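
The 0.675 and 1.35 figures can be checked numerically. The population with mean 170 and SD 10 is an assumption for illustration.

```python
from statistics import NormalDist

std = NormalDist()                  # standard normal: mu = 0, sigma = 1
z_upper = std.inv_cdf(0.75)         # Z for the upper quartile
iqr_in_sd_units = 2.0 * z_upper     # IQR expressed in units of sigma

pop = NormalDist(mu=170.0, sigma=10.0)         # hypothetical population
iqr = pop.inv_cdf(0.75) - pop.inv_cdf(0.25)    # IQR on the original scale

print(round(z_upper, 4), round(iqr_in_sd_units, 4), round(iqr, 3))
```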

12
Q

Why is IQR better than range for estimating spread?

A

For a normal population the IQR is 1.35σ whatever the sample size, so σ can be estimated by dividing the sample IQR by 1.35. For the range (maximum-minimum), however, the corresponding multiplier is not a constant but depends on sample size (the expected range is bigger for a sample of 1000 than for a sample of 10). The range therefore reflects the sample size as well as the spread, so it is not good for estimating population spread.
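
A simulation sketch of this point, assuming a standard normal population (σ = 1), 200 replicate samples and a fixed seed. `mean_range_and_iqr` is a hypothetical helper, and `statistics.quantiles` with its default method is just one of several quartile conventions.

```python
import random
import statistics

def mean_range_and_iqr(n, replicates=200, sigma=1.0, seed=0):
    """Average range and average IQR over repeated normal samples of size n."""
    rng = random.Random(seed)
    ranges, iqrs = [], []
    for _ in range(replicates):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        q1, _q2, q3 = statistics.quantiles(sample, n=4)  # three quartile cut points
        ranges.append(max(sample) - min(sample))
        iqrs.append(q3 - q1)
    return statistics.fmean(ranges), statistics.fmean(iqrs)

range_10, iqr_10 = mean_range_and_iqr(10)
range_1000, iqr_1000 = mean_range_and_iqr(1000)

print(f"n=10:   mean range={range_10:.2f}  mean IQR={iqr_10:.2f}")
print(f"n=1000: mean range={range_1000:.2f}  mean IQR={iqr_1000:.2f}")
```

The average range should grow markedly with n, while the average IQR stays roughly in the region of 1.35σ (small samples can be a little off, depending on the quartile convention).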

13
Q

Sample means, normality and sample size?

A

Even for skewed data, sample means are often approximately normally distributed, and they become closer to normal as the sample size increases (the central limit theorem).
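
A simulation sketch, assuming an exponential population (strongly right-skewed) and using skewness as a rough measure of departure from normality. The helper names, replicate count and seed are all illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Simple moment-based skewness; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)

def sample_means(n, replicates=4000, seed=1):
    """Means of repeated samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(replicates)
    ]

skew_n2 = skewness(sample_means(2))    # means of tiny samples: still skewed
skew_n50 = skewness(sample_means(50))  # means of larger samples: much closer to symmetric

print(round(skew_n2, 2), round(skew_n50, 2))
```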

14
Q

Using Z score to estimate third centile of a population?

A

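Calculate m-1.88s; this gives the value below which 3% of the population lie. This is because Z = 1.88 gives P = 0.97, so (P*2)-1 = 0.94 lie within 1.88 SDs of the mean (3% either side). Only works if the data are normal! But it gives the precision you would otherwise get only by counting in a much larger sample.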
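
A sketch of the calculation on a small made-up sample. The multiplier is taken from `NormalDist().inv_cdf(0.03)` rather than typed in, and the result is only meaningful if the data are roughly normal.

```python
import statistics
from statistics import NormalDist

# Made-up sample of heights in cm (illustrative only).
sample = [102.1, 108.4, 110.0, 111.7, 113.2, 115.9, 118.3, 120.6]

m = statistics.mean(sample)    # estimated mean
s = statistics.stdev(sample)   # estimated SD (n - 1 divisor)

z = NormalDist().inv_cdf(0.03)        # about -1.88
third_centile_estimate = m + z * s    # same as m - 1.88 * s

print(round(z, 3), round(third_centile_estimate, 2))
```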

15
Q

Assuming normality for data that can only be positive?

A

Can be a problem; in this situation you want negative values to have a low probability (1-5%). About 16% of a normal population falls below μ-σ, but only about 2.5% below μ-2σ. So if the estimated mean (m) is below the estimated SD (s), the normal distribution is probably inappropriate. Once m reaches about 2*s it becomes acceptable, as negative values then have a low probability.
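
Under a fitted normal distribution, the probability of a negative value is the cumulative probability at 0. The sketch below evaluates it for a few assumed ratios of m to s; `prob_negative` is a hypothetical helper name.

```python
from statistics import NormalDist

def prob_negative(m, s):
    """Probability of a value below 0 under a normal with mean m and SD s."""
    return NormalDist(mu=m, sigma=s).cdf(0.0)

s = 1.0
negatives = {}
for ratio in (0.5, 1.0, 2.0, 3.0):   # ratio = m / s, assumed values
    negatives[ratio] = prob_negative(ratio * s, s)
    print(f"m = {ratio} * s: P(negative) = {negatives[ratio]:.4f}")
```

With m equal to s roughly 16% of the fitted distribution is negative; with m equal to 2*s this drops to a little over 2%.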

16
Q

Assessing normality using histogram (and alternative)?

A

A histogram is a reasonable start, but small samples may not look normal even when drawn from a normal population. The alternative is a normal probability plot. It works on the premise that values drawn from a normal population and arranged in order cluster near μ and thin out in the tails. The plot sets the ordered observed values against the values expected under a normal distribution; if the data are normal, the points fall on an approximately straight line.
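
A sketch of the coordinates behind a normal probability plot, without the plotting itself. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, as are the made-up data and the helper names; the correlation is used only as a rough summary of how straight the line is.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """Expected standard-normal values for n ordered observations,
    using plotting positions (i - 0.5) / n (an assumed convention)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

sample = [5.3, 4.1, 6.2, 5.0, 5.9, 4.8, 5.5, 6.9, 5.2, 5.7]  # made-up data
ordered = sorted(sample)
scores = normal_scores(len(ordered))

# Each (score, observed) pair is one point on the plot.
r = pearson_r(scores, ordered)
print(round(r, 3))
```

A correlation close to 1 corresponds to points lying near a straight line; a marked curve in the plotted points suggests skewness or heavy tails.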

17
Q

Median for an even-sized sample?

A

Take as halfway between the two middle values.