Data Flashcards

Question 1

Q

What is the median?

Answer

A

The middle value when values are ordered from the smallest to the largest

Question 2

Q

What is the mode?

Answer

A

The most common value

Question 3

Q

What is the mean?

Answer

A

The average value: sum of all values divided by the number of values
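
As a minimal sketch of the three averages defined so far, the snippet below uses Python's standard `statistics` module. The list of values is made up purely for illustration and is not part of the cards.

```python
import statistics

# Hypothetical data, chosen only for illustration
values = [2, 3, 3, 5, 7]

median_value = statistics.median(values)  # middle value of the ordered list
mode_value = statistics.mode(values)      # most common value
mean_value = statistics.mean(values)      # sum of values / number of values

print(median_value, mode_value, mean_value)
```

Here the median and the mode are both 3, while the mean is 20 / 5 = 4.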

Question 4

Q

What is the standard deviation?

Answer

A

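This is a measure of spread: roughly the typical distance of the values from the mean (strictly, the square root of the average squared distance from the mean)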
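
One common way to compute the standard deviation is to take the square root of the mean squared deviation from the mean. The sketch below does this by hand and compares it with `statistics.pstdev`; the data are hypothetical, and the population form (dividing by n) is an assumption, since the cards do not say which form is meant.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
mean_value = sum(values) / len(values)

# Population SD: square root of the mean squared deviation from the mean
squared_devs = [(v - mean_value) ** 2 for v in values]
sd_manual = math.sqrt(sum(squared_devs) / len(values))

# The standard library gives the same population SD
sd_library = statistics.pstdev(values)

print(sd_manual, sd_library)
```

For a sample rather than a whole population, `statistics.stdev` divides by n - 1 instead of n.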

Question 5

Q

What is the interquartile range?

Answer

A

This is the difference between the 75th centile and the 25th centile
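
A short sketch of the interquartile range using `statistics.quantiles` follows. The data are hypothetical, and the "inclusive" method is just one of several quartile conventions, so other tools may give slightly different cut points for the same data.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# Cut points dividing the ordered data into four equal parts
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1  # 75th centile minus 25th centile
print(q1, q3, iqr)
```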

Question 6

Q

What are the best values to use to avoid the influence of outliers?

Answer

A

If the data are not symmetrical: use the median rather than the mean, and the IQR rather than the standard deviation

If the data ARE symmetrical: use the mean and the SD
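
To illustrate why the median is preferred when an outlier is present, the sketch below compares the mean and median of two made-up lists that differ only in their largest value.

```python
import statistics

symmetrical = [1, 2, 3, 4, 5]      # hypothetical data, no outlier
with_outlier = [1, 2, 3, 4, 100]   # same data with one extreme value

mean_sym = statistics.mean(symmetrical)
median_sym = statistics.median(symmetrical)

mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean_sym, median_sym)   # mean and median agree
print(mean_out, median_out)   # the mean is pulled upwards, the median is not
```

The single outlier moves the mean from 3 to 22, while the median stays at 3.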

Question 7

Q

What is the Gaussian distribution?

Answer

A

This is a curve representing symmetrical data - it is defined by the mean and the standard deviation

The graph is a symmetrical bell-shaped curve

(The peak of the curve represents the mean)

Question 8

Q

What effect does a changing standard deviation have on the Gaussian distribution?

Answer

A

An increase in the SD makes the curve flatter and wider; a decrease makes it thinner and taller
BUT the curves will all have the same area beneath them

Question 9

Q

What effect does a changing mean have on the Gaussian distribution?

Answer

A

A change in the mean will cause the shape of the curve to remain the same but the location of the curve will shift further left or right (the peak of the curve represents the mean)

Question 10

Q

Why is the Gaussian distribution useful?

Answer

A

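A constant proportion of values will lie within any specified number of SDs above or below the mean, whatever the mean and SD are e.g. about 68% within 1 SD and 95% within 1.96 SDs

Because the Gaussian distribution is symmetrical, this proportion is split equally above and below the mean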
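
The constant-proportion property can be checked with `statistics.NormalDist`. This is a minimal sketch; the helper name `proportion_within` and the example mean of 100 and SD of 15 are assumptions made for illustration.

```python
from statistics import NormalDist

def proportion_within(k, mean=0.0, sd=1.0):
    """Proportion of a Gaussian distribution lying within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 1.96, 3):
    standard = proportion_within(k)                   # mean 0, SD 1
    shifted = proportion_within(k, mean=100, sd=15)   # different mean and SD
    print(k, round(standard, 4), round(shifted, 4))
```

The two columns match for each k, which shows that the proportion depends only on the number of SDs and not on the particular mean or SD.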

Question 11

Q

What is the ‘reference range’?

Answer

A

This is the range of values lying within a specified number of SDs above and below the mean e.g. mean +/- 1.96 SD covers 95% of values, so the 95% reference range runs from the 2.5th to the 97.5th centile

Commonly, the 95% reference range is used
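
As a sketch of a 95% reference range under a Gaussian distribution, the snippet below computes mean +/- 1.96 SD and compares it with the 2.5th and 97.5th centiles from `statistics.NormalDist`. The mean of 100 and SD of 15 are hypothetical values.

```python
from statistics import NormalDist

mean, sd = 100.0, 15.0  # hypothetical mean and SD

# 95% reference range from the 1.96 SD rule
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd

# The same range expressed as centiles of the Gaussian distribution
dist = NormalDist(mu=mean, sigma=sd)
centile_2_5 = dist.inv_cdf(0.025)
centile_97_5 = dist.inv_cdf(0.975)

print(round(lower, 1), round(upper, 1))
print(round(centile_2_5, 1), round(centile_97_5, 1))
```

Both approaches give a range of roughly 70.6 to 129.4.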

Question 12

Q

Why are samples used to estimate data?

What is the role of confidence values?

Answer

A

It is not practical or feasible to measure the data from every single person in the country - so use a sample instead and then use this to estimate for the entire population

The confidence value allows us to judge how far the information from the sample is reliable enough to use for the whole population i.e. the confidence interval tells you how precisely the sample estimates the population values

Question 13

Q

How can the sample size be determined from the results?

Answer

A

A large enough sample size results in an approximately Gaussian distribution of the sample means, whatever the distribution of the individual results

Question 14

Q

What is meant by standard error?

Answer

A

Standard error is the standard deviation of the sampling distribution of an estimate - this is a measure of the statistical precision of that estimate

Question 15

Q

What is the standard error of the mean?

How is this calculated?

Answer

A

This is the standard deviation of the distribution of all possible sample means - your sample is only one sample of all potential samples which could all provide differing results so you must account for this level of error

Calculated as: SD / square root of the sample size
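
The formula above can be sketched in a few lines. The function name `standard_error_of_mean` and the data are hypothetical, and the sample SD (dividing by n - 1) is assumed.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SD divided by the square root of the sample size."""
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return sd / math.sqrt(len(values))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
sem = standard_error_of_mean(values)
print(round(sem, 4))
```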

Question 16

Q

What is meant by a confidence interval for the mean, e.g. for a Gaussian distribution?

How can this be calculated?

Answer

A

A 95% confidence interval for a Gaussian distribution means that you can expect 95% of all possible sample means to lie within 1.96 standard errors of the true population mean

So if you have a mean of 22
and a standard error of 0.3
the 95% confidence interval is 22 +/- 1.96 x 0.3 = 22 +/- 0.588, i.e. about 21.41 to 22.59
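
The worked example above can be reproduced in code. This is a minimal sketch using the card's own numbers (mean 22, standard error 0.3); the variable names are illustrative.

```python
mean = 22.0
standard_error = 0.3
z = 1.96  # multiplier for a 95% interval under a Gaussian distribution

margin = z * standard_error
lower, upper = mean - margin, mean + margin

print(f"95% CI: {lower:.3f} to {upper:.3f}")
```

This gives an interval of 21.412 to 22.588.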

Question 17

Q

What is meant by the confidence interval?

Answer

A

E.g. a 95% confidence interval means that we are 95% sure that the true mean lies within a certain range

Question 18

Q

What is the difference between the standard deviation and the standard error?

Answer

A

SD - this indicates the amount of dispersion of individual values within a sample - used to calculate reference ranges for individual values

SE - this measures the precision of a sample estimate such as the sample mean - used to calculate confidence intervals for sample means

Question 19

Q

What is the effect of an increasing or decreasing sample size on confidence intervals?

Answer

A

An increase in the sample size (if the mean and the SD remain the same) will result in a narrower confidence interval - a larger sample size allows you to be more confident in the estimate. A decrease in the sample size has the opposite effect and widens the interval
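
The effect of sample size can be seen by holding the SD fixed and varying n. In this sketch the SD of 3 and the sample sizes are hypothetical, and `ci_half_width` is an illustrative helper name.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Half-width of a 95% confidence interval for the mean: z * SD / sqrt(n)."""
    return z * sd / math.sqrt(n)

sd = 3.0  # hypothetical SD, held constant
widths = {n: ci_half_width(sd, n) for n in (25, 100, 400)}

for n, width in widths.items():
    print(n, round(width, 3))
```

Because the standard error depends on the square root of n, quadrupling the sample size halves the width of the interval.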

Question 20

Q

What are the different types of correlations shown on a graph and what value depicts this?

Answer

A

Positive correlation: r > 0 (perfect positive correlation: r = 1)
Negative correlation: r < 0 (perfect negative correlation: r = -1)
No correlation: r = 0

r = the correlation coefficient and is always between -1 and 1
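
A minimal sketch of the correlation coefficient follows, assuming the Pearson coefficient is the one meant (the cards do not name it). The function `pearson_r` and the data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # y rises with x
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # y falls as x rises
r_none = pearson_r(x, [1, 2, 3, 2, 1])   # no linear trend

print(round(r_pos, 3), round(r_neg, 3), round(r_none, 3))
```

The three cases give values of approximately 1, -1 and 0.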

Question 21

Q

What can be used to determine whether results are statistically significant?

Answer

A

Use the confidence intervals and p-values

Question 22

Q

What is a p-value and what is its significance?

Answer

A

A p-value for a result is the probability of observing a result as extreme as, or more extreme than, the sample result if the underlying assumption about the population (the null hypothesis) is true

If the p-value is below 0.05 then the result is conventionally considered statistically significant i.e. a result this extreme would be unlikely to arise by chance alone
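
As an illustration only, the sketch below computes a two-sided p-value for a sample mean, assuming a Gaussian sampling distribution (a z-test). The cards do not specify a test, and the null-hypothesis mean of 21.4 is a hypothetical value chosen for the example.

```python
from statistics import NormalDist

sample_mean = 22.0
standard_error = 0.3
null_mean = 21.4  # hypothetical null-hypothesis value, assumed for illustration

# Number of standard errors between the sample mean and the null value
z = (sample_mean - null_mean) / standard_error

# Two-sided p-value: probability of a result at least this extreme
# in either direction if the null hypothesis is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
significant = p_value < 0.05

print(round(z, 2), round(p_value, 4), significant)
```

Here z is about 2 and the p-value is roughly 0.046, just below the conventional 0.05 threshold.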