Lecture 12, 13 and 14 - Biostatistics Flashcards

Question

What are the two different sorts of errors?

Answer 1

There are two different sorts of errors:
1- Errors that make our answers more uncertain, i.e. more variability.
2- Errors that move us away from the truth, i.e. we get the wrong answer. This is often called bias.
You can’t avoid the 1st type (taking a sample, measuring things imperfectly). However, it is really important to avoid the 2nd type, as we do not want to undertake a study and get an answer that is systematically away from the true value. A random sample from the whole population (as long as everyone takes part) can avoid the 2nd type.

Answer 2

A random sample means that everyone/everything has an equal chance of being chosen

Answer 3

Increasing the sample size does not help with dealing with bias
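A minimal simulation sketch of this point, using made-up numbers: the population of heights and the biased sampling rule (only the taller half can be chosen) are assumptions for illustration, not from the lectures. A bigger sample makes the estimate less variable, but the error caused by the biased method stays about the same.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 heights in cm (made-up numbers).
population = [rng.gauss(170, 10) for _ in range(10_000)]
true_mean = statistics.mean(population)

# A biased sampling method: only the taller half can ever be chosen.
taller_half = sorted(population)[len(population) // 2:]

def biased_sample_mean(n):
    """Mean of n values drawn (with replacement) from the taller half only."""
    return statistics.mean(rng.choices(taller_half, k=n))

# Error of the estimate (sample mean minus true mean) for a small and a large sample.
errors = {n: biased_sample_mean(n) - true_mean for n in (50, 5000)}
print(errors)
```

Both errors should come out at several centimetres: going from 50 to 5000 people does not move the estimate back towards the truth.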

Answer 4

The sampling method must match the target population in order to get representative results.

Answer 5

When wondering if a sample is representative it is important to consider the people who won’t take part.

Answer 6

```
Mean (population): μ
Standard deviation (population): σ
```

Answer 7

```
Sample mean (x̄)
Sample size (n)
Sample standard deviation (s) = standard deviation of the observations in a sample
```

Answer 8

Each circle on a sampling distribution is a sample mean (a mean for each different sample). The variability of the sampling distribution is called the standard error (SE); it is the standard deviation of the sample means. The sampling distribution is centred on the population mean (when there is no bias) and has its own variability, the standard error, which is the same idea as a standard deviation but for the sample means.
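The idea can be sketched by simulation. The population below (mean 100, standard deviation 15), the sample size and the number of repeats are all made-up values for illustration. Each entry in `sample_means` plays the role of one circle on the sampling distribution.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (made-up numbers).
population = [rng.gauss(100, 15) for _ in range(20_000)]

n = 40          # size of each sample
repeats = 2000  # number of repeated random samples

# One sample mean per repeated sample.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

centre = statistics.mean(sample_means)           # should sit close to the population mean
standard_error = statistics.stdev(sample_means)  # SD of the sample means = standard error

print(statistics.mean(population), centre, standard_error)
```

With these assumed numbers the standard error should land near 15 / √40, roughly 2.4, which is much smaller than the spread of the individual observations.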

Answer 9

(population) proportion

Answer 10

(sample) proportion

Answer 11

Proportion = population proportion, as the sampling distribution is centred on the population proportion (when there is no bias). Standard error = variability (standard deviation) of the sampling distribution, i.e. the spread of the different sample proportions.

Answer 12

Symmetric bell-shaped curve: we keep seeing that the sampling distribution follows the shape of a ‘normal distribution’. If we have the mean and standard deviation we can draw its shape (a precise curve that only depends on the mean and standard deviation). The normal distribution is the symmetric bell-shaped curve that we keep seeing when we take repeated random samples from a population when the sample size is large. One of the key properties of the normal distribution is that 95% of the observations lie within 1.96 standard deviations of the mean. This is due to the shape of the normal distribution, and this property is very useful when making an inference back to the population. A normal distribution is always symmetric and centred at the mean.

Answer 13

About 95% of the data lie between 2 standard deviations below the mean and 2 standard deviations above the mean (i.e. within 2 standard deviations of the mean).
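A quick check of the 95% rule, as a sketch using Python's standard library `statistics.NormalDist`. It compares the exact multiplier 1.96 with the rounded value 2 used on these cards.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Proportion of a normal distribution within 1.96 and within 2 SDs of the mean.
within_196 = z.cdf(1.96) - z.cdf(-1.96)
within_2 = z.cdf(2) - z.cdf(-2)

print(round(within_196, 4), round(within_2, 4))
```

The first value is almost exactly 0.95 and the second is slightly more, which is why "2 standard deviations" works as a convenient rounding.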

Answer 14

The sampling distribution will follow a normal distribution (symmetric bell-shaped curve), and 95% of the sample means lie within +/- 2 standard errors of the population mean.

Answer 15

If our sample is large (n is greater than 30) then we know the sampling distribution will be approximately normal (symmetric bell curve), and the standard error can be estimated from the sample using the following equation: SE = s / √n, i.e. the sample standard deviation divided by the square root of the sample size.
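As a small sketch of that calculation, the helper below (the name `standard_error` and the simulated data are assumptions for illustration) estimates the standard error from a single sample.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean: sample SD divided by sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    n = len(sample)
    return s / math.sqrt(n)

# Hypothetical sample of 36 measurements (made-up numbers, n > 30).
rng = random.Random(2)
sample = [rng.gauss(120, 12) for _ in range(36)]

print(statistics.mean(sample), standard_error(sample))
```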

Answer 16

General formula is on desktop ... Where X represents the estimate and s over square root n is the standard error (s represents the standard deviation and n the sample size). This formula ensures that, if we did repeated sampling, 95% of intervals would contain the true population value. Using this formula you can find the upper and lower confidence interval limits: the estimate minus 2 standard errors and the estimate plus 2 standard errors.
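The formula image itself is not reproduced on this card, so the sketch below assumes the form the text describes: estimate ± 2 × s / √n. The function name and the example data are made up for illustration.

```python
import math
import statistics

def confidence_interval_mean(sample):
    """Approximate 95% CI for a mean, assuming the form: mean +/- 2 * s / sqrt(n)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se

# Tiny made-up data set, chosen only so the arithmetic is easy to follow
# (a real use of this formula would want a larger sample).
lower, upper = confidence_interval_mean([2, 4, 4, 4, 5, 5, 7, 9])
print(lower, upper)
```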

Answer 17

Confidence intervals are a very useful way of understanding how much uncertainty we have in the mean or proportion. They reflect the width of the sampling distribution. Because we don’t know if our sample is one of the extreme ones or closer to the middle of the sampling distribution, we do not know if our confidence interval contains the true population mean or proportion. All we know is that if we took repeated samples, then 95% of the confidence intervals would contain the true mean or proportion, and 5% would not. This leads to the use of the phrase ‘We are 95% confident’ which means if we did this repeatedly, 95% of the intervals would contain the true population mean or proportion. The 95% confidence interval is very useful for interpreting our results; if it is wide then we don’t have much certainty about the estimate. If you end up working in clinical practice and you’re looking at the results of how effective a new drug is, the first thing you would want to look at would be the size of the confidence interval, followed by the study design and whether the results are even applicable to your patients.
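The "repeated samples" interpretation can be sketched by simulation. All numbers here (true mean 50, standard deviation 8, samples of 60, 2000 repeats) are assumptions for illustration, and the interval is taken to be mean ± 2 × SE.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 8.0  # hypothetical population values
N, REPEATS = 60, 2000           # sample size and number of repeated samples

covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's interval contain the true population mean?
    if mean - 2 * se <= TRUE_MEAN <= mean + 2 * se:
        covered += 1

coverage = covered / REPEATS
print(coverage)
```

The printed proportion should be close to 0.95: most intervals contain the true mean, a few do not, and from any single sample we cannot tell which kind we have.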

Answer 18

95% of the confidence intervals still contain the true population mean; however, the intervals will now be narrower, as a larger sample size gives us more information and therefore more certainty in the estimate.

Answer 19

No - there are small differences due to random variation, but we expect that 95% of all the confidence intervals will contain the population mean

Answer 20

We are 95% confident that the true population mean lies between the lower and upper confidence limit OR We are 95% confident that the true proportion lies between the lower and upper confidence limit

Answer 21

The ‘25th percentile’ - 25% of the sample is below this point and 75% is above this point.
The ‘median’ - 50% of the sample is above this point and 50% is below this point.
The ‘75th percentile’ - 75% of the sample is below this point and 25% is above this point.
The IQR is the range between the 25th percentile and the 75th percentile, and it contains the central 50% of all heights.
Check boxplot image on desktop.
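A short sketch of these summaries using `statistics.quantiles` on made-up heights. Software packages use slightly different conventions for quartiles, so the exact cut points can differ a little from hand calculations.

```python
import statistics

# Hypothetical heights in cm (made-up numbers).
heights = [150, 155, 158, 160, 163, 165, 168, 170, 173, 178, 185]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1  # width of the box in a boxplot: the central 50% of the data

print(q1, median, q3, iqr)
```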

Answer 22

We can apply the general formula to proportions using: proportion ± 2 × SE. Always use the proportion, NOT the percentage (i.e. write it as a number between 0 and 1).
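A sketch of the calculation for a proportion. The card does not give the standard error formula for a proportion, so the usual textbook form SE = √(p(1 − p) / n) is assumed here. The function name and the counts are made up.

```python
import math

def proportion_ci(successes, n):
    """Approximate 95% CI for a proportion: p +/- 2 * SE.

    Assumes the standard formula SE = sqrt(p * (1 - p) / n).
    """
    p = successes / n  # a proportion between 0 and 1, not a percentage
    se = math.sqrt(p * (1 - p) / n)
    return p - 2 * se, p + 2 * se

# Hypothetical example: 40 out of 100 people have the outcome.
lower, upper = proportion_ci(40, 100)
print(round(lower, 3), round(upper, 3))
```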

Answer 23

It means that there are lots of possible values. It becomes more precise with larger sample size as confidence interval decreases in size. The narrower the confidence interval, the more certainty we have about the size of the population mean. If the confidence interval is wide then we don’t know much about a population.

Answer 24

If there is no difference, the difference between the groups (in means or proportions) will sit on zero. To find the difference between the two groups, subtract one group’s proportion from the other’s.
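A sketch of comparing two groups. The subtraction is from the card; the standard error of the difference, √(p1(1 − p1)/n1 + p2(1 − p2)/n2), is the usual textbook formula and is an assumption here, as are the function name and the counts.

```python
import math

def difference_in_proportions(x1, n1, x2, n2):
    """Difference p1 - p2 with an approximate 95% CI (difference +/- 2 * SE)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Assumed standard formula for the SE of a difference between two proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - 2 * se, diff + 2 * se

# Hypothetical groups with the same proportion: the difference sits on zero.
same = difference_in_proportions(30, 100, 30, 100)

# Hypothetical groups that differ: 60/200 versus 40/200.
different = difference_in_proportions(60, 200, 40, 200)

print(same)
print(different)
```

When the interval for the difference includes zero, the data are compatible with no difference between the groups.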

Answer 25

If sample size is increased then the sampling distribution gets narrower

Answer 26

First one stays the same and the second one decreases

Answer 27

Variation that is a result of how a sample is obtained e.g. how measurements are taken, angle of tape measure when height is taken etc.

Answer 28

Sources of variation as a result of biological features such as genetics, nutrition, mutations etc.

Answer 29

A dot plot is good for a small amount of data as it shows the data exactly. With just a few data points, the dot plot would display the data more clearly - you can see exactly what the values are; the shape is not as important. A histogram tends to be very spiky and the boxplot hides the fact that there are very few data points.

Answer 30

You need to be certain that they are truly errors, and that they will cause more bias if you leave them in than if you exclude them. Ideally you would correct the errors instead of just removing them altogether.

Answer 31

The mean of all the sample means doesn’t change much - all are estimating the population mean

Answer 32

Standard error decreases as the sample size increases because, with a larger sample size, there is less variation between the sample means.
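A worked sketch using SE = s / √n, with the standard deviation held at a made-up value of 10 so that only the sample size changes.

```python
import math

def standard_error(s, n):
    """Standard error of the mean from a standard deviation s and sample size n."""
    return s / math.sqrt(n)

# Hypothetical standard deviation of 10 with increasing sample sizes.
ses = {n: standard_error(10.0, n) for n in (25, 100, 400)}
print(ses)  # {25: 2.0, 100: 1.0, 400: 0.5}
```

Each time the sample size is multiplied by 4, the standard error halves.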

Answer 33

It is simply a line that best fits the data: y = mx + c. There is less variation in regression lines as sample size increases, and you get a much better sense of the true relationship between the variables being investigated. Small samples are less reliable, as shown by the variation in their regression lines.
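A sketch of fitting y = mx + c. The card does not say how "best fits" is defined, so ordinary least squares is assumed here. The function name and data are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c

# Made-up points that lie exactly on y = 2x + 1.
m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)  # 2.0 1.0
```

With real, noisy data the fitted m and c would change from sample to sample, and they would change less as the sample size grows.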

(58 cards)