Critical Numbers Flashcards

Question

What are the benefits of a case-control study?

Answer 1

- Faster: uses past data, so does not require long follow-up
- Useful for rare outcomes: participants are selected on the basis of outcome
- Cheaper

Answer 2

- More prone to bias or poor-quality data
- Harder to show a causal relationship
- Not ideal for rare exposures

Answer 3

- Non-randomised
- Observational
- Single time point
Look at a sample and record who is exposed or unexposed, and who has the outcome or does not

Answer 4

- Relatively quick
- Cheap
- Can assess multiple exposures/outcomes

Answer 5

- Susceptible to bias
- Cannot prove causality
- Not ideal for rare exposures/outcomes

Answer 6

The unit of observation is the group (aggregate) rather than the individual, e.g. electoral ward, country
Some pros:
- Large-scale comparisons
- Can quantify geographical or temporal trends
Some cons:
- Ecological fallacy
- Cannot make inference at the individual level

Answer 7

- Binary
- Ordinal
- Nominal

Answer 8

Discrete and continuous

Answer 9

Only two categories (e.g. positive and negative)

Answer 10

Categories with natural order (e.g. stage of cancer)

Answer 11

Categories with no natural order (e.g. blood group)

Answer 12

Observations can only take certain numerical values (e.g. number of children)

Answer 13

Observations can take any value within a range (e.g. height)

Answer 14

The number with a characteristic or outcome divided by the total number. Used to describe probability or risk (scale 0-1)

Answer 15

Proportion multiplied by 100

Answer 16

The number with an exposure or outcome divided by the number without. The ratio of the probability of an event occurring to the probability of it not occurring
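The proportion, percentage and odds definitions above can be sketched in a few lines. The counts (20 of 100 people with the outcome) and the function names are made up for illustration.

```python
def proportion(with_outcome, total):
    """Number with the outcome divided by the total number (scale 0-1)."""
    return with_outcome / total


def percentage(with_outcome, total):
    """Proportion multiplied by 100."""
    return 100 * with_outcome / total


def odds(with_outcome, total):
    """Number with the outcome divided by the number without."""
    without_outcome = total - with_outcome
    return with_outcome / without_outcome


# Hypothetical counts: 20 of 100 people have the outcome.
p = proportion(20, 100)    # 0.2
pct = percentage(20, 100)  # 20.0
o = odds(20, 100)          # 0.25, i.e. 20 with : 80 without
```

Note that the odds can also be written as p / (1 - p), which is the ratio of the probability of the event occurring to the probability of it not occurring.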

Answer 17

A rate is the frequency per unit of another measurement. This allows us to account for variation. Once an outcome has occurred, an individual will not be at risk, either forever or for some period of time. Person-time at risk is not always known and may be approximated

Answer 18

Difference in proportions between groups. If there is no difference this will be 0

Answer 19

The risk in one group divided by the risk in the other
- If there is no difference the ratio will be 1
- Ratios > 1 indicate higher risk/odds in the group of interest
- Ratios < 1 indicate lower risk/odds in the group of interest
- The more common the outcome, the more apparent the difference between risk and odds ratios

Answer 20

Odds in one group divided by the odds in the other
- If there is no difference the ratio will be 1
- Ratios > 1 indicate higher risk/odds in the group of interest
- Ratios < 1 indicate lower risk/odds in the group of interest
- The more common the outcome, the more apparent the difference between risk and odds ratios
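As a rough illustration of the risk difference, risk ratio and odds ratio described above, the sketch below compares two groups. All counts and function names are hypothetical. It is meant to show that the odds ratio stays close to the risk ratio when the outcome is rare and drifts away when it is common.

```python
def risk(events, total):
    """Proportion of the group with the outcome."""
    return events / total


def odds(events, total):
    """Number with the outcome divided by the number without."""
    return events / (total - events)


def compare_groups(events_a, total_a, events_b, total_b):
    """Compare group A (group of interest) with group B (reference)."""
    return {
        "risk_difference": risk(events_a, total_a) - risk(events_b, total_b),
        "risk_ratio": risk(events_a, total_a) / risk(events_b, total_b),
        "odds_ratio": odds(events_a, total_a) / odds(events_b, total_b),
    }


# Rare outcome (hypothetical): 2/100 vs 1/100.
rare = compare_groups(2, 100, 1, 100)
# Common outcome (hypothetical): 60/100 vs 30/100.
common = compare_groups(60, 100, 30, 100)

print(rare)    # risk ratio 2.0, odds ratio about 2.02
print(common)  # risk ratio 2.0, odds ratio 3.5
```

Both comparisons have the same risk ratio, but the odds ratio only approximates it in the rare-outcome case.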

Answer 21

Sum of the values divided by the count

Answer 22

Order the values, then take the midpoint (with an even number of values, the average of the two middle values)

Answer 23

The most common value
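The three averages above (mean, median, mode) can be checked on a small made-up data set using Python's standard `statistics` module. The values are illustrative only.

```python
import statistics

# Hypothetical observations, e.g. number of children per family.
values = [2, 3, 3, 5, 7]

mean_value = statistics.mean(values)      # sum (20) divided by count (5) = 4
median_value = statistics.median(values)  # middle of the ordered values = 3
mode_value = statistics.mode(values)      # most common value = 3

print(mean_value, median_value, mode_value)
```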

Answer 24

The standard deviation

Answer 25

A central range

Answer 26

- Standard deviation: describes dispersion of values around the mean
- When describing samples the mean is denoted by x̄ and the SD by s
- When describing populations the mean is denoted by µ and the SD by σ

Answer 27

- Range: the lowest value and the highest value
- Centiles: the median is the 50th centile. We can describe the spread using centiles around it, e.g. 5th to 95th gives the 90% central range

Answer 28

Interquartile range: the 25th to 75th centile, which gives the central 50% range
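A minimal sketch of the range, a 90% central range and the interquartile range, using `statistics.quantiles` from the standard library. The data (the integers 1 to 101) are made up, and the `"inclusive"` method is one of several conventions for computing centiles, so other software may give slightly different cut points.

```python
import statistics

# Hypothetical data: the integers 1 to 101, already ordered.
data = list(range(1, 102))

# Range: lowest and highest value.
data_range = (min(data), max(data))

# Quartiles: the 25th, 50th and 75th centiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
interquartile_range = (q1, q3)  # central 50% of the data

# Cut points every 5 centiles: first is the 5th centile, last the 95th.
twentiles = statistics.quantiles(data, n=20, method="inclusive")
central_90 = (twentiles[0], twentiles[-1])

print(data_range)           # (1, 101)
print(interquartile_range)  # (26.0, 76.0)
print(central_90)           # (6.0, 96.0)
```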

Answer 29

The Gaussian distribution or the "bell-shaped curve"

Answer 30

They will be the same

Answer 31

The curve is more widely spread and its apex is lower

Answer 32

The sample has the same mean but the median is lower

Answer 33

The sample has the same mean but the median is higher

Answer 34

It is 'pulled out' by extreme values

Answer 35

will always have 50% of the data to either side
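A small sketch of how one extreme value shifts the mean far more than the median, assuming the two answers above describe the mean and the median respectively. The values are made up.

```python
import statistics

# Hypothetical values with no extreme observations.
values = [20, 22, 25, 27, 30]
mean_before = statistics.mean(values)      # 24.8
median_before = statistics.median(values)  # 25

# Add a single extreme value.
with_outlier = values + [500]
mean_after = statistics.mean(with_outlier)      # 104
median_after = statistics.median(with_outlier)  # 26.0

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean moves from 24.8 to 104, while the median only moves from 25 to 26 and still has half of the observations on either side.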

Answer 36

- Parametric: make distributional assumptions
- Non-parametric: make no assumptions (distribution-free)

Answer 37

Symmetric (mean, median and mode are equal)

Answer 38

- About 68% of values lie within 1 SD of the mean
- About 95% of values lie within 2 SD of the mean
- About 99.7% of values lie within 3 SD of the mean
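These percentages can be checked numerically. For a Normal distribution, the fraction of values within k standard deviations of the mean equals erf(k / √2), which the sketch below evaluates with the standard `math` module. The function name is illustrative.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a Normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The commonly quoted 68%, 95% and 99.7% are rounded versions of these values. Exactly 95% corresponds to about 1.96 SD rather than 2.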

Answer 39

Correlation: a measure of the linear relationship between variables
- Quantified by the correlation coefficient r
- r is bound between -1 and 1
- The closer to 1 or -1, the stronger the correlation
- The closer to 0, the weaker the correlation
- Can be positive (as one variable increases, so does the other)
- Or negative (as one variable increases, the other decreases)
- The ordering of the variables does not matter
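The properties of r listed above can be illustrated with a hand-written Pearson correlation. This is a sketch on made-up data, not a replacement for a statistics package, and `pearson_r` is an illustrative name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # increases with xs
falling = [10, 8, 6, 4, 2]  # decreases as xs increases

print(pearson_r(xs, rising))   # 1.0, perfect positive correlation
print(pearson_r(xs, falling))  # -1.0, perfect negative correlation
# Swapping the variables gives the same r.
print(pearson_r(rising, xs) == pearson_r(xs, rising))  # True
```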

Answer 40

- We can assess Normality
- We can identify outliers (also useful for identifying data entry errors)
- We can determine whether data might benefit from transformation
- We can assess collinearity
- We can choose a method of analysis best suited to our research question and data:
  - Parametric: make distributional assumptions
  - Non-parametric: make no assumptions (distribution-free)

Answer 41

- Descriptive statistics relate to the sample
- Inferential statistics relate to the population
- We infer properties of the population by using sample statistics to derive estimates of population parameters and test hypotheses
- When making inference from a sample we need to account for uncertainty in our sample estimates

Answer 42

Produces variation, which we need to account for when making inference

Answer 43

If we were to take repeated samples and calculate the mean each time, those sample means would be approximately Normally distributed around the true population mean, even if the population itself is not Normally distributed (provided the samples are reasonably large)
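The behaviour of repeated sample means can be explored with a small simulation. The sketch below assumes a right-skewed (exponential) population with true mean 1.0. The sample size, number of repeats and random seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50   # observations per sample (arbitrary)
N_SAMPLES = 2000   # number of repeated samples (arbitrary)


def draw_sample_mean(size):
    """Mean of one sample from a skewed population with true mean 1.0."""
    sample = [random.expovariate(1.0) for _ in range(size)]
    return statistics.mean(sample)


sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 2))  # close to the population mean of 1.0
print(round(sd_of_means, 2))    # close to 1 / sqrt(50), about 0.14
```

Plotting a histogram of `sample_means` should show a roughly symmetric bell shape centred near 1.0, even though the individual observations are strongly skewed.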

Answer 44

The standard error is a type of standard deviation: it is the standard deviation of the sampling distribution. Both are measures of spread.
The standard Deviation is for Describing; the standard Error is for Estimating
- The standard error indicates how different a sample mean is likely to be from the population mean
- It tells us the precision of estimation
- The smaller the standard error of the mean, the more precise our estimate of the mean, i.e. the closer it is likely to be to the true population mean

Answer 45

SD / √n

Answer 46

The bigger the SD, the bigger the standard error. The bigger the sample size, the smaller the standard error. This makes sense because the less variable the data are, the more precise our estimation. The more people we sample, the better the representation and therefore the more precise our estimation.
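The relationship between SD, sample size and standard error can be seen by applying SD / √n to a few made-up numbers. The function names below are illustrative.

```python
from math import sqrt
from statistics import stdev


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)


def standard_error_of_sample(values):
    """Standard error of the mean estimated from raw sample values."""
    return standard_error(stdev(values), len(values))


print(standard_error(10, 25))   # 2.0
print(standard_error(20, 25))   # 4.0: bigger SD, bigger standard error
print(standard_error(10, 100))  # 1.0: bigger sample, smaller standard error

# Hypothetical raw sample.
print(round(standard_error_of_sample([4, 8, 6, 5, 7]), 3))  # 0.707
```

Because of the square root, quadrupling the sample size only halves the standard error.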

Answer 47

We can use the sample mean, the standard error of the mean and properties of the Normal distribution to calculate a range of values we can be confident includes the true mean. This is called the confidence interval. We are no longer just describing our sample: we are now making inference about the population parameter

Answer 48

- Variability in the sample (SD)
- Sample size (n)
- The desired level of confidence: typically we use 95% but it could be 90%, 99%, etc.
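A minimal sketch of a confidence interval for a mean, built from the three ingredients above (SD, n and confidence level). It assumes the Normal approximation (mean ± z × standard error); for small samples a t-distribution would usually be preferred. The summary figures (mean 100, SD 15, n 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    standard_error = sd / sqrt(n)
    # z is about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical sample summary: mean 100, SD 15, n = 36.
lower_95, upper_95 = confidence_interval(100, 15, 36)
lower_99, upper_99 = confidence_interval(100, 15, 36, level=0.99)

print(round(lower_95, 1), round(upper_95, 1))  # 95.1 104.9
print(round(lower_99, 1), round(upper_99, 1))  # wider than the 95% interval
```

A larger SD or a higher confidence level widens the interval, while a larger sample narrows it.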

Answer 49

- Means
- Differences in means
- Proportions
- Differences in proportions
- Correlation coefficients
- Relative risks
- Odds ratios

Answer 50

- We can perform a statistical test to determine how likely it is that the result we have observed is ‘real’
- Or if it is more likely there is no true difference and we are just seeing chance variation
- To do this we test the hypothesis of no difference between groups
- We then weigh up the strength of the evidence against that hypothesis
- And come to a conclusion

Answer 51

- Probability values range from 0 to 1 (though as you’ve seen we often multiply by 100 to express them as a percentage)
- A probability of 0 means an event is impossible
- A probability of 1 means an event is certain
- So the smaller the probability, the less likely the outcome

Answer 52

Define the null hypothesis:
- This is typically the theory we want to disprove
- We will assume this hypothesis is true until we see sufficient evidence to the contrary
- Denoted H0
- In our example: H0 = no difference in mean IQ between groups

Answer 53

Define the alternative hypothesis:
- This is the opposite theory to the null
- Denoted HA or H1
- In our example: HA = there is a difference in mean IQ between groups

Answer 54

Choose a significance level for the test:
- This is how we determine whether our result is statistically significant
- It is also the probability that we make a false positive conclusion and reject the null hypothesis when it is in fact true
- So we need to minimise this risk
- Typically it is set at around 0.05 (so 5%)

Answer 55

Perform an appropriate statistical test:
- The test produces a test statistic from our data
- We then compare that test statistic to the distribution we would expect under the null hypothesis and work out the probability of our result if the null were true

Answer 56

Decision time:
- We use the probability value from the statistical test to weigh up the strength of the evidence against the null hypothesis
- We call this probability value the p-value
- The p-value is the probability of seeing an effect of the observed magnitude or greater if the null hypothesis were true
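To make the p-value concrete, the sketch below computes a two-sided p-value for a difference in means using a z-test. This is one possible test chosen for illustration and assumes large samples; it is not necessarily the test used in the IQ example. The group summaries are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for a difference in means (Normal approximation)."""
    # Standard error of the difference between the two sample means.
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Test statistic: the observed difference in standard-error units.
    z = (mean_a - mean_b) / se_diff
    # Probability of a difference at least this extreme if H0 were true.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical groups: mean IQ 103 vs 100, SD 15, 100 people in each.
p_small_difference = two_sided_p_value(103, 15, 100, 100, 15, 100)
# Hypothetical groups with a larger difference: mean IQ 106 vs 100.
p_large_difference = two_sided_p_value(106, 15, 100, 100, 15, 100)

print(round(p_small_difference, 3))  # 0.157, above 0.05
print(round(p_large_difference, 3))  # 0.005, below 0.05
```

With a significance level of 0.05, only the second comparison would lead us to reject the null hypothesis of no difference.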

Answer 57

The result is probable under the null hypothesis, so we do not have sufficient evidence to reject it (this does not prove that the null hypothesis is true)

Answer 58

We reject the null hypothesis
- The smaller the p-value, the less likely it is that we would see our observed result under the null hypothesis

Answer 59

- Gives our plausible range for the true population difference
- Can be used to determine statistical (and clinical) significance
- Thus is more informative than the p-value alone

Answer 60

- Statistical significance just means an observed result is unlikely to be due to chance
- Clinical significance means the result is practically important

(84 cards)