RESS ebook Flashcards

1
Q

what is the formula for incidence rate?

A

Incidence rate = (Number of new cases in period/
Number at risk in population in period)

nb if the number at risk in the population changes during the period, use the number at risk halfway through the defined period (the mid-period population)
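A minimal sketch of this arithmetic in Python (the function name and the numbers are invented for illustration, not from the source):

```python
def incidence_rate(new_cases, population_at_risk):
    """Incidence rate = new cases in period / number at risk in period."""
    return new_cases / population_at_risk

# Illustrative numbers: 12 new cases over a year; the at-risk population changed
# during the year, so the mid-period count of 400 is used.
print(incidence_rate(12, 400))  # 0.03, i.e. 3 new cases per 100 people at risk
```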

2
Q

what formula do you use to work out prevalence rate?

A

Prevalence =
(Number of people with a disease at a certain time/
Number of people in the population at that certain time)

3
Q

what is the formula for case fatality rate?

A

Case fatality rate =
(Number of people who die from the disease in period/
Number of people with the disease in period)

4
Q

what is the formula to calculate risk?

A

Risk =
(Number of new cases /
Number at risk)

5
Q

what is the risk ratio? and how is it calculated?

A

The risk ratio is a measure of relative risk and is so called because it is the ratio of the risk of disease in the exposed group to the risk of disease in the unexposed group.

  • Relative risk, often abbreviated to RR, is used to describe measures of risk ratio, odds ratio, and incidence rate ratio.

(Epidemiologists and statisticians sometimes disagree over definitions, but relative risk is a useful umbrella term for measures that compare the disease between different exposures or treatments)

risk ratio =
(no. of people who were EXPOSED to risk factor and then GOT the disease / no. of people who were EXPOSED to risk factor)
divided by:
(no.of people who were NOT EXPOSED to risk factor but GOT the disease / no. of people who were NOT EXPOSED to risk factor)
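A worked sketch in Python (the 2×2 counts are invented for illustration):

```python
# Hypothetical 2x2 counts (not from the source):
#                got disease   did not get disease   total
# exposed                 30                    70     100
# not exposed             10                    90     100
risk_exposed = 30 / 100      # risk in the exposed group = 0.30
risk_unexposed = 10 / 100    # risk in the unexposed group = 0.10

risk_ratio = risk_exposed / risk_unexposed
print(risk_ratio)  # 3.0 -> the exposed group has three times the risk
```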

6
Q

how do you calculate the ODDS of an event occurring?

A

Odds of an event =
(Probability of event /
Probability that event does not occur)

7
Q

what is odds ratio (OR)? and how is it calculated?

in what sort of trial would you use it?

A

The odds ratio is the measure that is calculated to represent the relative risk for a case-control study.

odds ratio=
(no. of people who were EXPOSED to risk factor and GOT the disease / no. of people who were EXPOSED to risk factor and DID NOT get the disease)
divided by:
(no. of people who were NOT EXPOSED to risk factor and GOT the disease / no. of people who were NOT EXPOSED to risk factor and DID NOT get the disease)
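Using the same invented 2×2 counts as the risk-ratio sketch above, a short Python illustration:

```python
# Hypothetical counts (illustrative only):
odds_exposed = 30 / 70       # exposed: 30 got the disease, 70 did not
odds_unexposed = 10 / 90     # not exposed: 10 got the disease, 90 did not

odds_ratio = odds_exposed / odds_unexposed
print(round(odds_ratio, 2))  # 3.86 -- close to, but not the same as, the risk ratio of 3.0
```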

8
Q

which method of calculating risk (relative risk or odds ratio) is used for cohort studies? and randomised controlled trials?

A

cohort studies use: relative risk (risk ratio) and/or odds ratio

randomised controlled trials use: relative risk and/or odds ratio (like cohort studies, they follow participants forward from exposure, so the risk in each group can be measured directly)

it is case-control studies that can use ONLY the odds ratio

9
Q

how do you transform your data to fit a standard normal distribution?

A

The standard curve has a mean of zero and the standard deviation is one. It is called the standard normal distribution.

To be able to make probability statements about any normal variable, it is necessary to transform it to a standard normal. To do this, for any value in the distribution you subtract the mean, and then divide by the standard deviation. The mathematical equation is:

z = (x - µ) / σ

where x is the value to be transformed, µ is the mean, σ is the standard deviation, and z is the resulting standard normal value.

For the example above of height, the mean is 163cm and the standard deviation is 6cm. So a height of 165 cm is transformed as follows:

  1. subtract the mean of 163 to get 165-163 = 2
  2. divide by the standard deviation of 6 to get z = 2/6 = 0.33.

This value calculated above, known as the z value, allows the area to the left of the point to be determined by looking the value up in a table, which we will do in the next section. The area under the curve for a value where z is less than, or equal to, 0.33 is 0.6293. Since the total area under the curve is one, the probability that a value is greater than z = 0.33 is 1 - 0.6293 = 0.3707. Consequently the probability that a patient selected at random is taller than 165 cm is 0.37 (or 37%).
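This calculation can be reproduced in a couple of lines (a sketch assuming SciPy is available; rounding z to two decimal places mirrors looking the value up in a table):

```python
from scipy.stats import norm

mean, sd = 163, 6              # population mean and standard deviation (cm)
x = 165                        # height to transform
z = round((x - mean) / sd, 2)  # z = 2/6 = 0.33
p_below = norm.cdf(z)          # area to the left of z = 0.33: 0.6293
p_above = 1 - p_below          # probability of being taller than 165 cm: 0.3707
print(z, round(p_below, 4), round(p_above, 4))
```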

10
Q

what is the standard error and how is it calculated?

A

The sample mean is used as an estimate of the ‘true’ population mean. The spread of the sample mean, not the spread of the actual measurements but the spread of the mean of the measurements, is given by the ‘standard error’ or the ‘standard error for the mean’.

There is a relationship between the standard deviation (sd) of the population and the standard error of the sample mean (se) taking into account the sample size (n). This is given by:

se = sd / √n

As an example, suppose that the standard deviation of the systolic blood pressure is 19.0 mmHg, the standard error of the mean of a sample of size n = 47 is

se = 19.0 / √47 = 19.0 / 6.856 = 2.77 mmHg.

When the sample size increases to n = 326, the standard error reduces to

19.0 / √326 = 19.0 / 18.055 = 1.05 mmHg.

The standard error is used to calculate the confidence interval of a mean, and is an important concept for the presentation of results in research articles.
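A two-line check of these numbers in Python (a sketch; the values come from the example above):

```python
import math

sd = 19.0                   # standard deviation of systolic blood pressure (mmHg)
for n in (47, 326):
    se = sd / math.sqrt(n)  # standard error of the sample mean
    print(n, round(se, 2))  # n=47 -> 2.77 mmHg, n=326 -> 1.05 mmHg
```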

11
Q

using a standard normal distribution, what is the z value above which 2.5% of the population lies?

A

1.96

(and 2.5% of pop. lies below -1.96)

This means that 95% of the distribution lies between z = -1.96 and z = 1.96

12
Q

how do you calculate a 95% confidence interval and what does this mean?

A

A 95% confidence interval for the true population mean is given by:

(x-bar - 1.96 × se, x-bar + 1.96 × se)

which can also be written as:

(x-bar - 1.96 × sd/√n, x-bar + 1.96 × sd/√n)

The cutoff values (-1.96 and 1.96 in the above example) are obtained from the standard normal distribution; they are used because 95% of the data (or area) within a normal distribution lies between them. This confidence interval can be interpreted as follows: if this study were repeated many times, 95% of the time the interval would include the true population mean. The important role of the confidence interval is that it gives a feasible range of values within which the true population mean might lie.
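A short sketch in Python, reusing the blood pressure numbers from the standard error card (the sample mean of 120 mmHg is illustrative):

```python
import math

x_bar = 120.0               # sample mean systolic blood pressure (mmHg), illustrative
sd, n = 19.0, 47            # standard deviation and sample size from the earlier example
se = sd / math.sqrt(n)      # 2.77 mmHg

lower = x_bar - 1.96 * se
upper = x_bar + 1.96 * se
print(round(lower, 1), round(upper, 1))  # 95% CI: (114.6, 125.4) mmHg
```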

13
Q

how do you calculate degrees of freedom?

A

The degrees of freedom are calculated as one less than the sample size, that is n minus 1.

14
Q

when would you use degrees of freedom?

A

when the sample size is lower than 200 (or the population SD is not known), you use the degrees of freedom to find the correct t-value in a table, and use that value in place of 1.96 when calculating the 95% confidence interval

nb when the population SD is known, or the sample size is above 200, the value used is 1.96 (from the standard normal distribution)

15
Q

what is the step by step process to work out any 95% confidence interval?

A

Step 1 Determine the sample size n and the number of degrees of freedom n minus 1

Step 2 Determine the mean (x bar) and standard deviation of the sample

Step 3 Calculate the standard error (se): se = sd / √n

Step 4 Look up the critical value t given in the table

Step 5 The 95% confidence interval is:

x-bar - (t x se) , x-bar + (t x se)

Here is an example using the procedure above. Let us assume that the sample size n = 12, and therefore the number of degrees of freedom n - 1 = 11. The sample mean x-bar = 120 mmHg and the sample standard deviation sd = 20 mmHg.

Step 3 Calculate the standard error: se = 20 / √12 = 5.77 mmHg

Step 4 From the table, the t value for 11 degrees of freedom is t = 2.20

Step 5 The 95% confidence interval is: (x-bar - (t × se), x-bar + (t × se))

which is: (120 - 2.20 × 5.77, 120 + 2.20 × 5.77)

which is: (107.3, 132.7) mmHg.
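The same steps in Python (a sketch assuming SciPy is available for the t distribution):

```python
import math
from scipy.stats import t

n, x_bar, sd = 12, 120.0, 20.0   # sample size, mean and standard deviation (mmHg)
df = n - 1                       # degrees of freedom = 11
se = sd / math.sqrt(n)           # 5.77 mmHg
t_crit = t.ppf(0.975, df)        # 2.20, the tabulated critical value for a 95% CI

lower = x_bar - t_crit * se
upper = x_bar + t_crit * se
print(round(t_crit, 2), round(lower, 1), round(upper, 1))  # 2.2 107.3 132.7
```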

16
Q

True or false: the 95% confidence interval is wider than the 90% confidence interval

A

True

If we want to be more confident (95% confident rather than 90% confident) that our range of plausible values includes the true population value then it makes sense that the confidence interval has to be wider

17
Q

what is a two-sided hypothesis and why is it used in medicine?

A

In medical statistics it is better, and almost always the case, to take the alternative hypothesis (known as H1 or Ha) to be ‘two-sided’. This means that, if there is an effect, that effect may be beneficial or harmful. The corresponding one-sided hypotheses would be either that the effect is beneficial, or that it is harmful.

As an example, consider a study in which 400 women with hypertension are prescribed a new anti-hypertensive treatment. Systolic blood pressure is measured before the treatment and then again after the course of treatment and changes in blood pressure are recorded. The null hypothesis will be that the mean change in blood pressure is zero: the treatment has no effect. The alternative hypothesis will be that the mean change is not zero: the mean change is either less than zero (the pressure has gone down and the drug is considered to be a benefit) or greater than zero (the drug is considered to be harmful). This is a two-sided hypothesis.

18
Q

what does t stand for?

A

test statistic

19
Q

what is the formula to calculate the test statistic?

A

t = (x-bar - mu) / se

mu represents the appropriate comparison value, which is determined by the test you are conducting

under the null hypothesis of no effect, mu = 0

20
Q

what is the p value? and what p value is normally used to indicate that a result is statistically significant?

A

The p value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. Thus the p value is a measure of how probable the observed results are, given that the null hypothesis is true. The smaller the p value, the less likely the result is to have occurred by chance alone.

When the p value is less than 0.05, then it is conventional to say that the result is statistically significant. This figure of 0.05 has been used for the last century by the medical and scientific communities as the threshold to use. A probability of 0.05 is a 1 in 20 chance of an event happening. Formally it can be stated that if the p value is less than 0.05 then the result is statistically significant at the 5% level.

For the example of hypertension treatment with 400 patients, the value calculated for the test statistic is t=-6.06. As the t distribution is symmetric we ignore the negative sign, that is we use t=6.06. The number of degrees of freedom is 399. The value t=6.06 is larger than t=3.291 (the value of t for p=0.001 and degrees of freedom over 120) and so the corresponding p value is less than 0.001 (written as p<0.001). As this is certainly less than 0.05 the result is statistically significant.
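A sketch of this lookup in Python (assumes SciPy; the numbers come from the hypertension example above):

```python
from scipy.stats import t

test_stat, df = 6.06, 399          # |t| and degrees of freedom from the example
p_value = 2 * t.sf(test_stat, df)  # two-sided p value
print(p_value < 0.001)             # True, so the result is reported as p < 0.001
```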

21
Q

Do we need to consider the clinical significance of a study where the results are not statistically significant?

A

Yes. We do sometimes need to consider the clinical interpretation of non-significant statistical results. It is possible to get a non-significant statistical result when there is a real clinical difference. We are particularly concerned if the mean difference is large but the result is non-significant because of a small sample size.

(another possibility is that people respond very differently to the drug depending on, for example, their genes, which a comparison of means alone could mask!)

22
Q

What does the p-value tell us about the data?

A

how probable the observed results (or more extreme ones) would be if the null hypothesis were true; note that this is not the probability that the null hypothesis is correct

23
Q

what is the pearson’s coefficient? and what do the values mean?

A

Pearson correlation coefficient, r, is a measure of linear correlation between two numeric variables. It is a measure of how well the data fit a straight line. The value r lies between 1 and -1.

  • If r > 0 we have a positive correlation; implying that if one variable increases then so does the other.
  • If r < 0 we have a negative correlation; implying that if one variable increases then the other decreases.
  • If r = 0 we have no correlation; implying there is no association between the two variables.
  • If r = 1 there is a perfect positive correlation.
  • If r = -1 there is a perfect negative correlation.

• nb Correlations of less than 0.7 should be treated with suspicion.
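A quick sketch of calculating r in Python (assumes SciPy; the paired values are invented for illustration):

```python
from scipy.stats import pearsonr

# Hypothetical paired measurements, e.g. height (cm) and weight (kg):
height = [150, 155, 160, 165, 170, 175, 180]
weight = [52, 56, 61, 63, 68, 72, 77]

r, p_value = pearsonr(height, weight)
print(round(r, 3))  # close to +1 -> strong positive linear correlation
```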

24
Q

when should pearson’s correlation coefficient not be used?

A

Correlation should not be used when:

  • There is a non-linear relationship between variables (see Figure (a) below).
  • There are outliers (see Glossary)
  • There are distinct sub-groups, for example, if we mix two samples together such as healthy controls and disease cases (see Figure (b)).
  • One or both of the variables is not normally distributed.
  • One or both of the variables is non-numeric.
25
Q

A Pearson correlation coefficient should only be calculated between two normally distributed variables.

what do you use when pearson’s cannot be used?

A

the Spearman rank correlation coefficient, rho.

This correlation coefficient can be used when the data is not normally distributed, when one or both of the variables are ordinal, or when the sample size is small.

it is interpreted in the same way as Pearson’s (values run from -1 to +1, with the same meaning)
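A matching sketch in Python (assumes SciPy; the ordinal data are invented for illustration):

```python
from scipy.stats import spearmanr

# Hypothetical ordinal data: pain score (0-10) vs. satisfaction rating (0-10)
pain = [2, 4, 5, 7, 8, 9]
satisfaction = [9, 8, 6, 7, 3, 2]

rho, p_value = spearmanr(pain, satisfaction)
print(round(rho, 2))  # close to -1 -> strong negative monotonic association
```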

26
Q

If two variables, A and B, are correlated then there are four possibilities, what are they?

A

• The result occurred by chance
• A influences (or ‘causes’) B
• B influences (or ‘causes’) A. Not the same thing as A causes B, but the statistical measure of correlation will be the same
• A and B are influenced by some other variable(s), C. This can happen in two ways:
—— 1) C may ‘cause’ both A and B. For example, an increased consumption of sugar increases the number of caries a person has and increases their weight. Does more weight cause more caries? Probably not, but weight will be correlated with caries.
—— 2) A may lead to an increase in C which ‘causes’ B, e.g. low income may increase the chance of smoking, which increases the chance of death from lung cancer. Does low income cause lung cancer?

27
Q

what do we need to know about the chi squared test?

A

When we wish to examine the association between two categorical variables, we can create a table, such as the one shown below, known as a ‘contingency table’. We can perform a hypothesis test on a table of one or two categorical variables. The test that is used most often is a chi-squared test. The null hypothesis of the chi-squared test is that there is no association between the two variables. The test works by comparing the contingency table we observe from our results with the one we would expect if the null hypothesis were true.

                        Caries
                    No    Yes    Total
Fluoridated         77    29     106
Non-fluoridated     95    31     126
Total              172    60     232

This table shows a simple contingency table, which has two categorical variables, and the chi-squared test is the most appropriate way of testing the null hypothesis. You will not be expected to be able to calculate a chi-squared test in an exam, but we wish you to be aware of being able to use and interpret chi-squared tests.

Using Stata, the chi-squared value = 0.228 (with 1 degree of freedom), p = 0.63. As p > 0.05 we cannot reject the null hypothesis of no association, which means that this data suggests that caries is not associated with fluoridated water.

Conditions for the chi-squared test

The expected value in each of the four cells should be greater than 1, and in at least three of the four cells the expected value should be greater than 5.

Continuity correction (Yates’s correction)

For small sample sizes the chi-squared test is too likely to reject the null hypothesis. A continuity correction can be made to allow for this. Although it is only strictly necessary on small sample sizes I would recommend always using it. The two conditions above still have to be met.

Fisher’s exact test

If a contingency table fails to meet the conditions required for the chi-squared test then Fisher’s exact test can be used. This is based on different mathematics to the chi-squared test, and is more robust when sample sizes are small. However, we will not be investigating this further. You should be aware of the limitations of the chi-squared test and that there are methods to overcome these shortcomings.
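A sketch of running this test in Python (assumes SciPy; the counts are the ones in the contingency table above):

```python
from scipy.stats import chi2_contingency

# Rows: fluoridated / non-fluoridated; columns: no caries / caries
table = [[77, 29],
         [95, 31]]

# correction=False gives the plain chi-squared statistic quoted above;
# correction=True (the default) applies Yates's continuity correction.
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(round(chi2, 3), df, round(p, 2))  # 0.228 1 0.63
```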

28
Q

Pearson correlation coefficient is a measure of linear association between two of what type of variable?

A

continuous variables

A Pearson correlation coefficient is about measuring a linear association between two continuous variables. It should not be used if you have categorical variables (nominal or ordinal)

29
Q

what are the 3 different types of disease prevention?

A

primary = preventing disease occurring in currently unaffected individuals (eg vaccination)

secondary = preventing clinical symptoms of disease occurring while the disease is still asymptomatic (eg cervical smears)

tertiary = preventing relapse of / controlling the symptoms of a chronic disease to increase functionality (eg stroke rehabilitation)

30
Q

what is the positive predictive value? and how is it calculated?

A

positive predictive value:
- If a person tests positive, what is the probability that he or she has the condition

no. true positives/ (no. true positives + no. false positives)

31
Q

what is the negative predictive value? how is it calculated?

A

negative predictive value:
- If a person tests negative, what is the probability that he or she does not have the condition

no. true negatives/ (no. true negatives + no. false negatives)
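Both predictive values reduce to one-line calculations; a sketch in Python with invented screening counts:

```python
def predictive_values(tp, fp, tn, fn):
    """Positive and negative predictive values from the counts of a test."""
    ppv = tp / (tp + fp)  # probability of having the condition given a positive test
    npv = tn / (tn + fn)  # probability of not having the condition given a negative test
    return ppv, npv

# Hypothetical screening results: 90 true positives, 30 false positives,
# 860 true negatives, 20 false negatives
print(predictive_values(tp=90, fp=30, tn=860, fn=20))  # (0.75, ~0.977)
```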

32
Q

what is right and left censoring of data?

A

Right censored data occurs when the people in the study did not reach a failure before the end of the study. For example, in a study looking at a new drug to treat HIV, right censored data will occur if the study participants die of other non-AIDS causes, if by the end of the study some participants have not developed AIDS, or if some have left the study (e.g. by leaving the country).

Left censoring is when we are not certain what happened to people before the time at which they entered the study. A common example is when people already have the disease of interest when the study starts.

33
Q

what is a ‘failure’ in a research study?

A

The time the person leaves the study is known. Leaving the study may be due to the event happening (such as death in a survival analysis), the person may ask to leave the study, or the study may lose track of a person. The event is known as a ‘failure’

34
Q

what is:

  • the survival function?
  • the hazard function?
A

survival function
- the chance of survival until a certain time

hazard function
- the chance of instantaneous failure at any one time

35
Q

what are two common ways of examining survival data?

A

log-rank test

Kaplan-Meier plot
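A minimal sketch of both, assuming the third-party lifelines package is installed (pip install lifelines); the follow-up times and event indicators below are invented for illustration:

```python
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

durations_a = [5, 8, 12, 16, 23, 30]   # follow-up time (months), group A
events_a    = [1, 1, 0, 1, 0, 1]       # 1 = failure observed, 0 = right censored
durations_b = [4, 6, 9, 11, 15, 20]    # group B
events_b    = [1, 1, 1, 1, 0, 1]

kmf = KaplanMeierFitter()
kmf.fit(durations_a, event_observed=events_a)  # Kaplan-Meier survival estimate, group A
print(kmf.survival_function_)

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(result.p_value)                          # log-rank test comparing the two groups
```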

36
Q

what is the difference between a case-control study and a cohort study?

incl pros/cons

A

Cohort studies

Cohort studies begin with a group of people (a cohort) free of disease. The people in the cohort are grouped by whether or not they are exposed to a potential cause of disease. The whole cohort is followed over time to see if the development of new cases of the disease (or other outcome) differs between the groups with and without exposure.

For example, you could do a cohort study if you suspect there might be a causal relationship between the use of a certain water source and the incidence of diarrhea among children under five in a village with different water sources.

You select a group of children under five years: either all children of that age in the village, a random sample taken from the population register, or, for example, children living in the same area or attending the same clinic. Then you classify them as either using the suspected water source or other water sources. You check, e.g. after two weeks, whether the children have had diarrhea.

You can then calculate how many diarrhea cases there were among those children using the suspected water source and those using other sources of water supply (cumulative incidence of diarrhea). How to compare the cumulative incidence rates of the two groups, in order to conclude whether the suspected water source is a risk factor for the disease or not, will be discussed in a future blog.

Case-control studies

The same problem could also be studied in a case-control study. A case-control study begins with the selection of cases (people with a disease) and controls (people without the disease). The controls should represent people who would have been study cases if they had developed the disease (population at risk).

The exposure status to a potential cause of disease is determined for both cases and controls. Then the occurrence of the possible cause of the disease could be calculated for both the cases and controls. To come back to the example, you may compare children who present themselves at a health center with diarrhea (cases) with children with other complaints, for example acute respiratory infections (controls). You determine which source of drinking water they had used. Then calculate the proportion of cases and controls that were exposed to the suspected water source.

Pros and cons

On what basis do you decide to choose a cohort design or a case-control design?

Cohort studies provide the best information about the causation of disease, because you follow persons from exposure to the occurrence of the disease. With data from cohort studies you can calculate cumulative incidences, which are the most direct measurement of the risk of developing disease.

An added advantage is that you can examine a range of outcomes/diseases caused by one exposure (e.g. heart disease, lung disease, renal disease caused by smoking).

However, cohort studies are major undertakings. They may require long periods of follow-up since disease may occur a long time after exposure. Therefore, it is a very expensive study design.

Cohort studies work well for rare exposures: you can specifically select people exposed to a certain factor. But this design does not work for rare diseases: you would then need a large study group to find sufficient disease cases.

Case-control studies are relatively simple to conduct. They do not require a long follow-up period (as the disease has already developed), and are hence much cheaper. This design is especially useful for rare diseases (as you select the cases yourself), but not for rare causes (as you will probably not find these in sufficient number in your study). It is also very suitable for diseases with a long latent period, such as cancer.

However, case-control studies are less adept at showing a causal relationship than cohort studies. They are more prone to bias.

One example is recall bias: cases might recall certain exposures more clearly than controls, simply due to the fact that they have thought about what could have caused their disease.