RESS ebook Flashcards
what is the formula for incidence rate?
Incidence rate = (Number of new cases in period/
Number at risk in population in period)
nb if the number at risk in the population changes throughout the period, use the number at risk half way through the defined period
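A minimal Python sketch of the incidence rate formula above. The function name and the numbers are hypothetical, chosen only to illustrate the arithmetic:

```python
def incidence_rate(new_cases, at_risk_mid_period):
    """New cases in the period divided by the number at risk.

    at_risk_mid_period is the number at risk half way through the period,
    which is the denominator to use when the population at risk changes.
    """
    if at_risk_mid_period <= 0:
        raise ValueError("number at risk must be positive")
    return new_cases / at_risk_mid_period


# Hypothetical example: 30 new cases in a year, 12,000 at risk at mid-year.
rate = incidence_rate(30, 12_000)
print(f"{rate * 1000:.1f} new cases per 1,000 at risk per year")
```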
what formula do you use to work out prevalence rate?
Prevalence =
(Number of people with a disease at a certain time/
Number of people in the population at that certain time)
what is the formula for case fatality rate?
Case fatality rate =
(Number of people who die from the disease in period/
Number of people with the disease in period)
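The prevalence and case fatality rate formulas are both simple proportions. A short sketch in Python, with hypothetical function names and made-up counts:

```python
def prevalence(people_with_disease, population):
    """People with the disease at a given time / population at that time."""
    return people_with_disease / population


def case_fatality_rate(deaths_from_disease, people_with_disease):
    """Deaths from the disease in the period / people with the disease in the period."""
    return deaths_from_disease / people_with_disease


# Hypothetical counts, for illustration only.
prev = prevalence(400, 20_000)
cfr = case_fatality_rate(12, 400)
print(f"prevalence = {prev:.1%}, case fatality rate = {cfr:.1%}")
```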
what is the formula to calculate risk?
Risk =
(Number of new cases /
Number at risk)
what is the risk ratio? and how is it calculated?
The risk ratio is a measure of relative risk. It is so called because it is the ratio of the risk of disease in the exposed group to the risk of disease in the unexposed group.
- Relative risk, often abbreviated to RR, is used to describe measures of risk ratio, odds ratio, and incidence rate ratio.
(Epidemiologists and statisticians sometimes disagree over definitions, but relative risk is a useful umbrella term for measures that compare the disease between different exposures or treatments)
risk ratio =
(no. of people who were EXPOSED to risk factor and then GOT the disease / no. of people who were EXPOSED to risk factor)
divided by:
(no. of people who were NOT EXPOSED to risk factor but GOT the disease / no. of people who were NOT EXPOSED to risk factor)
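The risk ratio calculation can be sketched in Python as follows. The cohort counts are hypothetical and the function names are only illustrative:

```python
def risk(new_cases, at_risk):
    """Risk = number of new cases / number at risk."""
    return new_cases / at_risk


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)


# Hypothetical cohort: 30 of 200 exposed and 10 of 200 unexposed got the disease.
rr = risk_ratio(30, 200, 10, 200)
print(f"risk ratio = {rr:.2f}")
```

A risk ratio above 1 means the disease was more common in the exposed group; a value below 1 means it was less common.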
how do you calculate the ODDS of an event occurring?
Odds of an event =
(Probability of event /
Probability that event does not occur)
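As a quick illustration of how odds differ from probability, here is a small Python sketch (the function name is hypothetical):

```python
def odds(probability):
    """Odds = probability of the event / probability it does not occur."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)


# A probability of 0.2 (1 in 5) corresponds to odds of about 0.25 (1 to 4).
print(f"odds = {odds(0.2):.2f}")
```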
what is odds ratio (OR)? and how is it calculated?
in what sort of trial would you use it?
The odds ratio is the measure that is calculated to represent the relative risk for a case-control study.
odds ratio=
(no. of people who were EXPOSED to risk factor and GOT the disease / no. of people who were EXPOSED to risk factor and DID NOT get the disease)
divided by:
(no. of people who were NOT EXPOSED to risk factor and GOT the disease / no. of people who were NOT EXPOSED to risk factor and DID NOT get the disease)
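The odds ratio formula above can be sketched in Python like this. The counts are hypothetical and the argument names are only illustrative:

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds of disease among the exposed divided by odds among the unexposed."""
    odds_exposed = exposed_cases / exposed_noncases
    odds_unexposed = unexposed_cases / unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical 2x2 table:
#               disease   no disease
# exposed          30        170
# not exposed      10        190
or_value = odds_ratio(30, 170, 10, 190)
print(f"odds ratio = {or_value:.2f}")
```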
which method of calculating risk (relative risk or odds ratio) is used for cohort studies? and randomised controlled trials?
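cohort studies use: risk ratio and/or odds ratio
randomised controlled trials use: risk ratio and/or odds ratio
nb case-control studies use ONLY the odds ratio, because the risk of disease cannot be calculated directly when participants are selected by disease status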
how do you transform your data to fit a standard normal distribution?
The standard curve has a mean of zero and the standard deviation is one. It is called the standard normal distribution.
To be able to make probability statements about any normal variable, it is necessary to transform it to a standard normal. To do this, for any value in the distribution you subtract the mean, and then divide by the standard deviation. The mathematical equation is:
z = (x - µ) / σ
where x is the value to be transformed to a standard normal variable value z, µ is the mean and σ is the standard deviation.
For the example above of height, the mean is 163cm and the standard deviation is 6cm. So a height of 165 cm is transformed as follows:
- subtract the mean of 163 to get 165-163 = 2
- divide by the standard deviation of 6 to get z = 2/6 = 0.33.
This value, known as the z value, allows the area to the left of the point to be determined by looking the value up in a standard normal table. The area under the curve for a value where z is less than, or equal to, 0.33 is 0.6293. Since the total area under the curve is one, the probability that a value is greater than z = 0.33 is 1 - 0.6293 = 0.3707. Consequently the probability that a patient selected at random is taller than 165 cm is 0.37 (or 37%).
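The height example can be checked with a short Python sketch. It uses NormalDist from the standard library in place of the printed table, and rounds z to two decimal places the way a table lookup would:

```python
from statistics import NormalDist

mean_height = 163.0  # cm, from the example above
sd_height = 6.0      # cm, from the example above

# Transform a height of 165 cm to a z value, rounded as for a table lookup.
z = round((165.0 - mean_height) / sd_height, 2)

# Area to the left of z under the standard normal curve (mean 0, sd 1).
area_left = NormalDist().cdf(z)
area_right = 1 - area_left

print(f"z = {z:.2f}")
print(f"P(height <= 165 cm) = {area_left:.4f}")
print(f"P(height >  165 cm) = {area_right:.4f}")
```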
what is the standard error and how is it calculated?
The sample mean is used as an estimate of the ‘true’ population mean. The spread of the sample mean, not the spread of the actual measurements but the spread of the mean of the measurements, is given by the ‘standard error’ or the ‘standard error for the mean’.
There is a relationship between the standard deviation (sd) of the population and the standard error of the sample mean (se) taking into account the sample size (n). This is given by:
se = sd / square root of n
As an example, suppose that the standard deviation of the systolic blood pressure is 19.0 mmHg. The standard error of the mean of a sample of size n = 47 is
se = 19.0 / √47 = 19.0 / 6.856 = 2.77 mmHg.
When the sample size increases to n = 326, the standard error reduces to
se = 19.0 / √326 = 19.0 / 18.055 = 1.05 mmHg.
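The two blood pressure calculations can be reproduced with a small Python sketch (the function name is hypothetical):

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: sd divided by the square root of n."""
    return sd / math.sqrt(n)


# Values from the systolic blood pressure example (sd = 19.0 mmHg).
for n in (47, 326):
    print(f"n = {n}: se = {standard_error(19.0, n):.2f} mmHg")
```

Because n sits under a square root, the sample size has to be roughly quadrupled to halve the standard error.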
The standard error is used to calculate the confidence interval of a mean, and is an important concept for the presentation of results in research articles.
using a standard normal distribution, what is the z value above which 2.5% of the population lies?
1.96
(and 2.5% of pop. lies below -1.96)
This means that 95% of the distribution lies between z = -1.96 and z = 1.96
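The 1.96 cutoff can be checked without a table. This sketch uses NormalDist from the Python standard library:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# z value with 97.5% of the area to its left, so 2.5% lies above it.
upper_cutoff = standard_normal.inv_cdf(0.975)

# Area between -1.96 and 1.96.
central_area = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"upper cutoff = {upper_cutoff:.2f}")
print(f"area between -1.96 and 1.96 = {central_area:.3f}")
```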
how do you calculate a 95% confidence interval and what does this mean?
A 95% confidence interval for the true population mean is given by:
(x-bar - 1.96 x se, x-bar + 1.96 x se)
which can also be written as:
(x-bar - 1.96 x sd / √n, x-bar + 1.96 x sd / √n)
The cutoff values (-1.96 and 1.96 in the above example) are obtained from the standard normal distribution. These values are used because 95% of the area under a normal distribution lies between them. This confidence interval can be interpreted as follows: if the study is repeated many times and an interval is calculated each time, 95% of those intervals will include the true population mean. The important role of the confidence interval is that it gives a feasible range of values within which the true population mean might lie.
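The "repeated many times" interpretation can be illustrated with a small simulation. This is only a sketch: the true mean of 130 mmHg, the number of repeats and the random seed are assumptions made up for the illustration, and the standard deviation is treated as known (19.0 mmHg).

```python
import math
import random


def ci95_known_sd(sample, sd):
    """95% confidence interval for the mean when the sd is known."""
    n = len(sample)
    x_bar = sum(sample) / n
    se = sd / math.sqrt(n)
    return x_bar - 1.96 * se, x_bar + 1.96 * se


random.seed(0)  # fixed seed so the run is repeatable
TRUE_MEAN, SD, N, REPEATS = 130.0, 19.0, 47, 2000  # hypothetical study set-up

hits = 0
for _ in range(REPEATS):
    # Simulate one study: draw N measurements from a normal population.
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    low, high = ci95_known_sd(sample, SD)
    if low <= TRUE_MEAN <= high:
        hits += 1

coverage = hits / REPEATS
print(f"proportion of intervals containing the true mean: {coverage:.3f}")
```

The proportion should come out close to 0.95, although it will vary a little from run to run if the seed is changed.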
how do you calculate degrees of freedom?
The degrees of freedom are calculated as one less than the sample size, that is n minus 1.
when would you use degrees of freedom?
when the sample size is lower than 200 (or the population SD is not known), you use the degrees of freedom to look up the correct t value in a t table, and use that value to calculate the 95% confidence interval
nb if the SD is known, or the sample size is above 200, the value used is 1.96 (from the standard normal distribution)
what is the step by step process to work out any 95% confidence interval?
Step 1 Determine the sample size n and the number of degrees of freedom n minus 1
Step 2 Determine the mean (x bar) and standard deviation of the sample
Step 3 Calculate the standard error (se) (se = sd / square root of n)
Step 4 Look up the critical value t given in the table
Step 5 The 95% confidence interval is:
x-bar - (t x se) , x-bar + (t x se)
Here is an example of this using the procedure above. Let us assume that the sample size n=12, and therefore the number of degrees of freedom n - 1 = 11. The sample mean x bar = 120mmHg and sample standard deviation sd = 20mmHg.
Step 3 Calculate the standard error: se = sd / √n = 20 / √12 = 5.77 mmHg
Step 4 From the table, get the t value for 11 degrees of freedom: t = 2.20
Step 5 The 95% confidence interval is: (x-bar - (t x se), x-bar + (t x se))
which is: (120 - 2.20 x 5.77, 120 + 2.20 x 5.77)
which is: (107.3, 132.7) mmHg.
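The five steps can be sketched in Python as follows. The function name is hypothetical, and the t value is passed in by hand because it comes from the printed table (the Python standard library has no t distribution):

```python
import math


def confidence_interval(x_bar, sd, n, t_critical):
    """Confidence interval for a mean: x-bar plus or minus t times se."""
    se = sd / math.sqrt(n)    # Step 3: standard error
    margin = t_critical * se  # Steps 4 and 5: t value times se
    return x_bar - margin, x_bar + margin


# Worked example from above: n = 12 (11 degrees of freedom),
# x-bar = 120 mmHg, sd = 20 mmHg, t = 2.20 from the table.
low, high = confidence_interval(120.0, 20.0, 12, 2.20)
print(f"95% CI: ({low:.1f}, {high:.1f}) mmHg")
```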
True or false: the 95% confidence interval is wider than the 90% confidence interval
True
If we want to be more confident (95% confident rather than 90% confident) that our range of plausible values includes the true population value then it makes sense that the confidence interval has to be wider
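A short Python sketch comparing the two widths. It assumes a large sample, so that multipliers from the standard normal distribution can be used, and borrows the standard error of 2.77 mmHg from the blood pressure example:

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


se = 2.77  # mmHg, standard error from the blood pressure example (n = 47)

width_90 = 2 * z_multiplier(0.90) * se
width_95 = 2 * z_multiplier(0.95) * se

print(f"90% interval width: {width_90:.2f} mmHg")
print(f"95% interval width: {width_95:.2f} mmHg")
```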
what is a two-sided hypothesis and why is it used in medicine?
In medical statistics it is better, and almost always the case, to take the alternative hypothesis (known as H1 or Ha) to be ‘two-sided’. This means that, if there is an effect, that effect may be beneficial or harmful. By contrast, a one-sided hypothesis considers only one direction: either that the effect is beneficial, or that it is harmful.
As an example, consider a study in which 400 women with hypertension are prescribed a new anti-hypertensive treatment. Systolic blood pressure is measured before the treatment and then again after the course of treatment and changes in blood pressure are recorded. The null hypothesis will be that the mean change in blood pressure is zero: the treatment has no effect. The alternative hypothesis will be that the mean change is not zero: the mean change is either less than zero (the pressure has gone down and the drug is considered to be a benefit) or greater than zero (the drug is considered to be harmful). This is a two-sided hypothesis.
what does t stand for?
test statistic
what is the formula to calculate the test statistic?
t = (x-bar - mu) / se
mu is the value of the population mean under the null hypothesis; the appropriate value depends on the test you are conducting
mu is 0 when the null hypothesis is that there is no effect (for example, a mean change of zero)
what is the p value? and what p value is normally used to indicate that a result is statistically significant?
The p value is the probability, under the null hypothesis, that the test statistic could take a value at least as extreme as the one observed. Thus the p value measures the probability of obtaining results at least as extreme as those of the test, given that the null hypothesis is true. The smaller the p value, the less likely it is that a result this extreme occurred by chance alone.
When the p value is less than 0.05, then it is conventional to say that the result is statistically significant. This figure of 0.05 has been used for the last century by the medical and scientific communities as the threshold to use. A probability of 0.05 is a 1 in 20 chance of an event happening. Formally it can be stated that if the p value is less than 0.05 then the result is statistically significant at the 5% level.
For the example of hypertension treatment with 400 patients, the value calculated for the test statistic is t=-6.06. As the t distribution is symmetric we ignore the negative sign, that is we use t=6.06. The number of degrees of freedom is 399. The value t=6.06 is larger than t=3.291 (the value of t for p=0.001 and degrees of freedom over 120) and so the corresponding p value is less than 0.001 (written as p<0.001). As this is certainly less than 0.05 the result is statistically significant.
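The comparison against the table value can be sketched in Python. The p value computed here is only an approximation: it uses the standard normal distribution in place of the t distribution, which is a reasonable assumption with 399 degrees of freedom but is not the exact t-based value.

```python
from statistics import NormalDist

t = -6.06             # test statistic from the hypertension example
T_CRIT_P001 = 3.291   # table value of t for p = 0.001, degrees of freedom over 120

# The t distribution is symmetric, so compare the absolute value.
p_below_0001 = abs(t) > T_CRIT_P001

# Approximate two-sided p value: area in both tails beyond |t|,
# using the normal distribution as a large-sample stand-in for t.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"|t| = {abs(t):.2f} exceeds {T_CRIT_P001}: {p_below_0001}")
print(f"statistically significant at the 5% level: {p_approx < 0.05}")
```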
Do we need to consider the clinical significance of a study where the results are not statistically significant?
Yes. We do sometimes need to consider the clinical interpretation of non-significant statistical results. It is possible to get a non-significant statistical result when there is a real clinical difference. We are particularly concerned if the mean difference is large but the result is non-significant due to a small sample size.
(it could also be that people react very differently to a drug depending on, for example, their genes, which using the mean alone could mask)
What does the p-value tell us about the data?
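how likely it would be to get data at least as extreme as those observed if the null hypothesis were true (nb it is NOT the probability that the null hypothesis is correct)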
what is the pearson’s coefficient? and what do the values mean?
Pearson correlation coefficient, r, is a measure of linear correlation between two numeric variables. It is a measure of how well the data fit a straight line. The value r lies between 1 and -1.
- If r > 0 we have a positive correlation; implying that if one variable increases then so does the other.
- If r < 0 we have a negative correlation; implying that if one variable increases then the other decreases.
- If r = 0 we have no correlation; implying there is no linear association between the two variables.
- If r = 1 there is a perfect positive correlation.
- If r = -1 there is a perfect negative correlation.
- nb Correlations of less than 0.7 (ignoring the sign) should be treated with suspicion.
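The flashcard does not give the formula for r, so the sketch below uses the standard definition (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The data are made up to show the two extreme values:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points on a straight line sloping up
falling = [10 - 3 * x for x in xs]  # points on a straight line sloping down

print(f"rising line:  r = {pearson_r(xs, rising):.2f}")
print(f"falling line: r = {pearson_r(xs, falling):.2f}")
```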
when should pearson’s correlation coefficient not be used?
Correlation should not be used when:
- There is a non-linear relationship between variables (see Figure (a) below).
- There are outliers (see Glossary)
- There are distinct sub-groups, for example, if we mix two samples together such as healthy controls and disease cases (see Figure (b)).
- One or both of the variables is not normally distributed.
- One or both of the variables is non-numeric.
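To see why a non-linear relationship is a problem, here is a small Python sketch with made-up data. y is completely determined by x (y = x squared), yet r comes out as zero because the relationship is not a straight line. The function uses the standard definition of r, which is an addition to the flashcard:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # perfect, but curved, relationship

r_curved = pearson_r(xs, ys)
print(f"r for y = x squared: {r_curved:.2f}")
```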