RESS ebook Flashcards

Question

A Pearson correlation coefficient should only be calculated between two normally distributed variables. What do you use when Pearson's cannot be used?

Answer 1

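The Spearman rank correlation coefficient, rho. This correlation coefficient can be used when the data are not normally distributed, when one or both of the variables are ordinal, or when the sample size is small. It takes values on the same scale as Pearson's (from -1 to +1) and is interpreted in the same way, although the two coefficients will generally not be equal for the same data.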
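As a minimal sketch of how the two coefficients relate, the code below computes Spearman's rho as the Pearson correlation of the ranks. The data and the function names (`pearson`, `ranks`, `spearman`) are made up for illustration and do not come from the ebook.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(spearman(x, y), 3))  # the ranks agree perfectly
print(round(pearson(x, y), 3))   # lower, because the relationship is curved
```

Both values lie between -1 and +1, but they differ here because Spearman's rho only uses the ordering of the values.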

Answer 2

• The result occurred by chance
• A influences (or 'causes') B
• B influences (or 'causes') A. Not the same thing as A causes B, but the statistical measure of correlation will be the same
• A and B are influenced by some other variable(s), C. This can happen in two ways:
  1) C may 'cause' both A and B. For example, an increased consumption of sugar increases the number of caries a person has and increases their weight. Does more weight cause more caries? Probably not, but weight will be correlated with caries.
  2) A may lead to an increase in C which 'causes' B, e.g. low income may increase the chance of smoking, which increases the chance of death from lung cancer. Does low income cause lung cancer?
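The case where C 'causes' both A and B can be illustrated with a small simulation. This is only a sketch under invented assumptions: the variables are random numbers, the effect sizes are arbitrary, and `simulate` is a hypothetical helper, not something from the ebook.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=1):
    """Simulate C driving both A and B; A and B never affect each other."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]        # e.g. sugar consumption
    a = [ci + rng.gauss(0, 1) for ci in c]         # e.g. weight
    b = [ci + rng.gauss(0, 1) for ci in c]         # e.g. number of caries
    r_ab = pearson(a, b)
    # Remove C's contribution from A and B, then correlate what is left.
    a_rest = [ai - ci for ai, ci in zip(a, c)]
    b_rest = [bi - ci for bi, ci in zip(b, c)]
    r_rest = pearson(a_rest, b_rest)
    return r_ab, r_rest


r_ab, r_rest = simulate()
print(round(r_ab, 2))    # clearly positive, although A does not cause B
print(round(r_rest, 2))  # close to zero once C is taken out
```

The correlation between A and B comes entirely from their shared dependence on C in this setup.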

Answer 3

When we wish to examine the association between two categorical variables, we can create a table, such as the one shown below, known as a 'contingency table'. We can perform a hypothesis test on a table of one or two categorical variables. The test that is used most often is a chi-squared test. The null hypothesis of the chi-squared test is that there is no association between the two variables. The test works by comparing the contingency table we observe from our results with the one we would expect if the null hypothesis were true.

```
                   Caries
                   No     Yes    Total
Fluoridated        77     29     106
Non-fluoridated    95     31     126
Total              172    60     232
```

This is a simple contingency table with two categorical variables, and the chi-squared test is the most appropriate way of testing the null hypothesis. You will not be expected to calculate a chi-squared test in an exam, but you should be able to use and interpret chi-squared tests. Using Stata, the chi-squared value is 0.228 (with 1 degree of freedom), p = 0.63. As p > 0.05, we cannot reject the null hypothesis of no association, so these data provide no evidence that caries are associated with fluoridated water.

Conditions for the chi-squared test

The expected value in each of the four cells should be greater than 1, and in three of the four cells the expected value should be greater than 5.

Continuity correction (Yates's correction)

For small sample sizes the chi-squared test is too likely to reject the null hypothesis. A continuity correction can be made to allow for this. Although it is only strictly necessary for small sample sizes, I would recommend always using it. The two conditions above still have to be met.

Fisher's exact test

If a contingency table fails to meet the conditions required for the chi-squared test, then Fisher's exact test can be used. This is based on different mathematics from the chi-squared test, and is more robust when sample sizes are small. However, we will not be investigating this further. You should be aware of the limitations of the chi-squared test and that there are methods to overcome these shortcomings.
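For readers who want to see where the chi-squared value comes from, here is a minimal sketch in plain Python rather than Stata. The function name `chi_squared_2x2` is made up for this example, and the closed-form p-value used here is valid only for 1 degree of freedom, that is, for a 2x2 table.

```python
import math


def chi_squared_2x2(table, yates=False):
    """Return (statistic, p_value, expected) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under the null hypothesis: row total * column total / N.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = 0.0
    for observed_row, expected_row in zip(table, expected):
        for observed, exp in zip(observed_row, expected_row):
            diff = abs(observed - exp)
            if yates:
                # Yates's continuity correction shrinks each difference by 0.5.
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / exp
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected


# Rows: fluoridated, non-fluoridated. Columns: caries no, caries yes.
observed = [[77, 29], [95, 31]]

stat, p, expected = chi_squared_2x2(observed)
print(round(stat, 3), round(p, 2))  # 0.228 0.63

# Check the conditions: every expected count should exceed 1, most exceed 5.
print(min(min(row) for row in expected) > 5)

stat_yates, p_yates, _ = chi_squared_2x2(observed, yates=True)
print(stat_yates < stat)  # the correction makes the test more conservative
```

The uncorrected result reproduces the Stata figures quoted above. With the continuity correction the statistic is smaller and the p-value larger, so the conclusion does not change for this table.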

Answer 4

Continuous variables. A Pearson correlation coefficient measures the linear association between two continuous variables. It should not be used if you have categorical variables (nominal or ordinal).

Answer 5

primary = preventing disease occurring in currently unaffected individuals (e.g. vaccination)
secondary = preventing clinical symptoms of disease occurring when the disease is currently asymptomatic (e.g. cervical smears)
tertiary = preventing relapse or controlling the symptoms of a chronic disease to increase functionality (e.g. stroke rehabilitation)

Answer 6

Positive predictive value: if a person tests positive, what is the probability that he or she has the condition?

no. true positives / (no. true positives + no. false positives)

Answer 7

Negative predictive value: if a person tests negative, what is the probability that he or she does not have the condition?

no. true negatives / (no. true negatives + no. false negatives)
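The negative predictive value and its positive counterpart are simple ratios of counts from a test's results. The sketch below uses invented counts and a hypothetical helper, `predictive_values`, purely to show the arithmetic.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (positive predictive value, negative predictive value)."""
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical screening results for 1000 people (not from the ebook).
ppv, npv = predictive_values(true_pos=90, false_pos=30,
                             true_neg=870, false_neg=10)

print(round(ppv, 3))  # share of positive tests that are true cases
print(round(npv, 3))  # share of negative tests that are truly disease-free
```

In this made-up example 90 of the 120 positive tests are true positives, and 870 of the 880 negative tests are true negatives.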

Answer 8

Right-censored data occur when people in the study have not reached a failure by the time their follow-up ends. For example, in a study looking at a new drug to treat HIV, right-censored data will occur if study participants die of other, non-AIDS causes, if by the end of the study some participants have not developed AIDS, or if some have left the study (e.g. by leaving the country).

Left censoring is when we are not certain what happened to people before the time at which they entered the study. A common example is when people already have the disease of interest when the study starts.

Answer 9

The time the person leaves the study is known. A person may leave the study because the event happens (such as death in a survival analysis), because they ask to leave, or because the study loses track of them. The event is known as a 'failure'.

Answer 10

survival function - the chance of surviving until a certain time
hazard function - the instantaneous chance of failure at a given time, among those who have survived up to that time

Answer 11

log-rank test
Kaplan-Meier plot
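A Kaplan-Meier plot is a step plot of an estimated survival function. The sketch below shows one way the estimate behind such a plot can be calculated when some people are right-censored. The follow-up times are invented, and `kaplan_meier` is a hypothetical helper written for this illustration rather than a function from any statistics package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where a failure occurred.

    times  : follow-up time for each person
    events : True if the person failed at that time, False if right-censored
    """
    survival = 1.0
    curve = []
    at_risk = len(times)
    for t in sorted(set(times)):
        failures = sum(1 for time, failed in zip(times, events)
                       if time == t and failed)
        leaving = sum(1 for time in times if time == t)
        if failures:
            # Multiply by the share of those at risk who survive past t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Both failures and censored people leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; False marks a censored person.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [True, True, False, True, False, True, True, False]

for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored people still count as at risk up to the time they leave, which is why they lower the denominator for later failures without producing a step in the curve themselves.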

Answer 12

Cohort studies

Cohort studies begin with a group of people (a cohort) free of disease. The people in the cohort are grouped by whether or not they are exposed to a potential cause of disease. The whole cohort is followed over time to see if the development of new cases of the disease (or other outcome) differs between the groups with and without exposure.

For example, you could do a cohort study if you suspect there might be a causal relationship between the use of a certain water source and the incidence of diarrhea among children under five in a village with different water sources. You select a group of children under five years: either all children of that age in the village, a random sample taken from the population register, or, for example, children living in the same area or attending the same clinic. Then you classify them as either using the suspected water source or other water sources. You check, e.g. after two weeks, whether the children have had diarrhea. You can then calculate how many diarrhea cases there were among the children using the suspected water source and among those using other sources of water supply (cumulative incidence of diarrhea). How to compare the cumulative incidence rates of the two groups, in order to conclude whether the suspected water source is a risk factor for the disease or not, will be discussed in a future blog.

Case-control studies

The same problem could also be studied in a case-control study. A case-control study begins with the selection of cases (people with a disease) and controls (people without the disease). The controls should represent people who would have been study cases if they had developed the disease (population at risk). The exposure status to a potential cause of disease is determined for both cases and controls. The occurrence of the possible cause of the disease can then be calculated for both the cases and the controls.

To come back to the example, you may compare children who present themselves at a health center with diarrhea (cases) with children with other complaints, for example acute respiratory infections (controls). You determine which source of drinking water they had used. Then you calculate the proportion of cases and controls that were exposed to the suspected water source.

Pros and cons

On what basis do you decide to choose a cohort design or a case-control design?

Cohort studies provide the best information about the causation of disease, because you follow persons from exposure to the occurrence of the disease. With data from cohort studies you can calculate cumulative incidences, which are the most direct measurement of the risk of developing disease. An added advantage is that you can examine a range of outcomes or diseases caused by one exposure (e.g. heart disease, lung disease and renal disease caused by smoking). However, cohort studies are major undertakings. They may require long periods of follow-up, since disease may occur a long time after exposure. Therefore, it is a very expensive study design. Cohort studies work well for rare exposures, because you can specifically select people exposed to a certain factor. But this design does not work for rare diseases, because you would then need a large study group to find sufficient disease cases.

Case-control studies are relatively simple to conduct. They do not require a long follow-up period (as the disease has already developed), and are hence much cheaper. This design is especially useful for rare diseases (as you select the cases yourself), but not for rare causes (as you will probably not find these in sufficient number in your study). It is also very suitable for diseases with a long latent period, such as cancer. However, case-control studies are less adept at showing a causal relationship than cohort studies. They are more prone to bias. One example is recall bias: cases might recall certain exposures more clearly than controls, simply because they have thought about what could have caused their disease.
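The two designs lead to different basic calculations, which the sketch below illustrates for the water-source example. All counts are invented, and the function names `cumulative_incidence` and `proportion_exposed` are hypothetical helpers, not part of the original text.

```python
def cumulative_incidence(new_cases, people_at_risk):
    """Share of an initially disease-free group that develops the disease."""
    return new_cases / people_at_risk


def proportion_exposed(exposed, total):
    """Share of a group (cases or controls) that had the exposure."""
    return exposed / total


# Hypothetical cohort: children followed for two weeks, by water source.
ci_suspected = cumulative_incidence(new_cases=30, people_at_risk=100)
ci_other = cumulative_incidence(new_cases=10, people_at_risk=100)
print(ci_suspected, ci_other)

# Hypothetical case-control study: children seen at the health center.
cases_exposed = proportion_exposed(exposed=40, total=50)      # diarrhea
controls_exposed = proportion_exposed(exposed=20, total=50)   # other complaints
print(cases_exposed, controls_exposed)
```

In the cohort the groups are defined by exposure and the disease is counted afterwards. In the case-control study the groups are defined by disease and the exposure is counted, so a cumulative incidence cannot be read off directly.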