Statistics Flashcards

1
Q

Distinguish absolute and relative risk.

A

Absolute:
- Incidence
- Prevalence
- Odds
- Hazard rate

Relative:
- Risk ratio
- Hazard ratio
- Odds ratio

2
Q

Define risk.

A

The likelihood of an event occurring (and, in HACCP, the consequence of that event occurring). It is the number of outcome events / number of all events.

3
Q

What is relative risk?

A

Relative risk/risk ratio is the ratio of 2 risks.

RR = risk in group 1 / risk in group 2

4
Q

How is relative risk calculated?

A

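(A / (A+B)) / (C / (C+D))

where A and B are the numbers with and without the outcome in group 1, and C and D are the numbers with and without the outcome in group 2.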
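
A minimal Python sketch of this formula, for illustration only. The function name and the counts are hypothetical, and it assumes A and B are the numbers with and without the outcome in group 1, with C and D the same for group 2.

```python
def relative_risk(a, b, c, d):
    """Risk ratio from a 2x2 table.

    a, b: group 1 with / without the outcome (assumed layout)
    c, d: group 2 with / without the outcome (assumed layout)
    """
    risk_group_1 = a / (a + b)
    risk_group_2 = c / (c + d)
    return risk_group_1 / risk_group_2


# Hypothetical counts: 20 of 100 affected in group 1, 10 of 100 in group 2
rr = relative_risk(20, 80, 10, 90)
print(round(rr, 2))  # 2.0
```

An RR of 2 here would read as group 1 having twice the risk of group 2.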

5
Q

Why can numbers be unreliable?

A
  • Small sample size
  • Some not sampled at all
  • How representative are the different samples?
  • Were those that were assessed selected at random?
  • Always think about the potential for bias and how that could be introduced
6
Q

What does the 95% confidence interval mean?

A
  • We can be 95% confident that the true population value lies within the interval (for example, between 12 and 29)
  • For example, the prevalence of obesity is between 3.6% and 8.7%
  • As you increase the number of samples, the values get closer to a normal approximation
7
Q

If risk is the number of outcome events over the number of all events, what happens if we do not know the total number of events?

A

Sample random individuals without the disease, rather than counting the whole population. This is when we use odds.

8
Q

What are odds?

A

The ratio of positive outcomes to negative outcomes.

= number of outcome events / number of non-outcome events

9
Q

What is odds ratio?

A

The ratio of 2 odds is the odds ratio:

OR = odds in group 1 / odds in group 2

10
Q

How is odds ratio calculated?

A

(A / B) / (C / D)
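
A small Python sketch of the same calculation. The function name and counts are made up for illustration; it assumes A and B are the numbers with and without the outcome in group 1, and C and D the same for group 2.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a, b: group 1 with / without the outcome (assumed layout)
    c, d: group 2 with / without the outcome (assumed layout)
    """
    odds_group_1 = a / b
    odds_group_2 = c / d
    return odds_group_1 / odds_group_2


# Hypothetical counts: 20 affected and 80 unaffected in group 1,
# 10 affected and 90 unaffected in group 2
print(round(odds_ratio(20, 80, 10, 90), 2))  # 2.25
```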

11
Q

Define p value.

A

The probability of seeing a difference at least as large as the one observed if chance alone were at work.

12
Q

What is the significance of low and high p values?

A
  • Low p-value: Unlikely that the difference is due to chance alone
  • High p-value: Likely that the difference is due to chance
  • P-value for difference between breeds = 0.001 (Chi sq test)
  • Convention: a 5% (p=0.05) cut-off value for p-values is used to signify “statistical significance” (below 0.05 = statistically significant [difference])
13
Q

What does a RR less than 1 mean if statistically significant?

A

Risk of disease/group is reduced.

14
Q

What does a RR greater than 1 mean if statistically significant?

A

Risk is increased: animals in that particular group are more likely to have the disease.

15
Q

Under what circumstances is odds ratio a good estimate of relative risk?

A

The odds ratio is a good estimate of relative risk when the prevalence of disease is low, that is, when the disease is rare.
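
To see why rarity matters, here is a rough Python comparison using two made-up 2x2 tables (all counts and function names are hypothetical). Each table is (with outcome, without outcome) for group 1, then the same for group 2.

```python
def risk_ratio(a, b, c, d):
    """Risk in group 1 divided by risk in group 2."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds in group 1 divided by odds in group 2."""
    return (a / b) / (c / d)


# Hypothetical common disease: 40% vs 20% affected
common = (40, 60, 20, 80)
# Hypothetical rare disease: 0.4% vs 0.2% affected
rare = (4, 996, 2, 998)

for label, table in (("common", common), ("rare", rare)):
    print(label, round(risk_ratio(*table), 3), round(odds_ratio(*table), 3))
```

With these counts the risk ratio is 2 in both tables, while the odds ratio is about 2.67 for the common disease and about 2.004 for the rare one.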

16
Q

Why use risk?

A
  • More accurate reflection of population prevalence
  • Easier to interpret
  • Harder to calculate (need to be clear what the denominator is)
  • Relative risk is the measure of association calculated from cohort studies
17
Q

Why use odds?

A
  • Cannot estimate prevalence of disease in the population
  • Easier to calculate
  • Odds ratio is the measure of association calculated from case-control studies
18
Q

How can the way we present data affect how we interpret it?

A
  • Always check axes on a graph and what the error bars actually are.
  • This is why statistical analysis is used to interpret data.
  • Is the right question being asked, for example, in the study above, is brain volume equivalent to intelligence?
19
Q

What is statistics in veterinary/scientific literature?

A

Refers to the process of establishing the probability of samples coming from the same population.

20
Q

What can graphs be used to do?

A
  • Identify the general shape of the data including the centre of its distribution (its location or average) and how variable it is (its spread)
  • Identify unusual or outlying points
  • Compare the shape of two independent datasets
  • Identify relationships between two variables in a dataset
21
Q

List different ways of displaying data.

A

Simple scatter/x-y plots
Grouped frequency table
Histogram
Dot display
Bar chart
Box and whisker plot

22
Q

Describe the usefulness of median and range as measures.

A

The median does not use all the data and so is less informative.

Both are useful as summary measures but make limited use of the data.

23
Q

Describe the usefulness of the mean as a measure.

A

Unlike the median, the value of the mean is sensitive to outliers. A ‘truncated’ mean is sometimes used to avoid this, removing a predetermined number of the highest and lowest points before calculating.

24
Q

What is standard deviation?

A
  • The standard deviation is a measure of variation which gives an indication of the average spread of the data about the mean.
  • It is (roughly) the average of the differences between all the points and the mean.
  • The standard deviation also provides a useful tool for calculating probabilities from the data.
25
Q

What is normal distribution?

A

The set of normal distributions is represented by a family of curves (distribution functions). For each curve, the y-value (frequency) is uniquely defined by just two parameters:
- The mean
- The standard deviation (or the variance which is simply the standard deviation squared)

Many biological characteristics conform closely enough to a normal curve for it to be used to model the data

26
Q

What is data transformation?

A
  • Sometimes continuous data are not normally distributed
  • However, they can often be ‘transformed’ to approximate normality
  • This then allows the standard, and very powerful, ‘parametric’ tools and tests that we will cover later to be used
  • With biological data a log transform will often result in a normal distribution – particularly if the data are skewed.
  • Almost any reversible transformation is acceptable to generate a normally distributed data set.
  • Understanding that transformation isn’t fiddling the data is a really important statistical step
27
Q

How do we measure disease?

A

Count the number of deaths which occur due to that disease.

28
Q

What is prevalence?

A

Prevalence = number with disease (dead) / total number. Typically expressed as a percentage.

29
Q

What is positivity?

A

Prevalence may reflect the underlying prevalence of infection in the flock, but as we know not all infections are symptomatic. In that case we might wish to test all the animals and see what fraction are infected.

The proportion of tests which are positive is the positivity

Positivity = Tests positive / (Negative tests + Positive tests)

If we test all animals with a perfect test then positivity and prevalence of infection are the same.

30
Q

What is incidence?

A
  • Incidence measures disease occurrence over time
  • The number of new cases of a disease or condition that occur within a stated period of time as a proportion of the individuals at risk.
  • Often stated per 10,000 or 100,000 population for ease of comparison between populations.
  • Incidence rate = number of new cases / total animal time at risk
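
A short Python sketch of the incidence rate formula above. The function name, the `per` scaling parameter and the herd figures are hypothetical, chosen only to illustrate the arithmetic.

```python
def incidence_rate(new_cases, animal_time_at_risk, per=1):
    """New cases divided by total animal time at risk, optionally scaled."""
    return new_cases * per / animal_time_at_risk


# Hypothetical herd: 5 new cases over 2,500 animal-months at risk
print(incidence_rate(5, 2500))              # 0.002 cases per animal-month
print(incidence_rate(5, 2500, per=10_000))  # 20.0 cases per 10,000 animal-months
```
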
31
Q

Distinguish the information that incidence gives from the information that prevalence gives.

A

Incidence:
- New cases
- Information about risk of contracting disease

Prevalence:
- Proportion affected at a point in time
- Information on how widespread the disease is
- Dependent on duration of disease
- Good when stable conditions
- Not suitable with acute disease

32
Q

What are denominator and numerator populations?

A

Denominator population: The total population under consideration or at risk

Numerator population: Number ‘of interest’ on top of a proportion calculation (e.g. proportion with disease = numerator (disease) / denominator (total population))

33
Q

What is attributable risk?

A

Difference in disease incidence or risk in exposed compared to unexposed groups.
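
As a rough illustration, the difference can be computed like this in Python. The function name and the counts are hypothetical.

```python
def attributable_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group minus risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed - risk_unexposed


# Hypothetical counts: 30 of 100 exposed animals and 10 of 100 unexposed animals affected
ar = attributable_risk(30, 100, 10, 100)
print(round(ar, 2))  # 0.2
```

Under these made-up numbers, 20 extra cases per 100 animals would be attributed to the exposure.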

34
Q

Why do we do surveillance?

A
  • Rapid detection of disease outbreaks
  • Monitor importation of new/emerging diseases
  • Identification of new/emerging diseases
  • Changes in host range (for vector borne disease)
  • Monitor changes over time (incidence/prevalence)
  • Evaluation of disease control programmes
  • Assessment of the health status of a defined population
  • Define priorities for disease control and prevention
  • Provision of information to plan research
  • Food safety or zoonosis risk
  • Confirmation of absence of a specific disease
35
Q

What are the properties of an ideal surveillance system?

A
  • Pre-agreed intervention threshold
  • Agreed case definition
  • Harmonised diagnostic procedures
  • ‘Collectors’ of cases
  • Population at risk
  • Means to record cases
  • Means to report data
  • Central data collation
  • Timely and competent data analysis
  • Means to communicate findings in a timely fashion
  • Pre-defined intervention if threshold exceeded
  • Means to assess effect of intervention
36
Q

Describe active surveillance.

A
  • Purposeful collection of disease information
  • May miss ‘new’ diseases
  • Where deliberate effort is made to screen apparently healthy individuals, as well as testing ill individuals, to attempt to identify the true prevalence in a population.
37
Q

Describe passive surveillance.

A
  • Makes use of existing data collection and voluntary reporting.
  • Often no defined population size
  • Often no defined unit of measure (sample vs case)
  • Liable to under-ascertainment because it misses subclinical cases
38
Q

Why is passive surveillance liable to under-ascertainment?

A
  • Not all animals that get infected fall ill
  • Not all owners of ill animals get a vet to see them
  • If a vet does see an ill animal, they may not take any samples, whether or not they decide to treat
  • The owner might not be willing to pay for lab tests
  • Samples may not yield a positive result, even if the animal is infected
  • If a sample is diagnosed in a laboratory, it may not be reported
39
Q

What are the advantages of passive surveillance?

A
  • Cheap
  • Can be responsive to new outbreaks
40
Q

What are the disadvantages of passive surveillance?

A
  • Biased and/or patchy (for example towards symptomatic or severe cases)
  • Relies on individuals (for example, vets or farmers) to report
41
Q

What are the advantages of active surveillance?

A
  • Systematic detection of cases
  • Comparable data over time or between areas
  • Unbiased?
42
Q

What are the disadvantages of active surveillance?

A
  • Expensive
  • Time-consuming
43
Q

Describe the surveillance strategy of the notifiable disease, brucellosis.

A
  • Post import checks including tests done post-calving of imported cattle
  • Regular bulk milk testing of dairy herds
  • Investigation of cattle abortions
  • Annual check blood testing of eligible herds
  • Breeding bull monitoring
44
Q

Outline the process of outbreak investigation.

A
  1. Outbreak investigation team
  2. Investigate causal agents
  3. Design test
  4. Treat/manage cases
  5. Implement controls
  6. Implement surveillance
  7. Inform national / international agencies
  8. Communications with public
45
Q

Why might we want to survey an area or population?

A

To evaluate a particular disease, for example its prevalence or the extent of spread of an outbreak.

46
Q

How may an area or population be surveyed or sampled?

A
  • Representativeness of sample (e.g. 100% of 1 flock, or 10% of 10 flocks)
  • Random selection (cluster-randomised)
  • Stratification (by age, sex or farm)
  • Expected/known prevalence of disease (need bigger sample for rare diseases)
  • Performance of test (Specificity & Sensitivity, PPV & NPV)
47
Q

What is a target population?

A

The set of individuals/objects that we are interested in knowing about.

For example we might want to know what proportion of UK dairy farms feed silage contaminated with mycotoxins. Our target population, then, is all dairy farms in the UK, not dairy cows or dairy farmers.

Population can refer to anything you can make an observation about.

48
Q

Why does a population need to be defined as clearly as possible?

A
  • Take a valid sample, as you can rarely look at all of the population
  • Make inferences from the sample which are valid for the target population
49
Q

Why are samples and randomisation used?

A
  • It is rarely possible to measure a whole population and so a sample is drawn which should be representative of the population.
  • One approach is randomisation: ensuring that every member of the population has an equal chance of being included in the sample.
  • Randomisation in sampling and in study design is a foundation of experimental design.
50
Q

Describe how to take a simple random sample.

A
  1. In Excel the formula ‘=rand()’ will produce a random number between 0 and 1.
  2. Produce one random number per member of the population
  3. If the number is <0.1, or <0.01 or <0.001, you can include that member in a sample of one tenth, one hundredth or one thousandth of the population.
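
The same steps can be sketched in Python, where `random.random()` plays the role of Excel's random number between 0 and 1. The function name and the population of ear-tag numbers are hypothetical.

```python
import random


def bernoulli_sample(population, fraction, seed=None):
    """Include each member independently when its random number is below `fraction`."""
    rng = random.Random(seed)
    return [member for member in population if rng.random() < fraction]


# Hypothetical population of 1,000 ear-tag numbers
population = list(range(1, 1001))
sample = bernoulli_sample(population, 0.1, seed=1)
print(len(sample))  # close to 100, but not exactly 100 every time

# If an exact sample size is needed, random.sample draws without replacement
exact = random.Random(1).sample(population, 100)
print(len(exact))  # 100
```

As with the spreadsheet approach, the first method gives a sample of roughly one tenth rather than exactly one tenth.
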
51
Q

What is a stratified sample?

A

When choosing a sample, avoid systematic sampling when possible (eg every 7th cat presented to the clinic):

  • But stratified random sampling is permissible in order to achieve the correct balance of subject types.
  • Stratified sampling means sampling randomly within defined strata in the dataset
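
A minimal Python sketch of stratified random sampling, assuming the same fraction is drawn from every stratum. The function name, the `stratum_of` argument and the clinic list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical clinic list: 60 cats and 40 dogs, stratified by species
animals = [("cat", i) for i in range(60)] + [("dog", i) for i in range(40)]
chosen = stratified_sample(animals, lambda rec: rec[0], 0.1, seed=1)
print(sum(1 for species, _ in chosen if species == "cat"))  # 6
print(sum(1 for species, _ in chosen if species == "dog"))  # 4
```

The balance of cats to dogs in the sample matches the population, while the individuals within each species are still chosen at random.
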
52
Q

Define unbiased.

A

The parameters (mean or standard deviation) estimated from a sample will be close to the true values and not systematically shifted in any way. (Bias might occur if certain heifers are harder to handle and are excluded from the sample; the sample is then no longer random.)

53
Q

Define precision.

A

A measure is repeatable, which does not mean it is accurate. (With a blood glucose level, do you need to know that the value you have is accurate, or just that it’s within the normal range for that test?)

54
Q

What is the precision and bias of a random and a larger sample?

A

An estimate from a true random sample is unbiased, but a small sample may not be a very precise estimate of the population.

As the sample size is increased, the precision of the estimate is increased, although non-random sampling could still make a large sample biased.

55
Q

Why is randomisation important in clinical trials?

A
  • Patients are randomised to the treatments to ensure that no bias enters the result.
  • For example, in a study with 2 groups, one could flip a coin to assign each patient to one or the other group.
  • Or you could use Excel’s rand() function and assign patients according to whether the value is <0.5 or ≥0.5.
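
The coin-flip idea can be sketched in Python as below. The function name, group labels and patient identifiers are hypothetical, and `random.random()` stands in for the spreadsheet random number.

```python
import random


def allocate(patients, seed=None):
    """Assign each patient to a group using a random number below or above 0.5."""
    rng = random.Random(seed)
    groups = {"treatment": [], "control": []}
    for patient in patients:
        group = "treatment" if rng.random() < 0.5 else "control"
        groups[group].append(patient)
    return groups


# Hypothetical list of 20 patients
patients = [f"patient_{i}" for i in range(1, 21)]
groups = allocate(patients, seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Simple coin-flip allocation does not guarantee equal group sizes, which is why the two counts may differ.
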
56
Q

What is variation between samples?

A

If a series of random samples are drawn from the same population, they will show chance/sample variation: they are imprecise but hopefully unbiased.

57
Q

How does variability between samples change as sample size increases?

A
  • The larger the sample size, the more precise and repeatable the estimates of the population parameters are, and the closer they are to the true population values.
  • It follows that the larger the sample size in a series of samples, the more similar the sample means will be to the population mean and to each other.
  • That is, the larger the samples, the smaller the variability in the means between a series of samples.
58
Q

What is the standard error of the mean?

A

If we drew a series of samples from the same population and calculated the mean for each:
- These means will be normally distributed around the true population mean (central limit theorem)
- This will usually be the case even if the population that is sampled is not normally distributed.
- As they are normally distributed, the variation between the means of different samples from the same population can also be described by standard deviation.
- However, because we are talking about the distribution of means, the term standard error of the mean is used instead of standard deviation.

59
Q

What is the standard error formula?

A

Standard error = standard deviation / √n, where n is the sample size
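
A small Python sketch of this formula using the standard library. The function name and the weights are made up for illustration; `statistics.stdev` gives the sample standard deviation.

```python
import math
import statistics


def standard_error(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical body weights in kg
weights = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7]
print(round(statistics.mean(weights), 2))   # sample mean
print(round(statistics.stdev(weights), 3))  # sample standard deviation
print(round(standard_error(weights), 3))    # standard error of the mean
```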

60
Q

Define standard deviation and standard error of the mean.

A

Standard deviation – a measure of the variation within the population. It shouldn’t change much as you increase the sample size (although it will get more precise).

Standard error of the mean – essentially a measure of how close the mean of your sample is to the true mean of the population. It will get smaller and more precise as your sample size increases.

61
Q

How can standard error and standard deviation be used to make statements about probability?

A
  • Non-esterified fatty acids in blood of ‘normal’, ‘healthy’, recently calved cows: Mean = 0.27 milli-equivalents per litre. Standard deviation = 0.13 milli-equivalents per litre
  • Roughly 95% of the population of ‘normal, healthy, recently-calved’ cows will fall within plus or minus two standard deviations from the mean, that is, between 0.01 and 0.53 milli-equivalents per litre.
  • If a cow has blood levels of 0.60 meq/L, there is less than 5% probability that she is a member of the population of ‘normal, healthy’ recently-calved cows.
  • This is usually reported as a fraction of 1 rather than of 100 and is written as P < 0.05.
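
The arithmetic in this card can be checked with a short Python sketch. The mean and standard deviation are the values given above; the helper name is hypothetical, and the coverage line assumes the values are normally distributed.

```python
from statistics import NormalDist

mean = 0.27  # meq/L, from the card
sd = 0.13    # meq/L, from the card

lower = mean - 2 * sd
upper = mean + 2 * sd
print(round(lower, 2), round(upper, 2))  # 0.01 0.53

# Under a normal model, the share of cows within two standard deviations
dist = NormalDist(mean, sd)
coverage = dist.cdf(upper) - dist.cdf(lower)
print(round(coverage, 3))  # 0.954


def outside_reference_range(value):
    """True when a value lies more than two standard deviations from the mean."""
    return value < lower or value > upper


print(outside_reference_range(0.60))  # True
```
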
62
Q

What are confidence intervals?

A

In the same way that we can calculate probabilities for individual data points using our knowledge of the normal distribution, it is also possible to use this approach to make statements about the probability of obtaining parameter estimates, means and standard deviations.
Importantly, we can use the standard error to calculate the probabilities of obtaining a mean of a particular value.
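
One common way to turn the standard error into a 95% confidence interval for a mean is mean ± 1.96 × standard error. The sketch below assumes that normal-approximation convention, which is not stated in these cards; the function name and data are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate confidence interval for the mean: mean +/- z * standard error.

    z=1.96 is the usual normal-approximation multiplier for 95% (an assumption here).
    """
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - z * se, m + z * se


# Hypothetical measurements
values = [0.21, 0.35, 0.27, 0.19, 0.30, 0.25, 0.33, 0.24, 0.28, 0.29]
low, high = mean_confidence_interval(values)
print(round(low, 3), round(high, 3))
```

For small samples a multiplier from the t distribution would normally replace 1.96, giving a slightly wider interval.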

63
Q

What different forms do diagnostic tests take?

A
  • Clinical assessment – such as palpation, reported symptoms
  • Direct measurement – such as blood pressure, temperature, respiratory rate
  • Histological – such as biopsy of a tumour
  • Tests requiring biological samples – such as urine, milk, blood, saliva
64
Q

What is sensitivity?

A

Sensitivity – the probability that an animal with the disease is identified by the test.

Sensitivity = number of positives detected that have the disease / total number with the disease

The incorrect diagnosis – the 1 animal with disease that tests negative is a false negative.

65
Q

What is specificity?

A

Specificity – the probability that an animal without the disease tests negative.

Specificity = number of negatives detected without the disease / total number without the disease

The incorrect diagnosis – the 5 animals without disease that test positive are false positives.

66
Q

How are sensitivity, specificity and accuracy calculated from a 2x2 table?

A

Sensitivity = a / (a+c)

Specificity = d / (b+d)

Accuracy = (a+d) / (a+b+c+d), the proportion with the correct diagnosis out of all tested.

Here a = true positives, b = false positives, c = false negatives and d = true negatives.
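
These three formulas can be sketched in Python as follows. The function name and the counts are hypothetical; the cell layout (a = true positives, b = false positives, c = false negatives, d = true negatives) is inferred from the formulas themselves.

```python
def diagnostic_summary(a, b, c, d):
    """Sensitivity, specificity and accuracy from a 2x2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives (assumed layout)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical table: 9 true positives, 5 false positives,
# 1 false negative, 85 true negatives
result = diagnostic_summary(9, 5, 1, 85)
print({name: round(value, 3) for name, value in result.items()})
# {'sensitivity': 0.9, 'specificity': 0.944, 'accuracy': 0.94}
```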

67
Q

Describe sensitivity, specificity and accuracy in a diagnostic test.

A

For a given test in optimum conditions, sensitivity, specificity and accuracy are fixed.

Development of diagnostic tests involves a trade-off between sensitivity and specificity.

68
Q

What are the other problems of diagnostic tests in real life?

A
  • Sample quality – storage and transportation
  • Contamination
  • Laboratory error – equipment and human
  • Severity or stage of disease
  • How well the test is administered
69
Q

What is a positive predictive value?

A

Positive predictive value – the proportion of animals with a positive test which have the disease

Positive predictive value = a / (a+b)

70
Q

What is a negative predictive value?

A

Negative predictive value – the proportion of animals with a negative test which do not have the disease.

Negative predictive value = d / (c+d)

71
Q

What happens to predictive values when prevalence increases?

A

As the prevalence of disease in a population increases, the PPV increases.

Conversely, at low prevalence, even with a highly accurate test, the PPV will reduce, potentially resulting in more false positives than true positives.
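
A rough Python sketch of this effect, holding the test fixed and varying only prevalence. The function name and the 95% sensitivity and specificity are hypothetical values chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV expressed as proportions of the tested population."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 95% sensitive and 95% specific
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.95, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

With these made-up figures the PPV is about 0.16 at 1% prevalence, so false positives outnumber true positives, and it rises to 0.95 at 50% prevalence.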

72
Q

Why design a study?

A
  • Be sure about the efficacy of a treatment
  • Is X a risk factor for Y?
  • Needs to be applicable to the rest of the population
  • Very important to avoid bias
73
Q

Describe the process of coming up with a good study design.

A
  1. Good question: subjective, and backed by a literature review to check that the question hasn’t been answered before, or whether previous work on it is sound or needs improvement.
  2. Design study and consider ethical considerations. Most require a sign off to say it has been ethically approved.
  3. How does the study design factor in statistical power and sample size calculations?
  4. Designing sampling methods and data collection methods.
  5. Statistics – which type of statistics will you be applying? Beware of assumptions.
  6. Interpretation of results
  7. Publication
74
Q

How do we ensure that the answers to the questions we are trying to answer are valid?

A
  • Absence of systematic bias in results
  • Internal validity – can extrapolate to target population
  • External validity – can extrapolate results beyond the target population
75
Q

What are the issues with study design?

A

Precision
- How accurately we can measure an effect
- Depends on inherent noise in the system
- Some noise may be able to be accounted for – sex, age, matching, etc
- Increase precision by increasing sample size

Validity – lack of systematic error or bias

76
Q

Name the 3 types of bias.

A

Selection bias
Confounding bias
Misclassification bias

77
Q

What is selection bias?

A
  • The ideal comparison group is the same subjects as the exposed group, had they not been exposed
  • Selection bias occurs before the study begins
  • Often results from the procedures used to select study subjects
78
Q

List some examples of selection bias.

A
  • Choice of comparison groups
  • Non-response bias
  • Missing data
  • Loss to follow up
  • Healthy worker effect – for example, horses that are racing are likely to be healthier than those that have not raced recently
79
Q

What is the confounding bias?

A
  • Mixing together of the effects of 2 or more factors that are related to each other and the outcome.
  • Can be controlled for in the study design
80
Q

What is misclassification bias?

A

Incorrect classification of outcome or exposure. Also known as measurement bias for continuous variables.

Examples:
- Diagnostic test with imperfect sensitivity leading to false negatives and imperfect specificity leading to false positives
- Recall bias

81
Q

What are the types of descriptive studies?

A

Case report
Case series
Survey

82
Q

What are the types of analytical study?

A

Observational
- Cross sectional
- Case control
- Cohort
- Hybrid

Experimental
- Laboratory
- Controlled trial

83
Q

What is a cross sectional study? What are its limitations?

A

Snapshot of information about exposures and disease at one time point. Can calculate prevalence, relative risk, attributable risk.

Limitations:
- Prevalence as outcome: cannot differentiate between factors associated with persistence or development of outcome.
- Exposure and outcome measured at the same time – cannot differentiate cause and effect.

84
Q

What is a cohort study?

A
  • Follow a target group/cohort for a period of time
  • Usually prospective, but as data are more often routinely collated and available, retrospective cohorts are becoming more common.
  • For example, a cohort of calves of the same age in the same environment.
  • Compare outcomes in exposed and non-exposed individuals
85
Q

Name the typical reasons for animals to be removed from study.

A

Death
Sale
Disease other than the disease being measured
No longer at risk

86
Q

What can cohort studies measure?

A

Incidence rate
Relative risk
Attributable risk
Attributable fraction

87
Q

What are the advantages of cohort studies?

A
  • Can look at several diseases simultaneously
  • Obtain estimate of disease incidence
  • Temporal relationship between exposure and outcome, which is good for inferring causality
88
Q

What are the disadvantages of cohort studies?

A
  • May require a large study population
  • May take a long time
  • May cost a lot of money if prospective
89
Q

What are the advantages of case-control studies?

A
  • Can study conditions which could be considered to be rare
  • Can obtain background information quickly
90
Q

What are the disadvantages of case-control studies?

A
  • Liable to bias, particularly recall bias
  • Cannot estimate disease incidence
  • No data on population at risk
  • No estimates of absolute risk, incidence or prevalence because it samples the non-diseased animals
91
Q

Distinguish cohort and case-control studies.

A

Cohort:
- Follows exposed and unexposed animals forward through time.
- Measures of association: relative risk and odds ratio.

Case-control:
- Compares exposure history in 2 groups, 1 with disease/cases and 1 without disease/controls.
- Measure of association: odds ratio.

92
Q

What is a randomised controlled/clinical trial?

A
  • Planned experiments used to evaluate therapeutic or prophylactic products on participants in their usual environment.
  • Good clinical practice – CONSORT reporting guidelines
  • Test whether a treatment has an effect, referred to as an intervention.
  • Population considered must all be cases
  • Participants are typically allocated at random to 2 groups, treated and non-treated
93
Q

What are the 3 possibilities for blinding in a randomised controlled/clinical trial?

A
  • Single blind - “patients” do not know whether they receive treatment or not – placebo effect
  • Double blind - operator also doesn’t know which is treatment and which is placebo –operator effect
  • Triple blind – statistician also kept in the dark – statistical test effect
94
Q

What is the null hypothesis?

A

The hypothesis that the samples did come from the same population (that is, that there is no difference between them) is called the null hypothesis. This is not the same as the scientific or study hypothesis.

95
Q

What is paired data?

A
  • In many cases our experiments or observations use measurements made on the same animals under different conditions.
  • This might be a group of dogs given a sedative and awareness measured before and after administration.
96
Q

What is the basis for the paired t-test?

A

Testing whether the mean difference is significantly different from zero.
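
A minimal Python sketch of the test statistic, assuming the usual form: the mean of the paired differences divided by its standard error. The function name and the before/after scores are hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Mean paired difference divided by the standard error of the differences."""
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se_diff


# Hypothetical awareness scores for 5 dogs before and after a sedative
before = [10, 12, 9, 11, 13]
after = [8, 11, 7, 10, 10]
t = paired_t_statistic(before, after)
print(round(t, 2))  # -4.81
```

The statistic would then be compared with a t distribution with n − 1 degrees of freedom (4 here) to obtain a p-value.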

97
Q

In general, where are the independent and dependent variables placed?

A

In general, we tend to put the variable we consider to be the independent variable on the x-axis.

The dependent variable goes on the y-axis.

98
Q

What does regression analysis produce?

A

A line which maximises the amount of variation which is explained by the independent variable and minimises the error variation in the dependent variable.
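
For a single independent variable, such a line is usually fitted by ordinary least squares. The sketch below assumes that method; the function name and the data points are hypothetical.

```python
def least_squares_line(x, y):
    """Slope and intercept minimising the squared error in y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: independent variable x, dependent variable y
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(x, y)
print(round(slope, 2), round(intercept, 2))
```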

99
Q

How can data with both categorical dependent and independent variables be analysed?

A

One way is an even more complicated version of the general linear model, the generalised linear model.

The second is the chi-squared (or χ²) test.
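
A sketch of a chi-squared test for a 2x2 table in plain Python, assuming the Pearson form without a continuity correction. The function name and counts are hypothetical. With 1 degree of freedom the p-value can be obtained from `math.erfc`.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of chi-squared with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 20 of 100 affected in one breed, 10 of 100 in another
statistic, p_value = chi_squared_2x2(20, 80, 10, 90)
print(round(statistic, 2), round(p_value, 3))  # 3.92 0.048
```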

100
Q

What is the null hypothesis?

A

Statistical significance is simply a measure of whether we reject or accept the null hypothesis

  • If we reject it, we imply that there is a real, mechanistic link between two or more measured variables
  • However big the difference, it’s important for our understanding of mechanisms of disease
  • But the difference may not be big enough to justify treatment
101
Q

What is the EFSA definition of a biologically relevant effect?

A

An effect considered by expert judgement as important and meaningful for human, animal, plant or environmental health.

It therefore implies a change that may alter how decisions affecting a specific problem are taken.

102
Q

What is a confounding variable/factor?

A

A variable that influences the dependent and independent variables, causing a spurious association.

103
Q

Use the example of coffee, cancer and smoking to explain confounding.

A

Coffee drinkers are also more likely to smoke; it is not the coffee causing the cancer, it is the smoking.

104
Q

What is a causal web and how are they constructed?

A

A causal web is a diagram that is useful for quantifying confounding when estimating the effect of an exposure on an outcome:

  1. List all the possible covariates (characteristics)
  2. Identify relationship
105
Q

List the study types according to increased likelihood to be biased by confounding.

A
  1. Systematic reviews
  2. Randomised controlled trials
  3. Non-randomised controlled trials
  4. Observational studies with comparison groups
  5. Case series and case reports
  6. Expert opinion
106
Q

Distinguish observational and experimental studies.

A

Observational – participants are not asked to change their behaviour, so we are simply observing what is happening.

Experimental – involves altering behaviours.

107
Q

What is a cross-sectional study?

A

This is a snapshot at one point in time and participants were not selected based on outcome.

108
Q

What outcome measure is used for case control studies?

A

Relative risk cannot be calculated because we do not know the denominator. Instead an odds ratio is estimated.

109
Q

Why may case-control studies be used?

A

A case-control study is ideal for a rare disease because you will include cases by design.

110
Q

Which study types are most susceptible to selection bias?

A
  • Healthy worker bias
  • Loss to follow up
  • Volunteer bias
  • Use of prevalent cases instead of incident cases
111
Q

Which study types are most susceptible to information bias?

A

Measurement error
Misclassification bias
Recall bias

112
Q

Which study type is most susceptible to confounding bias?

A

Observed result between exposure and outcome is influenced by a third variable

113
Q

In normality tests, what does a p value less than 0.05 mean?

A

Non-normality: the data differ significantly from a normal distribution.

114
Q

Why may normality be hard to detect?

A

Few observations