Stats Flashcards

1
Q

Explain the two categories of data

A

Categoric:
Nominal, binary and ordinal

Numeric:
Continuous and discrete

2
Q

Give 3 ways of describing categoric data

A

Example scenario:
Total participants = 707. Drug group: 31 had an MI and 310 did not (341 in total)
Placebo group: 61 had an MI and 305 did not (366 in total)
Risk of MI with the drug = 31/341 = 0.091 (9.1%; multiply by 100 to get the percentage)
Risk of MI with the placebo = 61/366 = 0.167 (16.7%)
Odds of MI with the drug = 31/310 = 0.1
Odds of MI with the placebo = 61/305 = 0.2

(Absolute) risk difference = 0.167 - 0.091 = 0.076 (7.6%). This means the risk with the placebo is 7.6 percentage points higher than with the drug

Usually just called the risk difference; "absolute" is added when we ignore the minus sign

(Relative) risk ratio = 0.167/0.091 = 1.835 (183.5%) (placebo on top, so it becomes the focus group). This means the risk is about 83.5% higher with the placebo than with the drug
Risk ratio = 0.091/0.167 = 0.545 (54.5%) (drug on top, so it becomes the focus group). This means the risk is 45.5% lower with the drug than with the placebo

Usually referred to as the risk ratio (RR), also called the relative risk

Odds ratio = 0.2/0.1 = 2 (placebo as the focus group). The odds have doubled: there is a 100% increase in the odds of having an MI on the placebo compared to drug A
Odds ratio = 0.1/0.2 = 0.5 (drug A as the focus group). The odds have halved: there is a 50% decrease in the odds of an MI on drug A compared to the placebo
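
The arithmetic on this card can be checked with a short sketch. It uses only the four counts from the example scenario; the function and variable names are illustrative assumptions, not part of the original deck.

```python
# Counts taken from the example scenario on this card.
drug_mi, drug_no_mi = 31, 310
placebo_mi, placebo_no_mi = 61, 305


def risk(events, non_events):
    """Risk = events divided by everyone in the group."""
    return events / (events + non_events)


def odds(events, non_events):
    """Odds = events divided by non-events."""
    return events / non_events


risk_drug = risk(drug_mi, drug_no_mi)
risk_placebo = risk(placebo_mi, placebo_no_mi)
odds_drug = odds(drug_mi, drug_no_mi)
odds_placebo = odds(placebo_mi, placebo_no_mi)

# Placebo is the focus group in each comparison (placebo on top).
risk_difference = risk_placebo - risk_drug
risk_ratio = risk_placebo / risk_drug
odds_ratio = odds_placebo / odds_drug

print(f"Risk (drug) = {risk_drug:.3f}, risk (placebo) = {risk_placebo:.3f}")
print(f"Risk difference = {risk_difference:.3f}")
print(f"Risk ratio = {risk_ratio:.3f}")
print(f"Odds ratio = {odds_ratio:.3f}")
```

Working from the unrounded risks gives a risk ratio of about 1.833; the 1.835 on the card comes from dividing the rounded values 0.167 and 0.091.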

3
Q

How do you find the risk ratio with the other group as the focus group, e.g. in the drug A and placebo scenario

A

If the risk ratio with the placebo as the focus group is 1.835
then the risk ratio with drug A as the focus group is 1/1.835 = 0.545
In other words, take the reciprocal

4
Q

Give two ways to measure and present categorical data

A

Pie charts

Bar charts

5
Q

Give three ways to measure and present numerical (quantitative) data

A

Dot plots
Histograms
Box and whisker plots (box plots)

6
Q

Give a way to present an association between two continuous variables

A

Scatter plots

7
Q

Give some characteristics of a histogram

A

It can be used to show normal distribution (also called Gaussian distribution)

Can show skewed data, which is when the data are not symmetrical
Negatively skewed data = long, low left tail with the peak at high values on the right
Positively skewed data = long, low right tail with the peak at low values on the left

8
Q

Give some characteristics of a box plot

A

The box contains the middle 50% of the data
The line in the box plot shows the median value
Outliers (values more than 1.5 box lengths, i.e. 1.5 x the interquartile range, beyond the upper or lower edge of the box) are plotted as dots outside the whiskers
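
The 1.5 box-length rule can be sketched in a few lines. The data below are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method; other software may use a slightly different quartile convention.

```python
import statistics

# Hypothetical data set with one unusually large value.
values = [2, 4, 5, 6, 7, 8, 9, 10, 11, 30]

# Quartiles: the box runs from q1 to q3, with the median line inside it.
q1, median, q3 = statistics.quantiles(values, n=4)
box_length = q3 - q1  # the interquartile range

# Anything more than 1.5 box lengths beyond the box edges is an outlier.
lower_fence = q1 - 1.5 * box_length
upper_fence = q3 + 1.5 * box_length
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"Box from {q1} to {q3}, median {median}")
print(f"Outliers: {outliers}")
```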

9
Q

Give some characteristics of scatter plots

A

The independent variable is on the X axis and is usually what the experimenter changes

The dependent variable is on the Y axis and is usually the response to what the experimenter changes

10
Q

Give three ways to measure the spread of data

A

Range
Inter-quartile range
Standard deviation

11
Q

What is the variance

A

Variance = standard deviation ^2 (squared)

Standard deviation = square root of variance
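
As a quick check of the relationship on this card, the sketch below computes both quantities for a hypothetical sample using the standard library. It assumes the sample versions (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` calculate.

```python
import math
import statistics

# Hypothetical sample of measurements.
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.variance(values)  # sample variance (divides by n - 1)
sd = statistics.stdev(values)           # sample standard deviation

print(f"Variance = {variance:.3f}")
print(f"Standard deviation = {sd:.3f}")
print(f"Square root of variance = {math.sqrt(variance):.3f}")
```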

12
Q

What does a large or small standard deviation show

A

A small standard deviation shows that any random value picked is likely to be close to the mean, so the spread of the data is small

A large standard deviation shows that any random value picked is likely to be further from the mean, so the spread of the data is large

13
Q

What is the best method to use if there is a symmetric distribution of data

A

Mean and the standard deviation

14
Q

What is the best method to use if the distribution of data is non-symmetric

A

Median and the interquartile range

15
Q

Methods to summarise categoric variables

A

Proportion, percentage, risk and odds

16
Q

Methods to summarise numerical (quantitative) data

A

Mean, median, range, interquartile range and standard deviation

17
Q

Methods to quantify differences between two categorical variables

A

Absolute risk difference
Relative risk ratio
Odds ratio

18
Q

Methods to quantify the differences between two numeric variables

A

Pearson's correlation coefficient (r)
r must be between -1 and +1
+1 shows a perfect positive linear correlation
0 shows no linear correlation
-1 shows a perfect negative linear correlation

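
A minimal sketch of how r is calculated, assuming the usual formula (the sum of the products of deviations divided by the square root of the product of the summed squared deviations). The function name and data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together around their means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


x_values = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases in step with x
falling = [10, 8, 6, 4, 2]   # decreases in step with x

print(pearson_r(x_values, rising))
print(pearson_r(x_values, falling))
```

The two hypothetical series are perfectly linear, so they sit at the two ends of the allowed range for r.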
19
Q

Methods to calculate the difference between one categoric variable and one numeric variable

A

If the numeric variable has a symmetrical distribution in both groups, use the difference in means (mean - mean)

If the numeric variable has a non-symmetrical distribution in either group, use the difference in medians (median - median)

20
Q

Give the percentage of a normal distribution that lies within 1, 2 and 3 standard deviations of the mean

A

1 standard deviation = 68%
2 standard deviations = 95.4%
3 standard deviations = 99.7%

21
Q

How many standard deviations either side of the mean cover 95% of a normal distribution

A

1.96 standard deviations either side of the mean cover 95% of the distribution
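
The coverage figures for a normal distribution can be reproduced with the error function from the standard library. This is a sketch assuming an exactly normal distribution; the helper name is hypothetical.

```python
import math


def coverage_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"{k} standard deviations: {coverage_within(k) * 100:.1f}%")
```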

22
Q

If you cannot use a mean for a certain set of data, would you still be able to use standard deviation for that data

A

No, the standard deviation would be affected by the same issue of being skewed by outliers

If possible, it is best to use the mean and standard deviation because they include all values in the data, so they are more powerful

23
Q

Who is Sir Francis Galton and what are his contributions to statistics

A

1822-1911

Standard deviation, correlation, concepts of regression, medians and ranking

First weather map
How to cut a cake
Attractiveness of cities

24
Q

What is standard error

A

It is an estimate of how precisely a sample statistic (such as the sample mean) represents the true population value

25
How is the standard error calculated
Standard error = standard deviation/ the square root of the sample size
26
When can the standard error not be used
When the standard deviation and the mean cannot be used due to how skewed the data is
27
What are the rules to use the standard error
- The data has to be normally distributed
- The sample size has to be large enough (more than 20 individuals)
28
Show how to calculate the confidence interval from the standard error
```
Mean = 18,477
Sample size (n) = 12
Standard deviation (SD) = 3,732
```
Standard error = standard deviation / the square root of the sample size
Standard error = 3,732 / square root of 12 = 1077.3
To get a 95% confidence interval, use 1.96 standard errors either side of the mean
To get a 99% confidence interval, use 2.58 standard errors either side of the mean
Mean - (1.96 x standard error) = 18,477 - (1.96 x 1077.3) = 16,365
Mean + (1.96 x standard error) = 18,477 + (1.96 x 1077.3) = 20,589
We are 95% confident that the true value of the mean lies between 16,365 and 20,589
For 99% confidence, we would use mean - (2.58 x standard error) and mean + (2.58 x standard error)
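
The same calculation as a short sketch, using the mean, sample size and standard deviation from this card. The function name is an illustrative assumption.

```python
import math

# Values from the worked example on this card.
mean = 18477
n = 12
sd = 3732

# Standard error = standard deviation / square root of the sample size.
standard_error = sd / math.sqrt(n)


def confidence_interval(mean, standard_error, multiplier):
    """Return (lower, upper) limits: mean -/+ multiplier x standard error."""
    margin = multiplier * standard_error
    return mean - margin, mean + margin


low95, high95 = confidence_interval(mean, standard_error, 1.96)
low99, high99 = confidence_interval(mean, standard_error, 2.58)

print(f"Standard error = {standard_error:.1f}")
print(f"95% confidence interval: {low95:.0f} to {high95:.0f}")
print(f"99% confidence interval: {low99:.0f} to {high99:.0f}")
```

The 99% interval is wider than the 95% interval because the larger multiplier (2.58) takes in more of the distribution.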
29
What does a small standard error mean for a sample
Greater precision: the results from the sample are more likely to be representative of the population
30
What does a small standard deviation mean
The values are less spread so there is less variability in the sample
31
What is correlation used to explore
- how two numeric continuous variables are related
- the strength of an association
32
Give the regression equation and what each part of it means
Y = a + bx
Y is the dependent variable (also called the outcome or response), the one we measure, e.g. blood pressure reading, pain score, hours of sleep
x is the independent variable (also called the predictor or explanatory variable), e.g. age, deprivation level and family history of illness
a is the y-intercept (also called the constant)
b is the coefficient: the change in Y when we increase x by 1 unit
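
A minimal sketch of fitting Y = a + bx by ordinary least squares. The deck does not say how a and b are estimated, so the least-squares formulas here are an assumption, and the data and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept a and coefficient b in Y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b: how much Y changes, on average, for a 1-unit increase in x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # a: the value of Y where the line crosses x = 0.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical predictor and outcome values.
x_values = [1, 2, 3, 4]
y_values = [3, 5, 7, 9]

a, b = fit_line(x_values, y_values)
predicted = a + b * 5  # predicted outcome when x = 5

print(f"a (intercept) = {a}, b (coefficient) = {b}")
print(f"Predicted Y at x = 5: {predicted}")
```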
33
Give the name of the type of regression where the outcome is a single continuous variable e.g sleep time
Linear regression
34
Give the name of a regression which has a binary outcome e.g pass or fail
Logistic regression
35
What is the benefit of regressions compared to just correlations
We can incorporate additional values (additional predictors) which allows us to account for confounding variables
36
Give some characteristics of regression models
- they can be used to create predictive models
- remove the effects of confounding variables
- explore how a particular drug influences the outcome
37
Does linear regression require a binary or numeric outcome variable
It requires a numeric outcome variable
38
A regression model with more than one predictor is called
A multivariable model
39
What is a great benefit of a multivariable regression model (more than one predictor)
It adjusts for confounding variables
40
A difference between coefficients in a single and multiple regression model is
Coefficients in a multiple model have taken account of background factors
41
In linear regression with 1 predictor variable, a coefficient of 1.5 means that increasing the predictor by 1 unit causes
An increase in the outcome by 1.5 units
42
What is prevalence
Proportion of the population with a disease at one point in time
Prevalence = number of cases at a point in time / total population
43
What is incidence
A rate: the rate at which new cases appear in a population over a certain period of time
Incidence = number of new cases / at-risk population
44
Advantages of ecological studies
- uses routinely collected data
- quick, cheap
- units of analysis are populations (groups of people)
- can examine patterns of ill-health by age, sex, ethnicity, country and/or by time
- few ethical issues
- useful for generating hypotheses
45
Disadvantages of ecological studies
- no link between individual exposure and effect
- bias
- variation in diagnostic criteria
- absence of records of individual attributes
- unsuitable format of records
- inconsistency in data presentation
46
Advantages of cross-sectional studies
- results used to generate hypotheses
- rapid feedback of current events in the community
- quick and cheap
- few ethical problems
47
Disadvantages of cross-sectional studies
- could just be reporting a medical oddity
- prone to bias, e.g. sampling, subject and observer variation
- no time reference
48
Advantages of case control studies
- by concentrating effort on the identification of affected individuals and recruiting controls from the unaffected population, the number of subjects required to obtain significant results is kept to a minimum (so good for rare diseases)
- results can be obtained relatively quickly because the investigation does not have to wait for the disease to develop (compare this with cohort studies, see later) and can look for multiple causes
- it is a relatively inexpensive type of study
49
Disadvantages of case control studies
- generally rely on retrospective data, which has its own dangers. The ability of individuals to recall past events tends to be unreliable due to a tendency for memory to be selective. Records of past events may be incomplete
- because data are collected retrospectively, it is difficult to say if an association is causal or not. This is less of a problem when the exposure is highly specific or where the time between exposure and disease is short
- prone to selection and information biases
- there can be difficulties choosing controls
- the incidence of disease within a population cannot be calculated from this type of study
50
Advantages of cohort study
- the main advantage is that it is possible to distinguish antecedent causes from concurrent associated factors (cause comes before effect)
- since incidence can be determined for both exposed and non-exposed groups, we can determine absolute, relative and attributable risks
- we can study more than one outcome to the same exposure
- there is less chance of bias since exposure is measured before development of disease
51
Disadvantages of cohort study
- cannot be certain that exposures are causal; this requires controlled studies
- long periods of study and large populations mean that cohort studies are expensive
- follow-up can be a problem, especially if the period of study is long; this needs to be considered in the design of the study
- diagnosis of cases may change over the years as medical science becomes more advanced (better at detecting the disease, or with different criteria for a diagnosis)
52
Advantages of randomised control trials
- randomisation should mean that confounding factors (age, sex etc.) are equally distributed. This helps to concentrate the study on the effect of the intervention
- by randomly allocating patients to interventions, it is likely that staff and patients will not break the blinding
- statistical tests for significance are easier to interpret when the study design removes confounders
- confounders and many biases minimised
53
Disadvantages of randomised control trials
- to allow sufficient numbers to balance confounders, these tend to be large and expensive trials. They are often multicentre and may even be multinational
- there is always a chance that volunteer bias will be a problem: what about people who refuse to be included in the trial, or those who are never asked?
- there may be ethical difficulties in withholding treatment from the control group or offering what is believed to be an inferior treatment to one group
- may lose statistical power if compliance is poor
54
What is critical appraisal
- critical appraisal is the assessment of evidence by systematically reviewing its relevance, validity and results to specific situations
- by R Chambers 1998
55
Difference between parametric and non parametric analysis tests
Parametric tests have rules that need to be followed or assumptions that need to be met
Non-parametric tests are used as an alternative when those assumptions are not met; they make far fewer assumptions about the data
Assumptions include sample size, normal distribution and linearity in regression
56
Examples of parametric test
- one sample t-test
- two sample t-test (also called Student's t-test)
- chi square test
- ANOVA test
- Pearson correlation coefficient
57
Examples of non-parametric test
- one sample Wilcoxon test
- Mann-Whitney U test
- Fisher's exact test
- Kruskal-Wallis ANOVA
- Spearman rank correlation coefficient
58
Give two examples of critical appraisal tools
- CASP
- AXIS (has 20 questions and no scoring system)
59
Give the frequency of these single gene defects: cystic fibrosis, alpha-1-antitrypsin deficiency, hereditary haemorrhagic telangiectasia (HHT), immotile cilia syndrome
- cystic fibrosis = 1 in 2500
- alpha-1-antitrypsin deficiency = 1 in 2000
- hereditary haemorrhagic telangiectasia (HHT) = 1 in 4000
- immotile cilia syndrome = 1 in 20000
60
How can the CFTR gene (chromosome 7, 27 exons, 1480-residue protein) be identified
- linkage
- positional cloning
- sequencing
61
How can Cystic Fibrosis be diagnosed
- sweat test
- gene mutation analysis
62
Give the symptoms that Cystic Fibrosis could cause
- abnormal ion transport across epithelium
- salt loss
- impaired mucociliary clearance
- chronic infections
- sterility (infertility)
- impaired digestion (meconium ileus)
- failure to thrive
- liver disease
- diabetes
63
Treatment of Cystic Fibrosis
- pancreatic enzyme supplementation
- control of infection
- suppression of chronic infection: antibiotic nebulisers
- bronchodilation: salbutamol nebulisers
- anti-inflammatory: azithromycin
- diabetes: insulin
- vaccinations: flu, pneumococcal
64
Give the chromosomal cause of alpha-1-antitrypsin deficiency
- autosomal recessive
- chromosome 14 (14q32.1)
65
What is the normal phenotype for alpha-1-antitrypsin deficiency and the disease phenotype
- M is the normal phenotype
- S and Z are associated with major disease presentation
66
What is the clinical presentation of alpha-1-antitrypsin deficiency
Due to build-up of deformed alpha-1-antitrypsin in the liver:
- childhood jaundice
- early onset cirrhosis
Due to the unopposed action of neutrophil elastase in the lungs:
- early onset emphysema and bronchiectasis
Highly sensitive to cigarette smoke
67
What is the inheritance pattern of hereditary haemorrhagic telangiectasia (HHT)
- autosomal dominant
- hereditary haemorrhagic telangiectasia (HHT) is also known as Osler-Weber-Rendu disease (or syndrome)
- causes abnormal blood vessel formation in the skin, mucous membranes and organs such as the lungs, liver and brain
68
Give the loci affected by the 3 forms of Osler-Weber-Rendu Syndrome and symptoms experienced in these conditions
HHT1: endoglin gene (ENG) on chromosome 9
HHT2: ALK-1
HHT3: chromosome 5
Symptoms:
- telangiectasia
- epistaxis
- PAVMs
- GI blood loss
69
Give another name for immotile cilia syndrome and its inheritance pattern
- Kartagener's syndrome or primary ciliary dyskinesia
- autosomal recessive
70
Give some symptoms of Kartagener's syndrome
10 variations in the dynein arm
- infertility
- sinusitis
- bronchiectasis
- situs inversus
71
Give some examples of disease from polygenic influences
- asthma
- chronic obstructive pulmonary disease (COPD)
- venous thrombosis and pulmonary embolism
- tuberculosis
- sarcoidosis (NRAMP)
- obstructive sleep apnoea
- infant respiratory distress
72
Give 4 examples of autosomal recessive respiratory disease and their genes
- cystic fibrosis: CFTR
- alpha-1-antitrypsin deficiency: SERPINA1
- Kartagener's syndrome (immotile cilia): DNAI1
- pulmonary veno-occlusive disease: EIF2AK4
73
Give an X-linked example of a respiratory condition and its gene
- chronic granulomatous disease: CYBB
74
Give 2 examples of autosomal dominant conditions and their genes
- hereditary haemorrhagic telangiectasia (HHT): ALK-1, ENG
- hereditary pulmonary arterial hypertension (HPAH): BMPR2
75
What is secondary prevention
- aims to detect early disease in order to alter the course of the disease, e.g. screening by mammography for breast cancer in order to treat it early
76
What is sensitivity and give the formula to calculate it
- the proportion of people with the disease who are correctly identified by the screening test
Sensitivity = true positive / (true positive + false negative)
77
What is specificity
- the proportion of people without the disease who are correctly excluded by the screening test
Specificity = true negative / (true negative + false positive)
78
What is positive predictive value and give the formula to calculate it
- the proportion of people with a positive test result who actually have the disease
Positive predictive value = true positive / (true positive + false positive)
79
What is a negative predictive value and give the formula to calculate it
- the proportion of people with a negative test result who do not have the disease
Negative predictive value = true negative / (true negative + false negative)
80
Give the formula to calculate prevalence
Prevalence = (true positive + false negative) / (true positive + false negative + false positive + true negative)
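
The five screening formulas on the cards above can be tried out on a 2 x 2 table. The counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical screening results for 1000 people.
tp = 90    # true positives: have the disease, test positive
fn = 10    # false negatives: have the disease, test negative
fp = 50    # false positives: no disease, test positive
tn = 850   # true negatives: no disease, test negative

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
positive_predictive_value = tp / (tp + fp)
negative_predictive_value = tn / (tn + fn)
prevalence = (tp + fn) / (tp + fn + fp + tn)

print(f"Sensitivity = {sensitivity:.3f}")
print(f"Specificity = {specificity:.3f}")
print(f"Positive predictive value = {positive_predictive_value:.3f}")
print(f"Negative predictive value = {negative_predictive_value:.3f}")
print(f"Prevalence = {prevalence:.3f}")
```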
81
Which of these have an effect on the predictive values: prevalence, specificity, sensitivity
- predictive values are dependent on prevalence
- sensitivity and specificity are properties of the test and are not affected by prevalence
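
To see how the predictive values move with prevalence, the sketch below holds a hypothetical test fixed at 90% sensitivity and 90% specificity and varies only the prevalence. The function names and figures are illustrative assumptions; the formulas are the standard definitions rewritten in terms of proportions.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of negative results that are true negatives."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Same hypothetical test, applied to populations with different prevalence.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.9, 0.9, prevalence)
    npv = negative_predictive_value(0.9, 0.9, prevalence)
    print(f"Prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the test held fixed, the positive predictive value rises and the negative predictive value falls as the disease becomes more common.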
82
How would screening programs be evaluated
-by randomised controlled trial (individual or clusters)
83
Give 3 forms of bias that can affect evaluation of screening programs
- selection bias
- lead time bias
- length time bias (or length bias)
84
What is selection bias
- people who choose to participate in screening programmes may be different from those who do not
- they may be at more risk
- they may be at less risk
85
What is lead time bias
When screening appears to increase survival time because disease was discovered and diagnosed earlier
86
What is length time bias
An overestimation of survival because long-duration cases are more likely to be detected and treated than short-duration cases, e.g. PSA screening is more likely to detect a tumour that is slow growing
87
What are the 5 types of screenings
- population-based screening programmes (e.g. national diabetes and hypertension screening, as in Thailand)
- opportunistic screening (e.g. prevention and control of substance abuse)
- screening for communicable diseases (e.g. Heaf test)
- pre-employment and occupational medicals (e.g. vision test for commercial drivers)
- commercially provided screening
(screening is a programme, not a test)