Stats Flashcards

1
Q

Explain the two categories of data

A

Categoric:
Nominal, binary and ordinal

Numeric:
Continuous and discrete

2
Q

Give 3 ways of describing categoric data

A

Example scenario:
Total participants 707. Those using the drug who had an MI = 31; those using the drug with no MI = 310
Those using placebo who had an MI = 61; those using placebo with no MI = 305
Risk of MI with the drug = 31/341 = 0.091 (9.1%; multiply by 100 to get a percentage)
Risk of MI with placebo = 61/366 = 0.167 (16.7%)
Odds of MI with the drug = 31/310 = 0.1
Odds of MI with placebo = 61/305 = 0.2

(Absolute) risk difference = 0.167 - 0.091 = 0.076 (7.6 percentage points): the risk of MI with placebo is 7.6 percentage points higher than with the drug

Usually just called the risk difference; "absolute" is added when the sign (direction) of the difference is ignored

(Relative) risk ratio = 0.167/0.091 = 1.835 (183.5%), with the placebo on top so it becomes the focus group. This means the risk is increased by 83.5% with the placebo compared to the drug.
Risk ratio = 0.091/0.167 = 0.545 (54.5%), with the drug on top so it becomes the focus group. This means the risk is decreased by 45.5% with the drug compared to the placebo.

Usually referred to as the risk ratio (RR); the full name is relative risk ratio, also called relative risk

Odds ratio = 0.2/0.1 = 2 (with the placebo as the focus group): the odds have doubled, so there is a 100% increase in the odds of having an MI on the placebo compared to drug A
Odds ratio = 0.1/0.2 = 0.5 (with drug A as the focus group): the odds have halved, so there is a 50% decrease in the odds of an MI on drug A compared to the placebo
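The arithmetic above can be sketched in Python using the counts from the example scenario. The function name `two_by_two_measures` and its arguments are illustrative assumptions, not part of the original card. The sketch works from unrounded risks, so the risk ratio comes out slightly lower (about 1.833) than the value computed from the rounded figures.

```python
def two_by_two_measures(events_a, no_events_a, events_b, no_events_b):
    """Risk and odds measures with group B as the focus group.

    Illustrative helper: the name and signature are assumptions.
    """
    risk_a = events_a / (events_a + no_events_a)
    risk_b = events_b / (events_b + no_events_b)
    odds_a = events_a / no_events_a
    odds_b = events_b / no_events_b
    return {
        "risk_a": risk_a,
        "risk_b": risk_b,
        "odds_a": odds_a,
        "odds_b": odds_b,
        "risk_difference": risk_b - risk_a,
        "risk_ratio": risk_b / risk_a,
        "odds_ratio": odds_b / odds_a,
    }


# Group A = drug (31 MI, 310 no MI); group B = placebo (61 MI, 305 no MI)
measures = two_by_two_measures(31, 310, 61, 305)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")

# Swapping the focus group gives the reciprocal of each ratio
print(f"risk ratio, drug as focus: {1 / measures['risk_ratio']:.3f}")
```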

3
Q

How to find the relative risk of the other focus group e.g think of drug A and placebo scenario

A

If the risk ratio with the placebo as the focus group is 1.835,
the risk ratio with drug A as the focus group is 1/1.835 = 0.545.
In other words, take the reciprocal

4
Q

Give two ways to measure and present categorical data

A

Pie charts

Bar charts

5
Q

Give three ways to measure and present numerical (quantitative) data

A

Dot plots
Histograms
Box and whisker plots (box plots)

6
Q

Give a way to present an association between two continuous variables

A

Scatter plots

7
Q

Give some characteristics of a histogram

A

It can be used to show a normal distribution (also called a Gaussian distribution)

It can show skewed data, which is when the data are not symmetrical
Negatively skewed data = long, low left tail with the peak at high values on the right
Positively skewed data = long, low right tail with the peak at low values on the left

8
Q

Give some characteristics of a box plot

A

The box contains the middle 50% of the data
The line in the box plot shows the median value
Outliers (values more than 1.5 box lengths, i.e. 1.5 x the interquartile range, beyond the upper or lower edge of the box) are plotted as dots outside the whiskers
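The outlier rule can be sketched in Python as follows. The function name `iqr_outliers` and the sample values are made up for illustration, and the "inclusive" quartile method is an assumption: software packages use slightly different quartile conventions, so the fences can differ a little between tools.

```python
import statistics


def iqr_outliers(values):
    """Return values lying more than 1.5 x IQR beyond the box edges."""
    # Quartiles: q1 and q3 are the lower and upper edges of the box
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # the box length
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


sample = [2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
print(iqr_outliers(sample))
```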

9
Q

Give some characteristics of scatter plots

A

The independent variable is on the X axis and is usually what the experimenter changes

The dependent variable is on the Y axis and is usually the response to what the experimenter changes

10
Q

Give three ways to measure the spread of data

A

Range
Inter-quartile range
Standard deviation

11
Q

What is the variance

A

Variance = standard deviation ^2 (squared)

Standard deviation = square root of variance
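A quick check of this relationship in Python, using the standard library's `statistics` module. The data values are invented for illustration, and the sample (n - 1) versions of variance and standard deviation are assumed here.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # made-up values for illustration

variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # sample standard deviation

print(f"variance = {variance:.3f}")
print(f"standard deviation = {std_dev:.3f}")
# The standard deviation squared recovers the variance
print(math.isclose(std_dev ** 2, variance))
```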

12
Q

What does a large or small standard deviation show

A

A small standard deviation shows that any randomly picked value is likely to be close to the mean, so the spread of the data is small

A large standard deviation shows that any randomly picked value is likely to be further from the mean, so the spread of the data is large

13
Q

What are the best summary measures to use if the data have a symmetric distribution

A

Mean and the standard deviation

14
Q

What are the best summary measures to use if the distribution of the data is non-symmetric

A

Median and the interquartile range

15
Q

Methods to summarise categoric variables

A

Proportion, percentage, risk and odds

16
Q

Methods to summarise numerical (quantitative) data

A

Mean, median, range, interquartile range and standard deviation

17
Q

Methods to quantify differences between two categorical variables

A

Absolute risk difference
Relative risk ratio
Odds ratio

18
Q

Methods to quantify the differences between two numeric variables

A
Pearson's correlation coefficient (r)
r must be between -1 and +1
+1 shows a perfect positive linear correlation
0 shows no linear correlation
-1 shows a perfect negative linear correlation
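A minimal sketch of how r can be computed from paired values, written out by hand so the steps are visible. The function name `pearson_r` and the example data are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfectly decreasing line
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2]))  # no linear trend
```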
19
Q

Methods to calculate the difference between one categoric variable and one numeric variable

A

If the numeric variable has a symmetrical distribution in both groups, use the difference in means (mean - mean)

If the distribution in either group is non-symmetrical, use the difference in medians (median - median)

20
Q

Give the percentage of a normal distribution that lies within 1, 2 and 3 standard deviations of the mean.

A

Within 1 standard deviation = 68%
Within 2 standard deviations = 95.4%
Within 3 standard deviations = 99.7%

21
Q

How many standard deviations either side of the mean contain 95% of a normal distribution

A

1.96 standard deviations either side of the mean contain 95% of the distribution
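The coverage figures for 1, 1.96, 2 and 3 standard deviations can be checked with a short Python sketch. It assumes a normal distribution and uses the error function from the standard library; the function name `coverage_within` is illustrative.

```python
import math


def coverage_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 2.58, 3):
    print(f"within {k} SD: {coverage_within(k) * 100:.1f}%")
```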

22
Q

If you cannot use a mean for a certain set of data, would you still be able to use standard deviation for that data

A

No, the standard deviation would be affected by the same problem as the mean: both are distorted by skew and outliers

If at all possible, it is best to use the mean and standard deviation because they include all values in the data and so are more powerful

23
Q

Who was Sir Francis Galton and what were his contributions to statistics

A

1822-1911

Standard deviation, correlation, concepts of regression, medians and ranking

First weather map
How to cut a cake
Attractiveness of cities

24
Q

What is standard error

A

It is an estimate of how precisely a sample statistic (e.g. the sample mean) represents the true population value

25
Q

How is the standard error calculated

A

Standard error = standard deviation / square root of the sample size

26
Q

When can the standard error not be used

A

When the standard deviation and the mean cannot be used due to how skewed the data is

27
Q

What are the rules to use the standard error

A

The data has to be normally distributed

The sample size has to be large enough (more than 20 individuals)

28
Q

Show how to calculate the confidence interval from the standard error

A
Given:
Mean = 18,477
Sample size (n) = 12
Standard deviation (SD) = 3,732

Standard error = standard deviation / square root of the sample size
Standard error = 3,732 / √12 = 1,077.3

For a 95% confidence interval, multiply the standard error by 1.96
For a 99% confidence interval, multiply the standard error by 2.58

Mean - (1.96 x standard error) = 18,477 - (1.96 x 1,077.3) = 16,365
Mean + (1.96 x standard error) = 18,477 + (1.96 x 1,077.3) = 20,589

We can be 95% confident that the true value of the mean lies between 16,365 and 20,589

For a 99% confidence interval, calculate mean - (2.58 x standard error) and mean + (2.58 x standard error)
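The calculation above can be reproduced with a small Python sketch. The function name `confidence_interval` and its parameters are illustrative assumptions; the multipliers 1.96 and 2.58 are the ones quoted on this card.

```python
import math


def confidence_interval(mean, sd, n, z=1.96):
    """Return (lower, upper) limits: mean -/+ z x standard error."""
    standard_error = sd / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


# Values from the worked example: mean 18,477, SD 3,732, n = 12
low95, high95 = confidence_interval(18477, 3732, 12)
low99, high99 = confidence_interval(18477, 3732, 12, z=2.58)

print(f"95% CI: {low95:.0f} to {high95:.0f}")
print(f"99% CI: {low99:.0f} to {high99:.0f}")
```

The 99% interval is wider than the 95% interval because a larger multiple of the standard error is added to and subtracted from the mean.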

29
Q

What does a small standard error mean for a sample

A

Greater precision: the results from the sample are more likely to be representative of the population

30
Q

What does a small standard deviation mean

A

The values are less spread so there is less variability in the sample

31
Q

What is correlation used to explore

A
  • how two continuous numeric variables are related
  • the strength of an association

32
Q

Give the regression equation and what each part of it means

A

Y = a + bX

Y is the dependent variable (also called the outcome or response), the one we measure, e.g. blood pressure reading, pain score, hours of sleep

X is the independent variable (also called the predictor or explanatory variable), e.g. age, deprivation level, family history of illness

a is the y-intercept (also called the constant)

b is the coefficient: the change in Y when we increase X by 1 unit
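As a sketch of how the intercept and coefficient can be estimated, the following Python fits a straight line by ordinary least squares. The function name `fit_line` and the age and blood pressure values are invented for illustration; they are not real data.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # coefficient: change in Y per 1-unit increase in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X = 0
    return a, b


ages = [30, 40, 50, 60, 70]             # X, hypothetical predictor values
systolic_bp = [115, 120, 125, 130, 135]  # Y, hypothetical outcome values

a, b = fit_line(ages, systolic_bp)
print(f"Y = {a:.1f} + {b:.2f}X")
print(f"predicted Y at X = 55: {a + b * 55:.1f}")
```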

33
Q

Give the name of the type of regression where the outcome is a single continuous variable e.g sleep time

A

Linear regression

34
Q

Give the name of a regression which has a binary outcome e.g pass or fail

A

Logistic regression

35
Q

What is the benefit of regressions compared to just correlations

A

We can incorporate additional values (additional predictors) which allows us to account for confounding variables

36
Q

Give some characteristics of regression models

A
  • they can be used to create predictive models
  • remove the effects of confounding variables
  • explore how a particular drug influences the outcome
37
Q

Does linear regression require a binary or numeric outcome variable

A

It requires a numeric outcome variable

38
Q

A regression model with more than one predictor is called

A

A multivariable model

39
Q

What is a great benefit of a multivariable regression model (more than one predictor)

A

It adjusts for confounding variables

40
Q

A difference between coefficients in a single and multiple regression model is

A

Coefficients in a multiple model have taken account of background factors

41
Q

In linear regression with 1 predictor variable, a coefficient of 1.5 means that increasing the predictor by 1 unit causes

A

An increase in the outcome by 1.5 units

42
Q

What is prevalence

A

Proportion of population with a disease at one point in time

Number of cases at a point in time/ total population = prevalence

43
Q

What is incidence

A

A rate

Rate at which new cases appear in a population over a certain period of time

Number of new cases/ at risk population = incidence
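The two formulas (prevalence from the previous card and incidence from this one) can be contrasted in a short Python sketch. The function names and all the numbers are hypothetical, chosen only to show the difference in numerators and denominators.

```python
def prevalence(existing_cases, total_population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / total_population


def incidence(new_cases, population_at_risk):
    """New cases over a period divided by the population at risk."""
    return new_cases / population_at_risk


# Hypothetical town: 10,000 people, 200 already have the disease,
# so 9,800 are at risk; 50 of them develop it over the following year.
print(f"prevalence: {prevalence(200, 10_000):.3f}")
print(f"incidence over one year: {incidence(50, 9_800):.4f}")
```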

44
Q

Advantages of ecological studies

A

-uses routinely collected data, so quick and cheap
-units of analysis are populations (groups of people)
-can examine patterns of ill-health by age, sex, ethnicity, country and/or by time
-few ethical issues
-useful for generating hypotheses

45
Q

Disadvantages of ecological studies

A
  • no link between individual exposure and effect
  • bias - variation in diagnostic criteria
  • absence of records of individual attributes
  • unsuitable format of records
  • inconsistency in data presentation
46
Q

Advantages of cross-sectional studies

A
  • results used to generate hypotheses
  • rapid feedback of current events in the community
  • quick and cheap
  • few ethical problems
47
Q

Disadvantages of cross-sectional studies

A
  • could just be reporting a medical oddity
  • prone to bias, e.g. sampling, subject and observer variation
  • no time reference
48
Q

Advantages of case control studies

A

-by concentrating effort on the identification of affected individuals and recruiting controls from the unaffected population, the number of subjects required to obtain significant results is kept to a minimum (so good for rare diseases)
-results can be obtained relatively quickly because the investigation does not have to wait for the disease to develop (compare this with cohort studies – see later) and can look for multiple causes
-it is a relatively inexpensive type of study

49
Q

Disadvantages of case control studies

A

-generally rely on retrospective data, which has its own dangers. The ability of individuals to recall past events tends to be unreliable due to a tendency for memory to be selective. Records of past events may be incomplete.
-because data are collected retrospectively, it is difficult to say if an association is causal or not. This is less of a problem when the exposure is highly specific or where the time between exposure and disease is short
-prone to selection and information biases
-there can be difficulties choosing controls
-the incidence of disease within a population cannot be calculated from this type of study

50
Q

Advantages of cohort study

A

-the main advantage is that it is possible to distinguish antecedent causes from concurrent associated factors (cause comes before effect)
-since incidence can be determined for both exposed and non- exposed groups, we can determine absolute, relative and attributable risks
-we can study more than one outcome to the same exposure
-there is less chance of bias since exposure is measured before development of disease

51
Q

Disadvantages of cohort study

A

-cannot be certain that exposures are causal- this requires controlled studies
-long periods of study, and large populations mean that cohort studies are expensive
-follow-up can be a problem- especially if the period of study is long- this needs to be considered in the design of the study
-diagnosis of cases may change over the years as medical science becomes more advanced- better at detecting the disease or with different criteria for a diagnosis

52
Q

Advantages of randomised control trials

A

-randomization should mean that confounding factors (age, sex etc.) are equally distributed. This helps to concentrate the study on the effect of the intervention
-by randomly allocating patients to interventions, it is likely that staff and patients will not break the blinding
-statistical tests for significance are easier to interpret when the study design removes confounders
-confounders and many biases minimised

53
Q

Disadvantages of randomised control trials

A

-to allow sufficient numbers to balance confounders these tend to be large and expensive trials. They are often multicentre and may even be multinational
-there is always a chance that volunteer bias will be a problem: what about people that refuse to be included in the trial or those that are never asked.
-there may be ethical difficulties in withholding treatment from the control group or offering what is believed to be an inferior treatment to one group
-may lose statistical power if poor compliance

54
Q

What is critical appraisal

A

-critical appraisal is the assessment of evidence by systematically reviewing its relevance, validity and results to specific situations -by R Chambers 1998

55
Q

Difference between parametric and non parametric analysis tests

A

Parametric tests have rules that need to be followed or assumptions that need to be met

Non-parametric tests are used as an alternative - they do not need these rules to be followed or assumptions to be met

Assumptions include - sample size, normal distribution and linearity in regression

56
Q

Examples of parametric test

A
  • one sample t-tests
  • two sample t-tests (also called Student's t-test)
  • chi square test
  • ANOVA test
  • Pearson correlation coefficient
57
Q

Examples of non-parametric test

A
  • one sample Wilcoxon test
  • Mann-Whitney U test
  • Fisher's exact test
  • Kruskal-Wallis ANOVA
  • Spearman rank correlation coefficient
58
Q

Give two examples of critical appraisal tools

A
  • CASP
  • AXIS has 20 questions and no scoring system

59
Q

Give the frequency of these single gene defects

Cystic fibrosis, alpha-1-antitrypsin deficiency, hereditary haemorrhagic telangiectasia (HHT), immotile cilia syndrome

A

Cystic fibrosis = 1 in 2500
Alpha-1-antitrypsin deficiency = 1 in 2000
Hereditary haemorrhagic telangiectasia (HHT) = 1 in 4000
Immotile cilia syndrome = 1 in 20000

60
Q

How can the CFTR gene (chromosome 7, 27 exons, encoding a 1480-residue protein) be identified

A
  • linkage
  • positional cloning
  • sequencing
61
Q

How can Cystic Fibrosis be diagnosed

A
  • sweat test
  • gene mutation analysis

62
Q

Give the symptoms that Cystic Fibrosis could cause

A
  • abnormal ion transport across epithelium
  • salt loss
  • impaired mucociliary clearance
  • chronic infections
  • sterility (infertility)
  • impaired digestion (meconium ileus)
  • failure to thrive
  • liver disease
  • diabetes
63
Q

Treatment of Cystic Fibrosis

A
  • pancreatic enzyme supplementation
  • control of infection
  • suppression of chronic infection - antibiotic nebulisers
  • bronchodilation - salbutamol nebulisers
  • anti-inflammatory - azithromycin
  • diabetes - insulin
  • vaccinations - flu, pneumococcal
64
Q

Give the chromosomal cause of alpha-1-antitrypsin deficiency

A
  • autosomal recessive
  • chromosome 14
  • 14q32.1
65
Q

What is the normal phenotype for alpha-1-antitrypsin deficiency and the disease phenotype

A

M is the normal phenotype

S and Z are associated with major disease presentation

66
Q

What are the clinical presentation of alpha-1-antitrypsin deficiency

A

Due to build up of deformed alpha-1-antitrypsin in the liver

  • childhood jaundice
  • early onset cirrhosis

Due to the unopposed action of neutrophil elastase in the lungs
  • early onset emphysema and bronchiectasis

Highly sensitive to cigarette smoke

67
Q

What is the inheritance pattern of hereditary haemorrhagic telangiectasia (HHT)

A

Hereditary haemorrhagic telangiectasia (HHT), also known as Osler-Weber-Rendu disease (or syndrome), is autosomal dominant

-causes abnormal blood vessel formation in the skin, mucous membranes and in the organs such as the lungs, liver and brain

68
Q

Give the loci affected by the 3 forms of Osler-Weber-Rendu Syndrome and symptoms experienced in these conditions

A

HHT1
-endoglin gene (ENG) on chromosome 9

HHT2
-ALK-1

HHT3
-chromosome 5

  • telangiectasia
  • epistaxis
  • PAVMs
  • GI blood loss
69
Q

Give another name for immotile cilia syndrome and its inheritance patterns

A
  • Kartagener's syndrome or primary ciliary dyskinesia
  • autosomal recessive

70
Q

Give some symptoms of Kartagener's syndrome

A

10 variations in dynein arm

  • infertility
  • sinusitis
  • bronchiectasis
  • situs inversus
71
Q

Give some examples of disease from polygenic influences

A
  • asthma
  • chronic obstructive pulmonary disease (COPD)
  • venous thrombosis and pulmonary embolism
  • Tuberculosis
  • sarcoidosis (NRAMP)
  • Obstructive sleep apnoea
  • infant respiratory distress
72
Q

Give 4 examples of autosomal recessive respiratory disease and their genes

A
  • cystic fibrosis - CFTR
  • alpha-1-antitrypsin deficiency - SERPINA1
  • Kartagener's syndrome (immotile cilia) - DNAI1
  • pulmonary veno-occlusive disease - EIF2AK4
73
Q

Give an X-linked example of a respiratory condition and its gene

A

-chronic granulomatous disease - CYBB

74
Q

Give 2 examples of autosomal dominant conditions and their genes

A
  • hereditary haemorrhagic telangiectasia (HHT) - ALK-1, ENG
  • hereditary pulmonary arterial hypertension (HPAH) - BMPR2

75
Q

What is secondary prevention

A

-aims to detect early disease in order to alter the course of the disease e.g screening by mammography for breast cancer in order to treat it early

76
Q

What is sensitivity and give the formula to calculate it

A

-the proportion of people with the disease who are correctly identified by the screening test

Sensitivity = true positives / (true positives + false negatives)

77
Q

What is specificity

A

-the proportion of people without the disease who are correctly excluded by the screening test

Specificity = true negatives / (true negatives + false positives)

78
Q

What is positive predictive value and give the formula to calculate it

A

-the proportion of people with a positive test result who actually have the disease

Positive predictive value = true positives / (true positives + false positives)

79
Q

What is a negative predictive value and give the formula to calculate it

A

-the proportion of people with a negative test result who do not have the disease

Negative predictive value = true negatives / (true negatives + false negatives)

80
Q

Give the formula to calculate prevalence

A

Prevalence = (true positives + false negatives) / (true positives + false negatives + false positives + true negatives)
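The screening formulas from the last few cards can be collected in one Python sketch that works from the four cells of a 2x2 table. The function name `screening_metrics` and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening test measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Hypothetical screening of 1,000 people, 100 of whom have the disease
metrics = screening_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```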

81
Q

Which of these have an effect on the predictive values

  • prevalence
  • specificity
  • sensitivity
A
  • predictive values are dependent on prevalence (as well as on the sensitivity and specificity of the test)
  • sensitivity and specificity themselves are not affected by prevalence
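A short Python sketch of how the positive predictive value changes with prevalence when the test's sensitivity and specificity are held fixed. The function name `ppv` and the chosen sensitivity, specificity and prevalence values are assumptions for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for a given prevalence (as proportions)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Same hypothetical test (90% sensitive, 95% specific) in different populations
for prev in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prev:.3f}: PPV = {ppv(0.9, 0.95, prev):.3f}")
```

With the same test, the positive predictive value rises as the disease becomes more common in the screened population.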

82
Q

How would screening programs be evaluated

A

-by randomised controlled trial (individual or clusters)

83
Q

Give 3 forms of bias that can affect evaluation of screening programs

A
  • selection bias
  • lead time bias
  • length time bias (or length bias)
84
Q

What is selection bias

A

-people who choose to participate in screening programmes may be different from those who do not

  • may be at more risk
  • may be at less risk
85
Q

What is lead time bias

A

When screening appears to increase survival time because disease was discovered and diagnosed earlier

86
Q

What is length time bias

A

An overestimation of survival because long-duration (slow-growing) cases are more likely to be detected and treated through screening than short-duration cases, e.g. PSA screening is more likely to detect slow-growing tumours

87
Q

What are the 5 types of screenings

A
  • population-based screening programmes (e.g. national diabetes and hypertension screening, as in Thailand)
  • opportunistic screening (e.g. prevention and control of substance abuse)
  • screening for communicable diseases (e.g. Heaf test)
  • pre-employment and occupational medicals (vision test for commercial drivers)
  • commercially provided screening (screening is a programme not a test)