Statistics Flashcards
what is the internal validity of a study?
the extent to which a study establishes a trustworthy cause and effect
what is the external validity of a study?
the extent to which the results of the study can be applied to real life
what are three things that can affect the validity of a RCT?
bias - different types of bias
confounding factors
chance
what is selection bias?
bias when assigning individuals to groups which may lead to differences that can affect the outcome. There are three types.
what are the three types of selection bias?
sampling bias - subjects are not representative of the population
volunteer bias - people who volunteer may differ systematically from those who do not (e.g. people with the condition may be less willing to volunteer)
non-responder bias - some populations may be less likely to respond to the study so are less represented
what is prevalence/incidence bias?
when a study investigates a condition characterised by early fatalities, the cases that died early may be missed from the calculations
what is recall bias?
difference in accuracy of the recollections retrieved by study participants, possibly due to whether they have a disorder or not
what studies does recall bias typically affect?
case-control studies
what is publication bias?
failure to publish results from valid studies, often because they showed negative or uninteresting results
what is work up bias (verification bias)?
in studies which compare a new diagnostic test to the gold standard, the clinician may be reluctant to order the gold standard test unless the new test is positive, because the gold standard test is invasive or expensive
what is expectation bias?
observers may subconsciously measure or report data in a way that favours the expected study outcome - only affects non-blinded trials
what is the hawthorne effect?
describes a group changing its behaviour due to knowledge that it is being studied
what is late look bias?
gathering information at an inappropriate time e.g. studying a fatal disease many years later when many patients may have died already
what is procedure bias?
occurs when subjects in different groups receive different treatments
what is lead time bias?
occurs when two tests for a disease are compared: the new test diagnoses the disease earlier, but there is no actual difference in the outcome of the disease
what are the two different ways of sampling patients for a study?
probability sampling - everyone in the population has a known, non-zero probability of being chosen (an equal probability in simple random sampling)
non-probability sampling - selection is not random, so not everyone has a known probability of being chosen
what are 4 methods for probability sampling?
- simple random sampling
- systematic sampling
- stratified sampling
- clustered sampling
what is an example of simple random sampling?
using a random number generator to assign each patient a random number, then using these numbers to allocate patients to groups at random
what is an example of systematic sampling?
every 5th patient assigned
what is an example of stratified sampling?
split the group into male and female and select equal participants
what is clustered sampling?
select subgroups within the population - useful in primary care
e.g. divide all GP practices in the city in clusters, then randomly select a few GP practices (clusters), then include all the patients from the selected GP practices in the study
how do clustered sampling and stratified sampling differ?
clustered sampling - selecting participants based on clusters (natural groups e.g. GP practices, schools, hospitals) - this is logistically easier
stratified sampling - selecting participants based on strata defined by characteristics (e.g. age, gender, ethnicity) - this is used when you want a proportional representation of different subgroups in the sample
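A minimal Python sketch of the four probability sampling methods. The patient list, the GP practice labels, the function names and the fixed seeds are all hypothetical and only there for illustration:

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every member has the same chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Take every `step`-th member, beginning at index `start`."""
    return population[start::step]

def stratified_sample(population, stratum_of, k_per_stratum, seed=0):
    """Randomly draw `k_per_stratum` members from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and include everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical data: 20 patients, alternating sex, spread over 4 GP practices.
patients = [
    {"id": i, "sex": "F" if i % 2 else "M", "practice": f"GP{i % 4}"}
    for i in range(20)
]
practices = {}
for patient in patients:
    practices.setdefault(patient["practice"], []).append(patient)

print(len(simple_random_sample(patients, 5)))                   # 5 patients
print([p["id"] for p in systematic_sample(patients, 5)])        # every 5th patient
print(len(stratified_sample(patients, lambda p: p["sex"], 3)))  # 3 per sex
print(len(cluster_sample(practices, 2)))                        # all patients in 2 practices
```

Note that the stratified sample draws from every stratum, whereas the cluster sample takes everyone from only a few randomly chosen clusters.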
what are 4 examples of non-probability sampling?
- convenience sampling
- quota sampling
- judgement/purposive sampling
- snowball sampling
what is convenience sampling?
first come, first served - i.e. ask people to sign up for a study and just take the participants that come forward
what is quota sampling?
the population is divided into subgroups by age, gender or ethnicity and a quota is set for filling each subgroup. The participants are then selected non-randomly until the quota is filled (i.e. by convenience, first come, first served)
what is purposive sampling?
participants are chosen based on specific criteria i.e. you would specifically contact patients with a disease, because you want to study it - rather than randomly sampling the general population
what is snowball sampling?
participants recruit other participants - good for hard to reach or isolated groups
what is a cohort study?
take a cohort of people and study them over time - this is observational and prospective. Two or more groups are selected based on their exposure to a particular agent (e.g. toxin, smoking) and are studied over time to see how many develop a disease or other outcome.
what measure is usually used to measure the outcome of a cohort study?
relative risk - as you compare the two groups
what is a case-control study?
participants with a particular condition are matched with controls. This is observational and retrospective. Data are then collected on past exposures to identify a possible causal agent for the condition.
what outcome measure is usually measured in a case-control study?
odds ratio
what are the positives of a case-control study?
inexpensive
produces quick results
useful for rare conditions
what are the negatives of a case-control study?
usually prone to confounding factors
what is a cross sectional study?
simply provides a snapshot in time - sometimes called prevalence studies
provides weak evidence of cause and effect
what is a crossover trial?
participants experience both the experimental arm and the placebo - useful if it is unethical to deprive patients of a particular treatment (e.g. cancer treatments)
what is a quasi experimental study?
participants are chosen who have ALREADY been exposed to the factor being studied, where it would be unethical to expose them deliberately, e.g. children playing violent video games
what are the pros of a cohort study?
helpful when the exposure is unethical - as participants already have the exposure
can measure multiple outcomes
cheap
can analyse risk
what are the cons of a cohort study?
participants can be lost to follow up
can be affected by recall bias if retrospective
confounding variables
what is the incidence?
rate at which new cases occur in a population over time i.e. 10 new cases in 1000 per year
what is the prevalence?
total no of cases of a disease that currently exist at any given time i.e. currently 50,000 people with asthma
what study is used to measure prevalence?
cross sectional study
what study is used to measure incidence?
cohort study
which type of study is considered the gold standard of evidence?
meta-analysis / systematic review
what is grounded theory?
in qualitative research - it is a method used to generate a new theory about a phenomenon of interest from the collection of new data. The new theory needs to be grounded or rooted in the observations made - hence the name.
It is a complex process, which begins by raising questions that help guide research but are not static or confining and then over time core theoretical concepts are identified.
what is ethnography?
the aim is to study an ENTIRE culture, through the researcher becoming immersed in the culture as an active participant and recording field notes.
what is phenomenology?
the goal of phenomenology is to describe the real “lived experience” of a phenomenon
what are the 4 types of sampling for qualitative data?
1 - convenience
2 - purposive
3 - snowballing
4 - case study - select a single individual
what are 4 ways of assessing the validity of a qualitative study?
1 - triangulation
2 - respondent validation (aka member checking)
3 - bracketing
4 - reflexivity
what is triangulation?
comparing the results of two or more different methods of data collection (for example - interviews and observation)
what is respondent validation?
techniques where the investigator's account is compared with the participants' accounts in order to check how closely they correspond
what is bracketing?
deliberately putting aside one's own beliefs about the phenomenon under investigation
what is reflexivity?
sensitivity to the ways in which the researcher and research process have shaped the collected data
what are consensus methods in qualitative research?
the way in which the researchers aim to gain a general agreement around a topic
what are two methods of consensus in qualitative research?
delphi method
nominal group technique
what is the delphi method?
aims to gather opinions from experts in a particular area. Occurs in 3 stages:
stage 1 - open ended questionnaires sent to participants to generate statements about the topic
stage 2 - participants then asked to rank all of the statements produced in stage 1
stage 3 - statements are further refined and re-ranked to achieve consensus
if consensus not achieved in stage 3 then that stage can be repeated
what is nominal group method of consensus?
group of highly structured meetings with a controlled discussion
members independently record ideas and opinions, which are then re-presented to the group and used to clarify and categorise ideas
group members are then asked at the end to rank the ideas to achieve consensus
what are the two types of qualitative data that can be collected?
nominal data - data is placed into named categories - there is no hierarchy given to these categories, you can count but not order them (e.g. birthplace)
ordinal data - observed values can be put into categories which can be ordered (e.g. NYHA classification of heart failure symptoms)
what are the 4 types of quantitative data?
discrete - values are finite whole numbers i.e. number of asthma exacerbations per year
continuous - data can take any value i.e. weight
binomial - data can have two values (i.e. biological sex)
interval - the difference between two values is meaningful but there is no true zero, e.g. temperature in degrees Celsius (0 degrees does not mean an absence of temperature)
what is the null hypothesis?
prediction of no relationship between the two variables being tested
what is the alternate hypothesis?
predicts a relationship does exist between the two variables being tested
what is a type 1 error?
the null hypothesis is rejected when it is true (i.e. showing that there is a difference between groups, when actually there is not - false positive)
this is determined against a preset significance level of alpha
what is a type 2 error?
the null hypothesis is accepted (not rejected) when it is false, i.e. saying there is no difference between groups when actually there is - false negative
this is termed a beta error
what is the power of a study?
the power is the probability of correctly rejecting the null hypothesis when it is false
how is the power of a study calculated?
1 minus the probability of a type II error (beta), i.e.
power = 1 - beta
how can you increase power?
by increasing the sample size
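A rough sketch of how power (1 - beta) grows with sample size, assuming a two-sided z-test comparing two group means with a known, equal standard deviation. The function name, the effect size and the normal approximation are assumptions made for illustration only:

```python
from statistics import NormalDist

def power_two_sample_z(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # critical value for alpha
    standard_error = sd * (2 / n_per_group) ** 0.5  # SE of the difference in means
    shift = effect / standard_error                 # true difference in SE units
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical true difference of half a standard deviation.
for n in (20, 64, 100):
    power = power_two_sample_z(effect=0.5, sd=1.0, n_per_group=n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With no true difference (effect = 0) the function returns alpha, which is the type 1 error rate.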
what is causation?
measure of whether the independent variable (cause) has an impact on the dependent variable (effect)
what is association?
situation where two phenomena occur together - these could either be related or by chance
what are the three types of association?
spurious - relationship between the variables occurs purely due to chance
indirect - relationship between the two variables is due to a confounding factor
direct - there is a true association between the two variables
which criteria is used to determine causality?
bradford hill criteria
what is correlation?
research method used to measure the relationship between two variables - measured as the correlation coefficient “r”, where r = 0 is no correlation, r = 1 is perfect positive correlation and r = -1 is perfect negative correlation
what is parametric data?
data that follows a normal distribution
what is non-parametric data?
data that does not follow a normal distribution
what two tests can be used to determine significance in parametric data?
Student's t-test
Pearson's correlation coefficient
what is reliability of data?
is the consistency of the data - can it be replicated consistently to produce similar results
what is the validity of data?
whether a test accurately measures what it is supposed to measure
how do you calculate the mean?
sum of all values / total number of values
how do you calculate median?
sort all the values into order and select the middle value
how do you calculate the mode?
the most common value appearing in the data set
what skew is mean > median > mode ?
positive skew
what skew is mean < median < mode?
negative skew
what is mean = median = mode?
normally distributed
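The mean / median / mode rules for skew can be checked with Python's statistics module. The three data sets below are made up for illustration, and `describe_skew` is a hypothetical helper that applies the rule of thumb from these cards rather than a formal skewness statistic:

```python
from statistics import mean, median, mode

def describe_skew(values):
    """Classify skew by comparing mean, median and mode (rule of thumb only)."""
    m, md, mo = mean(values), median(values), mode(values)
    if m > md > mo:
        return "positive skew"
    if m < md < mo:
        return "negative skew"
    if m == md == mo:
        return "symmetric"
    return "unclear"

right_tailed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]      # long tail of high values
left_tailed = [1, 4, 8, 9, 10, 10, 11, 11, 11, 12]  # long tail of low values
balanced = [1, 2, 3, 3, 3, 4, 5]                    # mirror image around 3

for data in (right_tailed, left_tailed, balanced):
    print(mean(data), median(data), mode(data), describe_skew(data))
```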
what is the percentage distribution of data across 3 SD of a normal distribution?
68.3% lies within 1SD of the mean
95.4% lies within 2 SD of the mean
99.7% lies within 3 SD of the mean
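These percentages can be reproduced from the standard normal distribution. A short sketch using Python's `statistics.NormalDist` (the helper name `fraction_within` is an assumption for illustration):

```python
from statistics import NormalDist

def fraction_within(k_sd):
    """Fraction of a normal distribution lying within k_sd SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```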
in what study do you use risk to calculate the result?
cohort study
how do you calculate absolute risk?
no of events/total no in the group
how do you calculate EER?
no of events in experimental group /
total no in the experimental group
how do you calculate CER?
no of events in the control group / total no of participants in the control group
how do you calculate relative risk (also known as risk ratio)?
EER / CER
how do you calculate absolute risk reduction?
CER - EER
how do you calculate absolute risk increase?
EER - CER
how do you calculate relative risk reduction?
( CER - EER ) / CER
OR 1- RR
how do you calculate relative risk increase?
( EER - CER ) / CER
how do you calculate NNT?
1/ARR
how do you calculate NNH?
1/ARI
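A worked example of the risk calculations for a trial where the treatment reduces the event rate. The counts are hypothetical (10 events in 100 treated patients, 20 events in 100 controls) and `risk_measures` is an illustrative helper name:

```python
def risk_measures(events_exp, n_exp, events_ctrl, n_ctrl):
    """Risk measures for a trial in which the control event rate exceeds the experimental rate."""
    eer = events_exp / n_exp    # experimental event rate
    cer = events_ctrl / n_ctrl  # control event rate
    arr = cer - eer             # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,   # relative risk
        "ARR": arr,
        "RRR": arr / cer,  # relative risk reduction, equal to 1 - RR
        "NNT": 1 / arr,    # number needed to treat
    }

result = risk_measures(events_exp=10, n_exp=100, events_ctrl=20, n_ctrl=100)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

Here treating 10 patients prevents one event. When the experimental group does worse, the same arithmetic with EER - CER gives the absolute risk increase and NNH.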
which studies use odds?
case control studies
how do you calculate the odds?
no of people with event / no of people without the event
how do you calculate the odds ratio?
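odds in the exposed group / odds in the control group (in a case-control study: odds of exposure in cases / odds of exposure in controls)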
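A small sketch of the odds and odds ratio arithmetic for a case-control study. The counts are invented (30 of 100 cases exposed, 10 of 100 controls exposed) and the function names are illustrative:

```python
def odds(with_event, without_event):
    """Odds = number with the event / number without the event."""
    return with_event / without_event

def odds_ratio(odds_in_cases, odds_in_controls):
    """Ratio of the two odds; 1 means no association."""
    return odds_in_cases / odds_in_controls

# Hypothetical case-control data on exposure status.
odds_cases = odds(with_event=30, without_event=70)     # exposed vs unexposed cases
odds_controls = odds(with_event=10, without_event=90)  # exposed vs unexposed controls

print(f"odds of exposure in cases:    {odds_cases:.3f}")
print(f"odds of exposure in controls: {odds_controls:.3f}")
print(f"odds ratio:                   {odds_ratio(odds_cases, odds_controls):.2f}")
```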
what does confidence interval show?
a range of values within which the “true” value is likely to lie - e.g. with a 95% confidence interval you can be 95% confident that the true value lies within the range, with a 5% chance that it lies outside it
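As an illustration, a 95% confidence interval for a mean can be sketched as mean plus or minus about 1.96 standard errors. This uses a normal approximation (small samples would normally use the t distribution instead), and the weights and the function name are hypothetical:

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(values, level=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    centre = mean(values)
    standard_error = stdev(values) / len(values) ** 0.5
    return centre - z * standard_error, centre + z * standard_error

weights_kg = [68, 72, 75, 70, 66, 74, 71, 69, 73, 72]  # made-up sample
low, high = confidence_interval(weights_kg)
print(f"mean = {mean(weights_kg):.1f} kg, 95% CI = {low:.1f} to {high:.1f} kg")
```

Asking for a higher confidence level (e.g. 99%) widens the interval.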
what is the difference between ANOVA and t-tests?
t-test - compares the means of two samples only
ANOVA - compares the means of two or more samples (i.e. if you had groups of 20-30yrs, 30-40yrs, 40-50yrs ANOVA would be used to compare the means across these different groups)
what does the mann-whitney U test compare?
compares two groups of unpaired data measured on ordinal, interval, or ratio scales
what does the wilcoxon signed rank test compare?
compares two sets of observations on a single sample, i.e. a before and after test on the same population following an intervention
what does chi squared test compare?
used to compare proportions or percentages across patients following two different interventions
what does spearman test and kendall rank test show?
correlation between two variables
what type of graph is used to show non-parametric data?
box plot
what type of graph is used to show the results of a meta analysis?
forest plot
what type of graph is used to detect publication bias?
funnel plot
what is the formula to calculate regression analysis?
y = a + bx
a = point at which the line crosses y axis where x = 0
b = regression coefficient (the slope of the line)
x= chosen value on x axis
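A minimal least-squares sketch of fitting y = a + bx and using it to predict y for a chosen x. The data points are invented and `fit_line` is an illustrative helper, not a reference implementation:

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept a and slope b in y = a + b*x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # slope: change in y per unit change in x
    a = y_bar - b * x_bar  # intercept: value of y where x = 0
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up data lying exactly on y = 1 + 2x

a, b = fit_line(xs, ys)
print(f"a = {a}, b = {b}")
print(f"predicted y at x = 6: {a + b * 6}")
```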
what are the 5 phases of a clinical trial?
phase 0 - exploratory studies - very small no of participants to explore the effect of the drug in the human body
phase I - safety assessment - determines SE prior to larger studies, conducted on healthy volunteers
phase II - assess efficacy - involves a small no of patients affected by the disease
phase III - assess effectiveness - thousands of participants, RCT
phase IV - monitoring for long term SE and effectiveness
what is correlation and linear regression, and how do they differ?
correlation is a calculation of how closely one variable relates to another variable.
linear regression is then used to predict how much one variable may change when a second variable is changed. this is when you use the formula y= a+ bx
what correlation calculation is used for parametric and which is used for non-parametric data?
parametric data - pearsons
non-parametric data - spearmans
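A sketch contrasting the two coefficients: Pearson's is computed on the raw values, while Spearman's is Pearson's coefficient applied to the ranks. The helper functions and the data are assumptions for illustration:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5

def ranks(values):
    """Rank from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]  # rises steadily but not in a straight line

print(f"Pearson r   = {pearson(xs, ys):.3f}")
print(f"Spearman rs = {spearman(xs, ys):.3f}")
```

Because the relationship is monotonic but curved, the rank-based coefficient reaches 1 while Pearson's stays just below it.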
what is the true positive?
screening tool correctly identifies the patient as having the disease
what is the true negative?
screening tool correctly identifies the patient as not having the disease
what is a false positive?
the screening tool incorrectly identifies the patient as having the disease, when in fact they do not
what is a false negative?
the screening incorrectly identifies the patient as not having the disease, when in fact they do
what is the sensitivity?
proportion of patients with the disease who have a POSITIVE result
how do you calculate sensitivity?
TP / people with the disease, i.e. TP / (TP + FN)
what is specificity?
proportion of patients without the disease who have a negative result
how do you calculate the specificity?
TN / people without the disease, i.e. TN / (TN + FP)
what is the positive predictive value?
the probability that a person with a positive test result actually has the disease
how do you calculate positive predictive value?
TP / (TP + FP)
what is the negative predictive value?
the probability that a person with a negative test result actually does not have the disease
how do you calculate the negative predictive value?
TN / (TN+FN)
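A worked example of the four screening measures using the standard 2x2 table definitions. The counts (TP = 90, FP = 40, FN = 10, TN = 160) are hypothetical and `diagnostic_metrics` is an illustrative helper name:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening test measures from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # positives among those with the disease
        "specificity": tn / (tn + fp),  # negatives among those without the disease
        "PPV": tp / (tp + fp),          # diseased among those testing positive
        "NPV": tn / (tn + fn),          # disease-free among those testing negative
    }

metrics = diagnostic_metrics(tp=90, fp=40, fn=10, tn=160)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```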
what is cost effectiveness analysis?
CEA compares a number of interventions by relating costs to a single clinical measure of effectiveness (e.g. symptom reduction, improvement in activities of daily living).
how is cost effectiveness ratio calculated?
total cost / unit of effectiveness
what is cost-benefit analysis?
CBA is a technique in which all the costs and benefits of an intervention are measured in terms of money. A CBA is used to establish which of the alternatives has the greatest net benefit.
what is cost utility analysis?
CUA is a special form of CEA in which health benefits / outcomes are measured in broader, more generic ways, enabling comparisons between treatments for different diseases and conditions - i.e. using QALYs.
what is a QALY?
QALYs are a composite measure of gains in life expectancy and health-related quality of life. One QALY is equal to 1 year of life in perfect health.
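A small sketch of the QALY arithmetic, assuming the usual convention that each year is weighted by a utility between 0 (dead) and 1 (perfect health). The treatment cost, years and utility weight are invented, and both function names are illustrative:

```python
def qalys(years, utility):
    """QALYs = years of life x utility weight (1.0 = perfect health, 0.0 = dead)."""
    return years * utility

def cost_per_qaly(total_cost, qalys_gained):
    """Cost divided by units of effectiveness, here measured in QALYs."""
    return total_cost / qalys_gained

# Hypothetical treatment: 4 extra years at a utility of 0.5, costing 10,000.
gained = qalys(years=4, utility=0.5)
print(f"QALYs gained: {gained}")
print(f"cost per QALY: {cost_per_qaly(10_000, gained)}")
```

Four years at half of full health count the same as two years in perfect health, which is what allows treatments for different conditions to be compared.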
what is the benefit of conducting a cost utility analysis compared to a cost effectiveness analysis?
CUA offers something that CEA cannot, which is to compare across treatments for different conditions. In principle, it is possible to compare treatments for, say, cancer with, say, schizophrenia to determine which is the most efficient at producing health gain in the form of QALYs.
what are the three types of costs in a cost minimisation analysis?
Direct - those associated directly with the healthcare intervention (e.g. staff time, medical supplies, cost of travel for the patient, childcare costs for the patient, costs falling on other social sectors such as domestic help from social services)
Indirect - those incurred by the reduced productivity of the patient (e.g. time off work, reduced work productivity, time spent caring for the patient by relatives)
Intangible - those that are difficult to measure (e.g. pain or suffering on the part of the patient)
what is the likelihood ratio for a positive test result?
how much the odds of the disease increase when a test is positive
how is the likelihood ratio for a positive test calculated?
sensitivity / (1-specificity)
what is the likelihood ratio for a negative test result?
how much the odds of a disease decrease when a test is negative
how is the likelihood ratio for a negative test calculated?
(1-sensitivity) / specificity
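A sketch of both likelihood ratios, plus how a likelihood ratio converts a pre-test probability into a post-test probability via odds. The sensitivity, specificity and 20% pre-test probability are hypothetical values, and the function names are illustrative:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"after a positive test: {post_test_probability(0.2, lr_pos):.3f}")
print(f"after a negative test: {post_test_probability(0.2, lr_neg):.3f}")
```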
what is a nocebo?
a placebo that produces prominent SE
what is the definition of p value?
P value - the probability of obtaining, by chance, a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true
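To make the definition concrete, here is a sketch of an exact one-sided p value for a simple binomial example: the chance of seeing 9 or more successes in 10 trials if the null hypothesis (a 50% success rate) is true. The scenario and the function name are assumptions for illustration:

```python
from math import comb

def one_sided_binomial_p(successes, trials, p_null=0.5):
    """P(X >= successes) when X ~ Binomial(trials, p_null), i.e. under the null."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

p_value = one_sided_binomial_p(successes=9, trials=10)
print(f"p = {p_value:.4f}")
print("significant at alpha = 0.05" if p_value < 0.05 else "not significant at alpha = 0.05")
```

The sum covers the observed result and every more extreme one, which is the "at least as extreme" part of the definition.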