1B Statistics Flashcards
combining probabilities: OR
- If events A and B are mutually exclusive (e.g. the outcomes of a single dice roll):
p(A or B) = p(A) + p(B)
- If events A and B are not mutually exclusive:
p(A or B) = p(A) + p(B) - p(A and B)
(unless you subtract p(A and B), the probability of both occurring is counted twice, wrongly inflating the result)
Combining probabilities: AND
p(A and B) = p(A) x p(B|A)
p(B|A) = probability of B given A has occurred. If A has no impact on B then p(B|A) = p(B)
If a smoking cessation intervention results in a 0.4 chance of quitting, and Adam and Ben (who never meet) both receive it, what is the probability of at least one of them quitting?
Events are not mutually exclusive, so p(A or B) = p(A) + p(B) - p(A and B)
Because Adam and Ben never meet, the events are independent, so p(A and B) = 0.4 x 0.4 = 0.16
= 0.4 + 0.4 - 0.16
= 0.64
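A minimal sketch in Python (not part of the original cards) verifying this calculation:
```python
# Probability that at least one of two independent people quits,
# given each has a 0.4 chance (values from the card above).
p_a = 0.4
p_b = 0.4

# Independent events, so p(A and B) = p(A) * p(B)
p_a_and_b = p_a * p_b            # 0.16

# Not mutually exclusive, so subtract the overlap once
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)                  # 0.64
```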
Sampling error
Sampling error is chance variation (as long as the study is unbiased) between the values obtained for the study sample and the values which would be obtained if measuring the whole population.
The most common method for measuring the likely sampling error is to calculate the standard error
Standard error
- estimates how precisely a population parameter (e.g. mean, proportion, difference between means) is estimated by the equivalent statistic in the sample
The standard error is the standard deviation of the sampling distribution of the statistic
The method of calculating the standard error therefore depends on the data type and the statistic being used (e.g. is the data continuous or binary, and is the statistic a mean or a proportion? Each combination requires a different formula for the standard error)
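As an illustration (not from the cards), the most common case is the standard error of a mean, SE = SD / sqrt(n); a sketch with made-up values:
```python
import math
import statistics

# Hypothetical sample of a continuous variable
sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0]

sd = statistics.stdev(sample)        # sample SD (n - 1 denominator)
se = sd / math.sqrt(len(sample))     # standard error of the mean
print(round(se, 3))
```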
sampling distribution
Could be created by drawing many random samples of the same size from the same population and calculating the same sample statistic. The frequency distribution of all these sample statistics is a sampling distribution.
These distributions (e.g. a normal distribution) help you understand how a sample statistic differs from sample to sample and are the basis for making inferences from sample to population.
The shape of the sampling distribution depends on the type of statistic (e.g. continuous data and a mean = normal distribution; binary data and a proportion = binomial distribution)
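A sampling distribution can be illustrated by simulation; this sketch (assuming numpy is available, values arbitrary) draws many samples of the same size and collects the same statistic from each:
```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # deliberately skewed population

# Draw many samples of the same size and record the same statistic (the mean)
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

# The frequency distribution of these means is the sampling distribution;
# despite the skewed population it is approximately normal
print(np.mean(sample_means), np.std(sample_means))
```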
How are confidence intervals calculated and how do they relate to standard error and the sampling distribution
- the sampling distribution of a mean is a normal distribution, and those of other statistics (e.g. a proportion or rate) can be approximated by a normal distribution
- in a sampling distribution, the mean value is equivalent to the true population parameter
- its standard deviation is equivalent to the standard error of the sample statistic
- therefore 95% of sample statistics would lie within 1.96 standard errors of the true population parameter
- from this we can infer that there is a 95% chance that the true population parameter lies within 1.96 standard errors above or below a sample statistic
value used for 99% confidence intervals
+/- 2.58x standard error
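A minimal sketch (illustrative numbers, not from the cards) of building 95% and 99% confidence intervals from a sample statistic and its standard error:
```python
sample_mean = 120.0   # hypothetical point estimate
se = 2.5              # hypothetical standard error

ci_95 = (sample_mean - 1.96 * se, sample_mean + 1.96 * se)
ci_99 = (sample_mean - 2.58 * se, sample_mean + 2.58 * se)
print(ci_95)   # (115.1, 124.9)
print(ci_99)   # (113.55, 126.45)
```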
interpret 3 possible scenarios when comparing 2 confidence intervals
- CIs do not overlap –> significant at the 5% level
- CIs overlap but neither point estimate is within the other's confidence interval –> unclear; a significance test is needed
- Either point estimate is within the confidence interval of the other –> not significant at the 5% level
formula for conditional probability (the p(B|A))
p(B|A) = probability of B given A has occurred
we know:
p(A and B) = p(A) x p(B|A)
rearranged:
p(B|A) = p(A and B) / p(A)
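A small numerical check of the rearranged formula (the figures are made up for illustration):
```python
# Hypothetical values: 30% of patients smoke (A), 12% both smoke and have COPD (A and B)
p_a = 0.30
p_a_and_b = 0.12

# Conditional probability of COPD given smoking
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)   # 0.4
```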
what is a statistical distribution?
A function that shows all the possible values of a variable and the frequency with which they occur
Statistical distributions: the normal distribution
- symmetrical bell shaped curve
- described by 2 parameters:
variance (SD squared)
mean
- the standard normal distribution has a mean of 0 and a variance of 1
- any normally distributed variable can be converted to a standard normal distribution
- the normal distribution is very useful as many variables in biology follow a normal distribution
- the sampling distribution of a mean follows a normal distribution
- with large enough samples other distributions approximate to the normal distribution
Standard statistical distribution: binomial distribution
- PROPORTIONS
-the binomial distribution shows the frequency of events that have 2 possible outcomes
ie success and fail
-it is constructed using 2 parameters:
n (sample size)
pi (true probability)
- when sample size is large it approximates to the normal distribution
- used for:
discrete data with 2 possible outcomes
sampling distribution for proportions
- since proportions or probabilities cannot be negative it has no negative values
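A sketch of a binomial distribution built from n and pi (scipy is an assumption; the cards don't specify software, and the numbers are invented):
```python
from scipy.stats import binom

n, pi = 20, 0.3   # hypothetical sample size and true probability of "success"

# Probability of exactly 6 successes, and of 6 or fewer, out of 20
print(binom.pmf(6, n, pi))   # P(X = 6)
print(binom.cdf(6, n, pi))   # P(X <= 6)

# Mean and variance of the distribution: n*pi and n*pi*(1 - pi)
print(binom.mean(n, pi), binom.var(n, pi))
```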
In the normal distribution what percentage of the area under the curve is within:
1 standard deviation
1.96 standard deviations
2.58 standard deviations
1 SD = 68%
1.96 SD = 95%
2.58 SD = 99%
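These areas can be checked with the standard normal CDF (scipy assumed available):
```python
from scipy.stats import norm

for z in (1.0, 1.96, 2.58):
    # Area between -z and +z under the standard normal curve
    area = norm.cdf(z) - norm.cdf(-z)
    print(z, round(area, 3))   # ~0.683, ~0.950, ~0.990
```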
standard statistical distributions: Poisson distributions
- RATES/COUNTS
- deals with the frequency with which an event occurs over a given time, e.g. deaths from MI over a month
- used in the analysis of rates
- assumes that the data are discrete, events occur at random and are independent
- described by a single parameter: the mean (FOR THE POISSON DISTRIBUTION THE MEAN AND THE VARIANCE ARE THE SAME, so this parameter is also the variance)
- small samples give an asymmetric distribution and large samples approximate to the normal distribution
- no negative values as a rate cannot be negative
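A sketch (scipy assumed, numbers invented) of a Poisson distribution for a count over a fixed period:
```python
from scipy.stats import poisson

mu = 4   # hypothetical mean number of MI deaths per month

print(poisson.pmf(2, mu))                  # probability of exactly 2 deaths in a month
print(poisson.cdf(6, mu))                  # probability of 6 or fewer
print(poisson.mean(mu), poisson.var(mu))   # mean and variance are both 4
```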
Standard statistical distributions: Students T distribution
- SMALL SAMPLE SIZE
- Bell shaped like a normal distribution but tails are more spread out
- Single parameter: degrees of freedom
- as the degrees of freedom increase it approaches the normal distribution
Standard statistical distributions: Chi squared distribution
- right skewed shape
- parameter: degrees of freedom
- as degrees of freedom increase it becomes more like the normal distribution
-used in chi squared tests which are used for analysing categorical variables (comparing expected and observed event frequencies)
Standard statistical distributions; F distribution
- right skewed
- values are positive
- parameter: the degrees of freedom of the numerator and the denominator of the ratio
- uses: ANOVA tests
Degrees of freedom
number of independent pieces of information used to calculate a statistic
what is the difference between standard deviation and variance?
- both describe how spread out the values in a data set are around the mean
- the variance is the average squared deviation of the values from the mean
- SD is the square root of the variance
- SD is in the same units as the data whereas the variance is not
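A quick illustration with Python's statistics module (made-up values, sample formulas with an n - 1 denominator):
```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]   # made-up values

var = statistics.variance(data)   # sample variance (squared units)
sd = statistics.stdev(data)       # sample SD, same units as the data
print(var, sd)
print(math.isclose(sd ** 2, var))   # SD is the square root of the variance
```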
Sampling distribution shape:
Outcome variable= continuous
statistic type = mean
Normal shaped sampling distribution
Sampling distribution shape:
Outcome variable= binary
statistic type = proportion/risk
Binomial distribution
Sampling distribution shape:
Outcome variable= binary over time
statistic type = rate
Poisson distribution
what is inference?
The process of drawing conclusions for a population based on observations collected from a sample
what are the 2 main methods of inference?
- Estimation
point estimation (mean, proportion)
Interval estimation - expresses the uncertainty associated with a point estimate, e.g. confidence intervals
-hypothesis testing
assess the likelihood that a given observation in a sample would have occurred due to chance
both estimation and hypothesis testing are based on the standard error
measures of data location (5)
- arithmetic mean
- geometric mean
- mode
- median
- percentiles
measures of data dispersion (5)
- range
- interquartile range
- variance
- standard deviation
- coefficient of variation
measures of data location: arithmetic mean ( how to calculate, advantages and disadvantages)
- all values summed and divided by n
- for a sample, the arithmetic mean is denoted by x-bar
- for a population, it is denoted by mu
-advantages: amenable to statistical analysis
- disadvantages: not good for asymmetric distributions, affected by outliers
measures of data location: geometric mean (how to calculate, advantages and disadvantages)
- the nth root of the product of all the values
- advantages: more appropriate for positively skewed distributions
- disadvantages: cannot be used if any values are zero or negative
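A sketch (Python 3.8+, invented values) of the geometric mean as the nth root of the product:
```python
import math
import statistics

values = [2.0, 8.0, 4.0]   # hypothetical positive values

# nth root of the product of the values
by_hand = math.prod(values) ** (1 / len(values))
print(by_hand)                              # ~4.0 (allowing for floating point)
print(statistics.geometric_mean(values))    # same result via the standard library
```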
measures of data location: median ( how to calculate, advantages and disadvantages)
- the middle value when the data are ranked (for an even number of values, the mean of the two middle values)
- advantages: unaffected by extreme outliers, good for skewed distributions
- disadvantages: value determined solely by rank so gives no information on any other values
measures of data location: mode ( how to calculate, advantages and disadvantages)
- most commonly occurring value
- advantages: not generally affected by extreme outliers
-disadvantages: there may not always be a mode, not amenable to statistical analysis
measures of data location: percentiles (how to calculate, advantages and disadvantages)
- the data are ranked and divided into 100 groups, where the 100th percentile is the largest
- advantages: useful for comparing measurements (BMI, child height etc)
- disadvantages: comparisons at the extreme ends of the spectrum are less useful than those in the middle
measure of data dispersion: range ( how to calculate, advantages and disadvantages)
- highest value minus lowest
- advantages: simple, intuitive
- disadvantages: sensitive to size of sample and outliers
measure of data dispersion: interquartile range (how to calculate, advantages and disadvantages)
- the middle 50% of the sample
- calculated as the upper quartile- lower quartile
- advantages: more stable than the range as sample size increases
- disadvantages: unstable for small samples, does not allow for further mathematical manipulation
measure of data dispersion: variance ( how to calculate, advantages and disadvantages)
- average squared deviation of each value from its mean
- the formula differs slightly depending on whether calculated for a sample (divided by n-1) or a population (divided by n)
- advantages: takes all values into account, useful for making inferences about population
- disadvantages: units differ from that of the data
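A sketch of the sample (n - 1) versus population (n) formulas, using the standard library (made-up values):
```python
import statistics

data = [3, 5, 7, 9, 11]   # made-up values

print(statistics.variance(data))    # sample variance, divides by n - 1  -> 10
print(statistics.pvariance(data))   # population variance, divides by n  -> 8
```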
measures of data dispersion: standard deviation (how to calculate, advantages and disadvantages)
- square root of variance
- advantages: most commonly used, units are the same as data, useful for making inferences about the population
- disadvantages: sensitive to some extent to extreme values
measures of data dispersion: coefficient of variation (how to calculate, advantages and disadvantages)
- ratio of standard deviation to the mean
- gives an idea of the size of the variance relative to the size of the observation
- advantages: allows comparison of the variation of populations that have significantly different values
- disadvantages: where the mean value is near 0 the coefficient of variation is highly sensitive to changes in standard deviation
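A small sketch (made-up data) of the coefficient of variation as SD divided by the mean:
```python
import statistics

data = [50, 55, 60, 65, 70]   # hypothetical observations

cv = statistics.stdev(data) / statistics.mean(data)
print(round(cv, 3))   # dimensionless, so comparable across very different scales
```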
6 key elements to mention when describing a graph in the exam
- type of graph
- the axes
- the data displayed (ie mortality)
- the units
- any obvious findings
- what interpretation, if any, can be made from the findings (remember very unlikely to be able to conclude causality from a graph)
Displaying categorical data: 2 types of graph
- bar graph
- pie chart
categorical data: bar graph
- bars can show frequency (total count) or relative frequency (percentage)
categorical data: Pie chart
- start at the 12 o'clock position and wedges should descend clockwise in order of size (ie biggest –> smallest clockwise)
continuous data: 6 types of chart
- stem and leaf display
- box plot
- histogram
- frequency polygon
- frequency distribution
- cumulative frequency distribution
continuous data: stem and leaf display (what is it, advantages and disadvantages)
- a quick technique for displaying numerical data graphically
- a vertical stem is drawn consisting of the first few significant figures of values in a dataset
- any subsequent figures are the leaf
- back to back stem and leaf displays can be used to display multiple data sets
advantages:
1. simple quick and easy
2. actual values are retained
disadvantages:
1. hard to display large data sets
continuous data: box plot (what is it, advantages and disadvantages)
- gives a measure of central location (MEDIAN)
- shows 25th and 75th percentiles so gives range and interquartile range
-Advantages:
1. box element contains a lot of information
2. good for comparing 2 datasets
Disadvantage:
1. actual values are not retained
continuous data: histogram (what is it, advantages and disadvantages)
- divides the sample values into many intervals which are called bins
- bars then display the number of values in that bin
- most histograms use bins that are roughly equal in width, but bins can instead be sized so each contains an approximately equal number of samples (this can result in bins that are too narrow to see!)
advantages:
1. gives an idea of the data's central tendency
2. demonstrates skewness and the shape of the frequency distribution
disadvantages:
1. cannot read exact values as in intervals
2. more difficult to compare 2 data sets
3. can only be used with continuous data
continuous data: frequency polygon (what is it, advantages and disadvantages)
constructed by joining the midpoint of the top of each bar in the histogram
continuous data: frequency distribution (what is it, advantages and disadvantages)
-essentially the frequency polygon that would be drawn for a histogram with a very large number of bins
-leads to a smooth line
remember you describe the skewness of a graph according to WHERE THE TAIL IS
continuous data: cumulative frequency (what is it and advantages disadvantages)
- a running count starting with the lowest value and showing how the number of observations accumulate
continuous data: Showing association between 2 variables: which graph type?
- bivariate data is almost always best shown using a scatter plot
continuous data: scatter plot for showing association between 2 variables
- data from 2 variables are plotted against each other to explore the relationship between them
- trend line is drawn to explore whether any correlation is:
- positive, negative, or non-existent
- linear or non-linear
- strong, moderate or weak
advantages:
1. data values and data set are retained
2. shows a trend in data relationship
3. shows minimum maximum and outliers
disadvantages:
- data from both variables must be continuous
- hard to visualise large data sets
Z test: what is it and how is it used
Used to compare proportions/means between 2 groups.
Different formulas for testing different things but all include the standard error
The z value is looked up in a z-distribution table which gives a P value.
The test can be used for paired data. To do this, the difference in the observations for each pair is calculated and each pair is then treated as a single observation
z test value of significance
a z score greater than 1.96 (ignoring sign, i.e. |z| > 1.96) is significant at the 5% level
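A sketch (scipy assumed, figures invented) of turning a z value into a two-sided P value:
```python
from scipy.stats import norm

# Hypothetical: difference between two sample means and its standard error
diff = 4.0
se = 1.8

z = diff / se                     # ~2.22
p = 2 * (1 - norm.cdf(abs(z)))    # two-sided P value
print(round(z, 2), round(p, 3))   # |z| > 1.96, so significant at the 5% level
```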
T test: what is it, when is it used
Used to compare means/proportions between 2 groups when the sample size is small (normally less than 60)
Based on a T distribution rather than a normal distribution.
T values are looked up in a T distribution table in order to discern the P value.
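A sketch (scipy assumed, invented small samples) of a two-sample t test comparing the means of two groups:
```python
from scipy.stats import ttest_ind

# Hypothetical small samples of a continuous outcome in two groups
group_a = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8]
group_b = [6.3, 6.8, 5.9, 7.1, 6.5, 6.0]

t_stat, p_value = ttest_ind(group_a, group_b)   # compares the two means
print(round(t_stat, 2), round(p_value, 4))
```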