Statistics Flashcards
Parametric tests
ANOVA
Required distribution
P value
More powerful than non-parametric tests - better able to show a difference that really exists
Analysis of variance (ANOVA) - tests multiple groups of parametric data
Require the distribution to be normal
Incidence of hypertension analysed with:
Categorical (or qualitative) data thus requiring a non-parametric test (that is, chi square test).
Mean
The average of a group of values - add all the numbers together and divide by how many there are
The central tendency of a group of measurements
The most sensitive measure, because its value always reflects the contribution of each of the data values in the group
Median
The mid-point of a group of values - the middle value in the list of numbers
To find it, list the values from smallest to largest (you may have to rewrite your list first), then take the middle value
If a data set contains a small number of outliers at one extreme, the median may be a better measure of the central tendency of the data than the mean
Mode
mode is the value that appears most frequently in the group of measurements.
Qualitative data
Not numerical -> names/labels
e.g. ASA grade, type of operation, hair colour, pain score
Nominal - mutually exclusive categories with no logical order, e.g. hair colour, type of operation
Ordinal - categories with an intrinsic order, e.g. pain score, ASA grade
Quantitative
Numerical in value - values vary along a measurement scale
e.g. HR, BP, height
Discrete - vary by a set amount
e.g. number of children - can't have 2.4
Continuous - can take any value, e.g. height, BP, age
Interval - the zero point is just another point on the scale, not the absence of the measurement
e.g. Celsius
How do you display qualitative data in graphical form?
Qualitative data are not numerical - each value is a label
Frequency table - then depicted as a bar chart or pie chart
Each frequency can also be given as a percentage
How do you describe quantitative data?
Quote the central tendency and the scatter of the data around that central point
Normally distributed - the mean describes the central tendency
Variance / SD describe the variation
Non-normally distributed - the median describes the central tendency
Interquartile range describes the scatter
Normal distribution
non normal distribution
A distribution curve is created by plotting the observed values on the x axis
and their frequency on the y axis
Normally distributed - the curve is symmetrical and bell shaped
In a normal distribution the mean, mode and median are all the same
'parametric'
Non-normal / non-parametric - the distribution curve is not a symmetrical bell shape
Skewed in either direction, or bimodal
Tail skewed to the right - right / positive skew
When data are skewed, the mean, mode and median are no longer the same
Mode - the most frequently occurring value - always at the peak
Median - the value with equal numbers of results above and below - moves towards the tail of the skew
Mean - also pulled in the direction of the tail - can be misleading
How do you calculate variance?
A measure of the spread of the data around a central point
First calculate the mean (x̄)
Subtract each individual result from the mean to find the difference (x̄ − x)
Square all the results - makes them all positive
Add them together
Divide by the number of degrees of freedom - the number of observations minus 1, or n − 1
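The steps above can be sketched in Python (a minimal illustration with a made-up data set; the stdlib `statistics` module is used only to cross-check the manual calculation):

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Step 1: calculate the mean
mean = sum(data) / len(data)

# Steps 2-3: difference of each value from the mean, squared (all positive)
squared_diffs = [(x - mean) ** 2 for x in data]

# Steps 4-5: add together, then divide by degrees of freedom (n - 1)
variance = sum(squared_diffs) / (len(data) - 1)

# SD is the square root of the variance
sd = math.sqrt(variance)

# Cross-check against the library's sample variance
assert math.isclose(variance, statistics.variance(data))
```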
What is SD
Central tendency of parametric data is described by the mean
Variation around the mean is described by the variance
Calculated as the square root of the variance - used frequently because it conveniently describes the spread in the same units as the data
68% of the population lies within 1 SD either side of the mean
95% within 2 SD
99.7% within 3 SD
What is the standard error of the mean
SEM is used to assess whether a sample mean reflects the mean of the true population
Shows how well the mean of a small sample represents the whole population
The larger the sample, the more likely it is to reflect the true population
If the SD is small, the variance around the mean is small, so we can be more confident that the sample mean is close to the true population mean
Calculated by dividing the standard deviation by the square root of the sample size
Can be thought of as the standard deviation of the mean - 68% of sample means will lie within one standard error of the true mean
Confidence limits
Related to SEM
The sample mean will lie outside 1.96 standard errors of the population mean only 5% of the time
So we can be 95% confident that the sample mean reflects the population mean
The range from 1.96 standard errors below the mean to 1.96 standard errors above it = the confidence interval
The values at either end are the confidence limits
Confidence limits are in the same units as the data measurements, so are easier to interpret
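A minimal sketch of SEM and 95% confidence limits, using a made-up sample of systolic BP readings (not from the card):

```python
import math
import statistics

sample = [118, 122, 125, 130, 121, 119, 127, 124]  # made-up systolic BPs

mean = statistics.mean(sample)
sd = statistics.stdev(sample)          # sample SD (n - 1 denominator)
sem = sd / math.sqrt(len(sample))      # SEM = SD / sqrt(n)

# 95% confidence interval: mean +/- 1.96 standard errors
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
```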
Can standard error be used for non-parametric data?
No - if the data are skewed, the standard deviation does not accurately reflect the variation of the data around the mean, so the SEM cannot be meaningfully calculated
Non-parametric data - quote the range containing the middle 50% of results - the median has 50% of results above and 50% below
25th centile - 25% of results below, 75% above
The range between the two is called the interquartile range
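The median and interquartile range can be found with the stdlib `statistics` module (made-up values; `quantiles` interpolates, so results may differ slightly from hand-calculated quartiles):

```python
import statistics

results = [3, 1, 7, 2, 6, 4, 8, 5]  # made-up, unsorted

median = statistics.median(results)              # 50% above, 50% below
q1, q2, q3 = statistics.quantiles(results, n=4)  # 25th, 50th, 75th centiles
iqr = q3 - q1                                    # interquartile range
```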
P value
The probability of an event occurring
p = 1 - always occurs
p = 0 - never occurs
Used when comparing the difference between a sample population and the true population
In general, the sample size is significantly smaller than the population size
Determines whether any difference could have occurred purely by chance
By convention, a probability of 1 in 20 (p = 0.05) that the difference occurred by chance is accepted as
small enough to be disregarded - the difference between the groups is statistically significant
If p > 0.05 - not significant - the difference may have occurred by chance
Null hypothesis
Tests are performed on the assumption that there is no significant difference between the means of the samples / that they originate from the same parent population
If the result produces p < 0.05, the probability that the two samples originate from the same population is < 1 in 20
The null hypothesis is rejected - there is considered to be a statistically significant difference between the samples
If p > 0.05, there is a higher probability that the difference occurred by chance
The null hypothesis stands - no difference between the samples
Type 1 error
Alpha (α) error, or false positive
The null hypothesis is wrongly rejected - a difference is found when there is none
A lower p value and a larger sample size give a smaller chance of a type 1 error
By accepting p = 0.05 we accept a 1 in 20 risk of making a type 1 error
Type 2 error
False negative
The null hypothesis is wrongly accepted - no difference is found when one exists
Three factors make it more likely:
small sample size
large variability in the population
situations where a small difference is clinically important
A 20% chance of a type 2 error is conventionally accepted when calculating study power
Power of study
A measure of the likelihood of detecting a difference between groups if a difference really does exist
Power = 1 − β, where β is the type 2 error rate
Effectively the probability of avoiding a type 2 error
If no difference is found between the groups, it can only be concluded that there is no clinically important difference between the samples if the study had adequate power
If a type 2 error occurs, the study power was insufficient - the sample size was too small
Without adequate power, 'no difference' at the end of a study is not a meaningful conclusion
The power of a proposed study should be calculated before it starts
The number of patients needed to ensure sufficient power is found using equations or nomograms
How do you choose which test to analyse data with?
Consider the following when choosing the appropriate statistical test:
Nature of the data
Qualitative or quantitative
If quantitative - the type of distribution
Parametric or non-parametric
Two groups, or more than two
Paired or unpaired data
Qualitative data
ASA grade, pain score
Analysed using the chi square test
O = the number of observed occurrences, E = the number of expected occurrences
Compares the frequency of observed results with the frequency expected if there were no difference
Easiest with a 2x2 contingency table
Two different sample groups and two outcomes
e.g. drug A and drug B
patients who vomited and patients who didn't vomit
Next calculate the number expected to vomit or not vomit if there were no difference between the drugs
Expected = (column total x row total) / overall total
This gives the number of patients expected to vomit with drug A if there were no difference
Repeat the calculation for each box in the contingency table
For each box apply the formula (O − E)²/E - the results of the four calculations added together give the chi square statistic
The p value depends on the chi square value and the degrees of freedom
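The procedure above can be sketched in Python with made-up 2x2 counts for the drug A / drug B vomiting example (a hand calculation; real analyses would normally use a stats package):

```python
# 2x2 contingency table: rows = drug A / drug B, columns = vomited / did not
observed = [[10, 40],   # drug A (made-up counts)
            [20, 30]]   # drug B

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
overall = sum(row_totals)

chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = col_totals[j] * row_totals[i] / overall  # E = col x row / total
        chi_square += (observed[i][j] - expected) ** 2 / expected  # (O - E)^2 / E

# With 1 degree of freedom, chi square >= 3.84 corresponds to p < 0.05
significant = chi_square >= 3.84
```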
When is the chi square test unable to be used?
If any expected occurrence is <5, chi square should not be used
Use the Fisher exact test instead
Difference between paired and unpaired data
Unpaired - two different groups of patients are studied
Paired data - two variables tested in the same patients, e.g. two different antihypertensive drugs studied in the same group of patients
Parametric data - what test?
Once data are known to be parametric, determine how many groups
Two groups:
paired Student's t test or unpaired Student's t test, depending on whether the data are paired or not
More than two groups = ANOVA
Student t test
Analyses normally distributed data - requires the difference between the means of the 2 samples and the estimated SEM
t = difference between means / estimated standard error
The p value is read from tables using the t value and the size of the samples
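A minimal sketch of the unpaired t statistic (pooled-variance form, an assumption since the card does not specify; made-up data):

```python
import math
import statistics

group_a = [1, 2, 3, 4, 5]   # made-up parametric data
group_b = [3, 4, 5, 6, 7]

n1, n2 = len(group_a), len(group_b)
mean_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Pooled variance, then the estimated standard error of the difference
pooled_var = ((n1 - 1) * statistics.variance(group_a) +
              (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t = mean_diff / se_diff
# p is then read from t tables using t and n1 + n2 - 2 degrees of freedom
```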
ANOVA
Analysis of variance - compares parametric quantitative data in more than 2 groups - the maths is complex and is handled by software
Statistical tests - non-parametric data
Decide how many groups - if two groups:
Wilcoxon signed rank test if paired
Mann-Whitney U test if unpaired
If more than 2 groups:
Friedman test if paired, Kruskal-Wallis test if unpaired
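A minimal hand computation of the Mann-Whitney U statistic (made-up ordinal scores; tied values get the average of their ranks, and the smaller U is compared with tabulated critical values):

```python
group_a = [3, 5, 1, 4]   # made-up scores, unpaired groups
group_b = [6, 7, 2, 8]

combined = sorted(group_a + group_b)

def midrank(value):
    # average rank of all positions holding this value (handles ties)
    positions = [i + 1 for i, v in enumerate(combined) if v == value]
    return sum(positions) / len(positions)

r_a = sum(midrank(v) for v in group_a)            # rank sum of group A
u_a = r_a - len(group_a) * (len(group_a) + 1) / 2
u_b = len(group_a) * len(group_b) - u_a
u = min(u_a, u_b)   # smaller U is looked up in Mann-Whitney tables
```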
Not normally distributed data
What is the more appropriate mean
geometric or arithmetic?
What is the SD?
In positively skewed data, where are the mode, mean and median in relation to each other?
What is the mode?
Where do the mean and median lie when data are normally distributed?
The mode refers to the most frequently encountered value and in normally distributed data it coincides with the mean and median values.
In skewed data the geometric mean is the most appropriate measure (not the arithmetic mean).
Standard deviation (SD) is the square root of the variance and is a measure of distribution of the data.
In positively skewed data the mean usually lies to the right of the mode (not left).
In positively skewed data the mean usually lies to the right of the median (not left).
Double blind
how does this differ from crossover
In double blind placebo control clinical trials neither the patient nor the clinician knows which treatment option the patient has received. It would not be blind to the patient otherwise.
If everybody received both treatments then this would be a ‘double blind crossover study’.
The clinician remains blind to the treatments received by the patients until the study has finished.
Can we comment on a study design without knowing lots of information?
How large (%) is the placebo effect?
What does the p value mean?
What p value is statistically significant?
It is not possible to say confidently that this drug trial was well designed without further information about the study and its conduct.
The placebo effect is often higher than 5%, with rates between 20-30% being common.
The result may indeed have occurred by chance alone in less than one in 20 occasions. This is the meaning of the 'p value', where 0.05 is equal to 1/20.
Standard error is derived from the variance and 'probable error' is a fictitious term.
A p value of less than 0.05 is the conventional level of statistical significance, thus the results should be regarded as reaching conventional levels of statistical significance.
A p value of less than 0.05 indicates that there is strong evidence against the null hypothesis, as there is less than a 5% probability that the null hypothesis is correct (and the results are random).
What is the null hypothesis
the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
An 'n of 1' trial is useful when?
Can the results be generalised?
What half-life (t½) is best?
In an ‘n of 1’ trial the treatment and placebo are given at random treatment periods to the same patient.
The results are specific to one drug and the patient studied and cannot usually be generalised.
They are useful where the patient doubts the effectiveness of a treatment or where the practitioner has doubts. They are also useful for dosing or working out if a symptom is a side effect or not.
Drugs with short lived effects are best, as long wash-out periods need to be included for long acting drugs.
Relative risk
How is it determined?
What does it measure?
Relative risk may be determined in prospective and retrospective studies and is a useful measure of the strength of association between disease and a risk factor.
In a prospective study of a population, participants are selected without reference to the presence or absence of disease - relative risk is best assessed in prospective studies.
After excluding prevalence cases the population is followed over time. The number of new cases occurring thereafter is divided by the population at risk, giving an incidence rate.
A relative risk of two indicates a doubling of risk between the groups.
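The relative risk calculation can be sketched with made-up cohort figures (incidence in the exposed group divided by incidence in the unexposed group):

```python
# Made-up prospective cohort: disease counts in exposed vs unexposed groups
exposed_cases, exposed_total = 20, 100
unexposed_cases, unexposed_total = 10, 100

incidence_exposed = exposed_cases / exposed_total        # incidence rate, exposed
incidence_unexposed = unexposed_cases / unexposed_total  # incidence rate, unexposed

relative_risk = incidence_exposed / incidence_unexposed  # RR of 2 = doubled risk
```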
What is the Student's t test useful for?
The Student’s t test is inappropriate as we are comparing proportions not means.
Pearson’s coefficient of linear regression is inappropriate as there is no linear regression to plot.
The data would be ideal for evaluation using the chi square test.
The numbers are not too small to draw any statistical conclusions.
No clinical drug trial is ever that obvious and statistical testing should be performed.
Sample sizes can be used for what
What type of variables can be used
What is important to determine before starting
What is the power
How can we calculate significance and sample size
How is max power achieved
Sample sizes can be calculated for population studies, clinical trials and most forms of studies.
Binary, ordered categorical and continuous variables can be used.
It is very important before commencing a clinical trial to determine which variable will be the primary end point, what magnitude of difference is clinically relevant and have an estimate of the standard deviation (SD).
From these data, a level of statistical significance (α), usually p = 0.05, is chosen. The probability of correctly rejecting a false null hypothesis equals 1 − β and is called the power.
With the expected mean difference/SD and a decision of significance and power a sample size can be calculated.
Maximum power is achieved by having equal groups, but unequal group size can be used.
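A sketch of a sample-size calculation for comparing two means. The formula used (n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ²) is the standard normal-approximation equation, an assumption since the card only says "equations or nomograms"; the SD and minimum difference are made up:

```python
import math
from statistics import NormalDist

def sample_size_per_group(sd, min_diff, alpha=0.05, power=0.80):
    """Patients per group to detect min_diff between two means:
    n = 2 * (z_alpha + z_beta)^2 * sd^2 / min_diff^2 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / min_diff ** 2
    return math.ceil(n)

# e.g. SD = 10, clinically relevant difference = 5 (made-up numbers)
n_per_group = sample_size_per_group(sd=10, min_diff=5)
```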
Life table analysis
What are end points?
How can they be compared?
Can confounding variables be adjusted for?
Life table analysis is used in various contexts to follow a population until certain end points occur. Death is a suitable end point but development of disease or disability are also suitable.
For example, the mortality in groups of smokers and non-smokers could be collected over a period of time and survival plotted as a function of time. The development of retinopathy in diabetics, or the time from treatment of multiple sclerosis patients treated with interferon or placebo to next relapse, would also be suitable for life table analysis.
These incidence data are best collected prospectively.
Life tables from two groups can be compared by calculating a chi-square statistic (Mantel-Haenszel procedure or log rank method). Relative risk can also be calculated from such data.
Mathematical models can be applied to life table data to adjust for confounding variables (co-variables). An example of this is the Cox proportional hazards model.
What is the incidence
The incidence refers to how often a situation occurs (not the prevalence).
What is the prevalence
The prevalence refers to how common is a situation (not the incidence).
What is the sensitivity
The sensitivity of a clinical test refers to the ability of the test to correctly identify those patients with the disease.
What is the specificity
The specificity of a clinical test refers to the ability of the test to correctly identify those patients without the disease.
The sensitivity and specificity are independent of the population of interest subjected to the test.
What are PPV and NPV
However, the terms positive predictive value (PPV) and negative predictive value (NPV) are used when evaluating a test's usefulness to a clinician, and are dependent on the prevalence of the disease in the population being looked at.
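The prevalence-dependence of PPV can be demonstrated with made-up test characteristics (sensitivity and specificity fixed at 90%, prevalence varied):

```python
# Made-up test performance: sensitivity 90%, specificity 90%
sensitivity, specificity = 0.9, 0.9

def predictive_values(prevalence, population=100_000):
    diseased = prevalence * population
    healthy = population - diseased
    tp = sensitivity * diseased          # true positives
    fp = (1 - specificity) * healthy     # false positives
    tn = specificity * healthy           # true negatives
    fn = (1 - sensitivity) * diseased    # false negatives
    return tp / (tp + fp), tn / (tn + fn)   # PPV, NPV

ppv_common, _ = predictive_values(prevalence=0.10)  # common disease
ppv_rare, _ = predictive_values(prevalence=0.01)    # rare disease
# PPV falls as prevalence falls, though sensitivity/specificity are unchanged
```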
What is the reliability
The reliability is the ability of a test to produce the same result when repeated under identical conditions.
T test - used when?
ANOVA - when?
Chi square - when?
Mann-Whitney - when?
The t test is used when dealing with normally distributed interval scale data (baldness is not such data, but height is).
ANOVA compares normally distributed interval scale data in multiple groups.
The chi square test measures differences between nominal data, which are usually yes/no or dead/alive.
The Mann-Whitney test is used for analysis of ordinal data; blood loss is normally distributed interval scale data (so a t test would be used).
Normal distribution
The 95% CI is calculated as what times the SEM?
What does it mean?
Can they be calculated for non-parametric data?
Does it indicate significance?
In a normal distribution of a large population (greater than 30), 95% confidence intervals can be calculated as ± 1.96 times the standard error of the mean.
This means there is a 95% chance that the true population mean will lie within the range of values. If repeated samples were taken and the 95% confidence interval was computed for each sample, 95% of the intervals would contain the population mean.
Ninety five percent confidence intervals can be calculated for non-parametric or interval data, but this uses a different method than 1.96 × SEM.
When comparing the effects of two treatments (for example, active drug and placebo or two populations) 95% confidence intervals indicate the size of any effect rather than just indicating if there was an effect as in significance testing.
There is a close relationship between the use of confidence intervals and the two-sided hypothesis test.
Is there evidence that wearing scrubs outside theatre increases infection?
What is the evidence for masks?
Is C. difficile removed with alcohol hand gel?
The Association of Anaesthetists of Great Britain and Ireland (AAGBI) guidance states: “There is little evidence to show that wearing surgical attire outside the theatre and returning to the theatre without changing increases surgical wound infection.” Also, in terms of wearing headgear there is “little evidence for the effectiveness of this practice except for scrub staff in close proximity to the operating field.”
The AAGBI state that controversy exists regarding masks and that local policies should be followed.
It should be noted that Clostridium difficile is not removed with alcohol hand gel.
High level disinfection of equipment kills vegetative bacteria (not all endospores), fungi and viruses.
How can the frequency distribution be described?
The frequency distribution can be described by:
Stem and leaf plots
Histograms
Table of frequencies
Positive and negative skews (not medians and correlation coefficients).
What is ARR?
How is it calculated?
What then would be the NNT?
How many patients treated, at what cost?
Absolute risk reduction (ARR) is a means of measuring the difference between two treatments.
Explanation
The ‘absolute risk reduction’ is 10% − 6% = 4%.
The ‘number needed to treat’ to prevent a stroke therefore equals 100/4 = 25.
25 patients would need to be treated at a cost of £100/month for 12 months to prevent a stroke which gives the total cost as £30,000.
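The arithmetic above, using the card's own figures (10% vs 6% stroke rates, £100/month for 12 months):

```python
# Stroke rates from the example: 10% on placebo vs 6% on treatment
control_risk_pct = 10
treated_risk_pct = 6

arr_pct = control_risk_pct - treated_risk_pct   # absolute risk reduction = 4%
nnt = 100 // arr_pct                            # number needed to treat = 25

# 25 patients at 100 pounds/month for 12 months
total_cost = nnt * 100 * 12                     # 30,000 pounds
```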
What are the possible values for the correlation coefficient?
If it is 0, what does that mean?
If it is positive, what would the slope be?
The correlation coefficient cannot be higher than 1 and lies between −1 and +1.
If the correlation coefficient is 0 there is no linear relationship between height and the PEFR.
If the correlation coefficient is positive, the curve would have an upward slope.
If a correlation can be made then figures can be extrapolated and 1.5 m is not too far from the lower height of 1.6 m.
The PEFR is the dependent variable and is usually put on the Y (vertical) axis, whereas height, the independent variable is on the X (horizontal) axis.
What is the leading cause of anaphylaxis
What agent is most responsible
Antibiotics are currently the leading cause of perioperative anaphylaxis in the UK. They are responsible for 46% of cases with identified causative agents. Co-amoxiclav and teicoplanin between them account for 89% of antibiotic-induced perioperative anaphylaxis
The second leading cause is neuromuscular blocking agents (NMBAs), responsible for 33% of cases.
Patent blue dye (14.6/100,000 administrations)
Chlorhexidine (0.78/100,000 administrations)
Suxamethonium (11.1/100,000 administrations)
Teicoplanin (16.4/100,000 administrations)
Co-amoxiclav (8.7/100,000 administrations)
Perioperative anaphylaxis to chlorhexidine poses a significant risk in the healthcare setting due to its widespread use with some being fatal.
What is the SEM calculation?
SEM is what?
If testing whether there is no difference between populations,
what does t = ?
What is SD?
Which is greater, the SD or the SEM?
The standard error of the mean or SEM equals the standard deviation or SD divided by the square root of sample size.
SEM is the standard deviation of all the means of large random samples of size n from a given population. It is of central importance in significance testing.
If testing to see if there is a difference between two population means (for example, t test) then t=difference in means/SEM.
The SD is a measure of observation variability and is greater than the standard error of the mean (SEM).
When is the null hypothesis true?
What does increasing the number of patients do?
Should patients lost to follow up be excluded from the final analysis?
What is stratified random allocation?
The null hypothesis is true if there are no significant differences in response.
Increasing the number of patients involved in the trial will reduce the baseline differences between the groups.
Patients who withdraw from the study or are lost to follow up may have suffered side effects or even have died from being given the drug, so cannot be excluded.
In a clinical trial of a new drug randomisation attempts to ensure that each patient has an equal chance of being allocated a certain treatment.
Stratified random allocation of treatment is appropriate where the number of patients is relatively small and can be by age, sex, disease duration, etc.
Errors
Types
What is type 1?
What is type 2?
When are they more likely to occur?
Type I error or α error is wrongly rejecting the null hypothesis, for example, interpreting p<0.05 as showing a difference when none exists.
Type II error or β error is accepting the null hypothesis when it is invalid, for example, when two treatments are compared and no significant difference (that is, p>0.05) is noted, assuming there is no difference between them when there is in fact a difference.
Type II error is more likely to occur when small samples are used.
Type I error is more likely to occur when multiple t tests are performed.
Type II error increases with increasing variability of response to treatment (increasing standard deviation).
Confidence interval provides an interval estimate of the population parameter (usually the mean or mean difference between two groups).
A narrow 95% confidence interval means that the estimate is more precise. A narrow confidence interval indicates lower variability (SD), higher statistical power or a higher sample size, which makes a type II error less likely.
Whether confidence interval is used will not affect type II error rate.
When can a 95% CI be used?
What is it?
What does it mean when the CI crosses 0?
How can it be calculated?
Ninety five per cent confidence intervals can be used for both distributional and distribution-free data.
A 95% confidence interval looks at the range of values within which we are 95% confident that the true population parameter lies.
Therefore, using the above definition, if we were to repeat the experiment many times, the interval would contain the true population mean on 95% of occasions.
Confidence intervals increase the accuracy when comparing means with another population by looking at the spread of differences.
A wide confidence interval indicates that the estimate is imprecise. If the 95% CI crosses zero, it may indicate that the treatment has no effect, or it may be that the study is underpowered and unable to detect a difference (the difference may still exist).
It can be calculated as ± 1.96 times the standard error of the mean.
VRS
Are they suitable for parametric tests?
When it is confined to 3 levels, how can the data be summarised?
When it is divided into several levels,
how can it be assessed?
VAS yields what type of data?
How can it be assessed?
Verbal rating scales (VRS) and numerical rating scales (NRS) generate discontinuous data that are unsuitable for parametric tests of statistical significance and thus non-parametric techniques must be used.
When the VRS is confined to only three levels, data can be summarised in contingency tables and either the χ2 test or exact tests used.
Where VRS is divided into several levels or NRS used, the Mann-Whitney test or Wilcoxon rank sum test are appropriate.
Visual analogue scales (VAS) yield continuous data and t tests can be used as long as less than 25% of the data are at extreme ends of the range. If there are doubts about the validity of a t test, non-parametric tests can be used.
VAS data may be analysed using standard deviation and standard error. Some authors have used nonparametric tests considering the ordinal nature of the data.
A time series of numerical rating scores are best analysed using some form of analysis of variance for repeated measures or even area under the curve. Measuring the area under the curve gives a summary measure for each patient that can be analysed by a single test.
The Mann-Whitney test only compares two sets of data and cannot be used for multiple testing.
How does chi square testing work?
Chi square testing refers to count data (categorical). It therefore refers to 2 by 2 tables or larger.
Explanation
These data would be ideal for a chi square test. It is a 2 × 2 contingency table for which there is a special chi squared formula that gives a value that can be looked up in a table giving the p value.
The Student’s t test cannot be used as we are comparing proportions not means.
Pearson’s co-efficient cannot be calculated as there is no linear regression to plot.
Nothing is ever so obvious that no statistical analysis is needed.
Normal distribution
mean, mode, median
Why?
Where does 95% of the population lie?
What is a parametric test?
The mean and median and mode of a normal distribution are equal because the distribution curve of a normal distribution is bell shaped and equal on both sides.
Mu (μ) and sigma (σ) symbolise the mean and the standard deviation respectively of a probability distribution.
The probability that a normally distributed random variable x, lies between (μ − 1.96 σ) and (μ + 1.96 σ) is 0.95.
The probability that a normally distributed random variable x, lies between (μ − σ) and (μ + σ) is 0.68.
Ninety five per cent of the distribution of sample means lies within 1.96 standard errors of the population mean.
A parametric test is a statistical test which assumes the data are normally distributed.
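The probabilities quoted above can be checked with the stdlib `statistics.NormalDist` (standard normal, μ = 0 and σ = 1):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

within_1sd = z.cdf(1) - z.cdf(-1)          # ~0.68 of values within 1 SD
within_196sd = z.cdf(1.96) - z.cdf(-1.96)  # ~0.95 of values within 1.96 SD
above_2sd = 1 - z.cdf(2)                   # one-sided upper tail, ~1 in 40
```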
Meta analysis
RCT or not
Why do it
Other reasons
Meta-analyses of randomised, controlled trials are usually performed when individually the trials are too small to give reliable answers.
There are a number of reasons for performing meta-analysis which include:
To examine variability between trials
To perform subgroup analysis
To identify the need for major trials, and
To obtain a more stable estimate of the effect of treatment.
Only randomised, controlled trials should be included in such analysis, but if only published studies (which tend to be positive) are used this will introduce bias. If unpublished but properly controlled studies are available they should be used in the analysis.
It is important that patient selection and outcomes are comparable in the studies.
Incidence is
Incidence is the number of new cases of a disease in a defined time period or population.
The number of cases of a disease in a population over a defined time period describes the prevalence of a disease - it is not the number of “new cases”.
The number of new cases of a disease does not stipulate a defined period of time or place (that is, there is no denominator from which to derive an incidence).
The number of new cases of a disease seeking medical treatment describes the incidence of patients seeking medical treatment but not the incidence of the disease in a population; there will be some patients not seeking treatment who have the disease.
The number of patients dying from a disease in a population describes the death rate from a disease.
SD is a measure of what?
How do we calculate it?
What is standard error?
Is SD > SEM?
The standard deviation (SD) is a measure of the scatter of observations about the mean and is a valid statistical parameter for observations that have a normal distribution.
The SD of a group is the square root of the variance (not square).
Standard error is the standard deviation divided by square root of n, and so the SD is numerically greater than the standard error of the mean.
Chi square compares proportions.
What is the mode
what is the median
If Normal distributed what can we say about mean mode and median
The mode is the value that occurs most frequently.
The median is that point on the scale of measurement above which exactly half the values lie and below which lie the other half.
Having a normal distribution, the arithmetic mean, the mode and the median are equal.
In a normally distributed variable, the probability of attaining a value higher than two standard deviations above the mean is approximately 1 in 40 (p = approx. 0.025). This is one sided (higher) - higher OR lower than two standard deviations would be 1 in 20.
In a normal distribution, approximately 95% of the values will lie within the range between (mean + 2 standard deviations) and (mean - 2 standard deviations).
Genetic polymorphisms
Fast and slow what?
Metabolism
Consequences?
Is it important if the drug is cleared renally?
What can be affected?
Some drugs are metabolised by enzymes susceptible to polymorphisms that affect their activity. This is the basis of fast and slow acetylation (e.g. hydralazine, procainamide, sulphonamides, and dapsone) and slow or poor metabolism (e.g. debrisoquine). The prevalence of these polymorphisms shows considerable variation between racial groups.
The consequences of poor metabolism of a particular drug are clearly dependent on its pharmacological actions: drugs with a steep dose-response curve or a low therapeutic index may well produce toxic effects in poor metabolisers.
Genetic polymorphisms are determined by abnormalities of gene expression and are not dependent on the pharmacological actions of the drug.
A number of commonly used drugs are broken down by phase I hepatic metabolism with the same enzyme, cytochrome P2D6 (CYP2D6) for example the beta‐blockers metoprolol, and alprenolol; propafenone; codeine and tramadol; antipsychotics such as droperidol, thioridazine, and haloperidol; and ondansetron and tropisetron.
Gene mutations that control the expression of CYP2D6 can result in:
Complete deletion of the CYP2D6 gene
Replacement of a single nucleotide leading to aberrant gene splicing
The following enzymes are genetically expressed in the kidneys and are therefore, theoretically subject to genetic polymorphism. Because most biotransformation occurs in the liver, the contribution of the kidney is relatively small and not clinically important.
UGT1A6 (paracetamol), UGT1A9 (propofol, furosemide), UGT2B7 (NSAIDs, morphine, codeine), CYP2B6 (ketamine, propofol), CYP3A5 (midazolam)
What's the null hypothesis?
What's the alternative to it?
Type 1 errors occur when?
Type 2 errors occur when?
What is the significance level set at?
The null hypothesis is that there is no significant difference between two groups or specified populations.
The alternative hypothesis is that there is a difference (i.e. contrary to the Null hypothesis)
A type I error occurs when we reject the null hypothesis when we should have retained it.
A type II error occurs when we fail to reject the null hypothesis when it is in fact false. In other words, we believe that there isn't a genuine effect when actually there is one.
Rejection of the null hypothesis depends on the probability.
The significance level is usually set at p <0.05
Is a perfect correlation statistically significant?
What's a significant p value?
When t is what value may it be significant?
What does it depend on?
What level of chi² is significant?
A perfect correlation is when r is either −1 or +1, but this may not be statistically significant.
The significant p value is <0.05 (not 0.5).
When t is >1.96 it may be significant but it depends on the degrees of freedom.
Chi2 must be ≥3.84 to reach conventional level of significance (p <0.05). If degrees of freedom is >1, chi2 needs to be even higher to be statistically significant.
What is the geometric mean?
Is it greater than the arithmetic mean?
The geometric mean is the nth root of the product of (a1 … aN) and the arithmetic mean is (a1+ …+aN)/N hence the geometric mean will always be less than (or at most equal if all values are equal) the arithmetic mean.
It measures the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean, which uses their sum).
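A quick comparison with made-up values (`statistics.geometric_mean` requires Python 3.8+):

```python
import statistics

values = [2, 8]   # made-up positive values

arithmetic = statistics.mean(values)            # (2 + 8) / 2 = 5
geometric = statistics.geometric_mean(values)   # sqrt(2 * 8) = 4

# The geometric mean never exceeds the arithmetic mean
```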
How do you calculate the SEM?
Does SD fall with increasing sample size?
Does SEM decrease or increase with increasing sample size?
Does skewness depend on whether SD is greater or less than the mean?
What is the Student's t test?
The standard error of the mean (SEM) = SD/√n.
SD does not necessarily fall with sample size as the distribution of values may increase and hence SD increase.
SEM would decrease with sample size as can be seen in the above calculation.
Skewness does not depend on whether SD is greater than or less than the mean.
Student’s t test is a parametric test comparing normally distributed data.
What is prevalence?
How is it expressed?
Can it be calculated from a cross-sectional study?
Prevalence depends on the number of individuals who contract the disease in a particular time period.
Because it looks at the number of individuals with a disease at a given point in time, or within a defined interval, if a patient has recovered from the illness in that duration, then they would not be included in the prevalence rate.
It is expressed as a proportion. As cross-sectional studies are effectively a snap-shot they can be used to estimate the proportion of people with a disease at that time and thus the point prevalence.
Prevalence is one measure that can assess the health needs of a community.
P = I × D
where:
P = prevalence, I = incidence, D = average duration of disease
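With invented figures, P = I × D works out as follows:

```python
# Hypothetical figures for illustration only:
incidence = 0.02  # 2 new cases per 100 person-years
duration = 5.0    # average disease duration, in years

# Prevalence = incidence x average duration
prevalence = incidence * duration  # 0.10, i.e. about 10% of the population
```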
What percentage of individuals lie beyond 2 SD of the mean in a normal distribution?
Only about 5% of individuals will be beyond two standard deviations from the mean (not 10%).
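This figure can be checked against the standard normal CDF using Python's statistics module:

```python
from statistics import NormalDist

# Proportion of a normal distribution lying beyond 2 SD of the mean,
# counting both tails: 2 x P(Z > 2)
tail = 2 * (1 - NormalDist().cdf(2))  # approximately 0.046, i.e. ~5%
```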
What does sample size influence?
Sample size influences level of significance through its use in the calculation of SD/SE.
It does not affect
The level of acceptance
The alternative hypothesis with a general level set at p<0.05
The test to be used.
The properties of a normal distribution are
How many observations lie within 1 SD? Within 2 SD? Within 3 SD?
How many observations lie between x and x + 1 SD?
Do data from a normal distribution require transformation?
How may observations that are not normally distributed be converted to a normal distribution?
What may be suitable for transformation by taking the square root?
How is the 95% CI calculated
for populations >30?
for populations <30?
Symmetrical about the mean so that the mean, median and mode coincide
Sixty-eight percent of observations lie within 1 SD of the mean (x ± 1SD); 95% lie within x ± 2SD; 99.7% lie within x ± 3SD.
Because of this symmetry, about 34% of observations lie between x and x + 1SD.
Data from a normal distribution are suitable for parametric tests without prior transformation.
Observations which do not conform to a normal distribution may be log-normally distributed and can be transformed to a normal distribution by converting values to log10.
Counts of events (for example, bacterial colonies, radioactive counts) may follow a Poisson distribution and may be suitably transformed by taking the square root value.
The 95% confidence interval gives information about the range of values within which the true population mean is likely to lie.
The 95% confidence interval is calculated as the mean ± 1.96 × the standard error of the mean (SEM) for populations of greater than 30.
For smaller populations the appropriate value of t is taken from tables, so that the 95% confidence interval is calculated as x − (t × SEM) to x + (t × SEM), where t is taken for the appropriate degrees of freedom at a confidence of 95%, that is 100(1 − α)% with α = 0.05.
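A minimal sketch of the large-sample case (n > 30), using invented data; the small-sample version would substitute the tabulated t value for 1.96:

```python
import math
from statistics import mean, stdev, NormalDist

def ci95(data):
    """95% CI for the mean: x_bar +/- 1.96 x SEM (large-sample, z-based)."""
    n = len(data)
    sem = stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.975)  # approximately 1.96
    x_bar = mean(data)
    return x_bar - z * sem, x_bar + z * sem

# Illustrative data: the values 1..100 (mean 50.5)
low, high = ci95(list(range(1, 101)))
```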
Levels of evidence
Level 1 - High-quality randomised controlled trial with statistically significant difference or no statistically significant difference but narrow confidence intervals (prospective controlled)
a - systematic review/meta-analysis of multiple RCTs
b - a single RCT
Level 2 - Prospective comparative study (prospective uncontrolled)
a - a single well-designed controlled non-randomised trial
b - a single well-designed experimental study (for example, a cohort study)
Level 3 - Case-control study, retrospective comparative study (retrospective controlled)
Level 4 - Case series (retrospective uncontrolled)
Level 5 - Expert opinion.
What is the power of a study
The power of a study is the probability of rejecting the null hypothesis when it is false, that is, the probability of concluding a result is statistically significant.
What is evidence based medicine?
Is it involved in rationing resources?
Sackett et al. state that “Evidence based medicine is the conscientious, explicit and judicious use of current best evidence in making decisions about care of individual patients. This means integrating individual clinical expertise with the best available external evidence”.
Clinical expertise involves proficiency and judgment gained by clinicians with time and the compassionate application of knowledge to individuals.
Current best evidence comes from many sources including:
randomised controlled trials
meta-analysis
national expert guidelines (for example, hypertension and asthma)
patient-orientated studies, and
health economic assessment.
Evidence based medicine is not ‘cook-book’ medicine, a method for cost cutting and does not solely rely on randomised controlled trials.
Ordinal
Ranked
Subjects fall into mutually exclusive groups
with an intrinsic ranking
but not part of a numerical scale
ASA grade
GCS grade
Nominal data
Not ranked
categories bear no numerical relationship to one another
sex/blood group
Numerical data
Discrete or continuous
Discrete
Takes only certain values
e.g. number of sisters
Continuous
take any value
height
weight
BP
Standard deviation =
square root of the variance
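A one-line check of the SD–variance relationship, using an arbitrary illustrative data set:

```python
import math
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

# The standard deviation is the square root of the variance
assert math.isclose(stdev(data), math.sqrt(variance(data)))
```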
Box and whisker plots
Interpret the distribution of data
A line connects the highest and lowest values - the range
A perpendicular line represents the median
The box represents the interquartile range (25th-75th percentiles); the whiskers may extend to the 2.5th and 97.5th percentiles
Outliers are shown as asterisks
SEM
standard error of the mean
What branch of statistics does it belong to?
What does it tell us?
What does it equal?
What is the formula?
What is its relationship with increasing sample size?
How can it be used to calculate a confidence interval for the true population mean?
Inferential statistics
If several samples are taken from a population and their means plotted, the means will form a normal distribution around the true population mean.
SEM = SD of the sample means
Formula: SEM = SD of the sample /
square root of the sample size
As sample size increases, SEM decreases
i.e. the sample mean more closely approximates the true population mean
95% CI for the true population mean = sample mean ± 1.96 × SEM
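The claim that the SD of repeated sample means equals the SEM can be checked by simulation (population parameters are invented for illustration):

```python
import math
import random
from statistics import mean, stdev

random.seed(0)

# Hypothetical population: normally distributed, mean 100, SD 15
pop_mean, pop_sd = 100.0, 15.0
n = 25  # size of each sample

# Draw many samples from the population and record each sample mean
sample_means = [
    mean(random.gauss(pop_mean, pop_sd) for _ in range(n))
    for _ in range(2000)
]

# The SD of the sample means approximates the theoretical SEM = SD / sqrt(n)
empirical_sem = stdev(sample_means)
theoretical_sem = pop_sd / math.sqrt(n)  # 15 / 5 = 3
```

With 2000 simulated samples the empirical SEM lands very close to the theoretical value of 3, and the distribution of sample means centres on the true population mean.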
NNT is the reciprocal of what?
Can meta-analyses be easily biased?
The absolute risk reduction.
Yes - meta-analyses can be easily biased.
Beta (β) error = type 2 error
attributing a result to chance when a real difference exists
20% is the maximum acceptable level
Which statistical techniques make no assumptions about distribution?
What do parametric tests assume?
What is a type 1 error?
Is a type 2 error more or less common?
Non-parametric tests
can be used on any type of data
Parametric tests assume the data are normally distributed
provided the data are a continuous variable and the deviation from normality is not too extreme, they can be used for non-normally distributed data
Type 1 error - the null hypothesis is rejected when there is no real difference
Type 2 errors are more common
accepting the null hypothesis, i.e. not identifying a difference between groups when one exists
often due to small sample size