Statistics Flashcards
Statistic of central tendency for nominal data
Mode
Yes or no data
Nominal
Standard error of mean from std dev
Std error of mean = standard deviation / sq root of sample size
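A minimal sketch of the formula above, assuming the sample standard deviation (n - 1 denominator) is the one intended; the sample values and the function name are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / square root of sample size."""
    n = len(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumption)
    return sd / math.sqrt(n)

# Hypothetical sample of five measurements
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
sem = standard_error_of_mean(sample)
print(round(sem, 4))
```

A larger sample with the same spread gives a smaller SEM, since n sits under the square root in the denominator.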
Test for a categorical (nominal) variable
Fisher's exact
Odds ratio interpretation for OR of 1.18 (95% CI 1.04, 1.33)
Odds of the event are elevated by 18% (95% CI: 4% to 33%) and the result is statistically significant (CI doesn’t include 1)
Type of trial you use odds ratio to measure significance
Case control; sometimes cross sectional or cohort with some modifications
Calculate odds ratio
a/c divided by b/d = ad/bc
Where a = exposed cases, b = exposed non cases, c = unexposed cases, d = unexposed non cases
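A small sketch of the odds ratio calculation using the a, b, c, d cells defined above. The confidence interval uses the log (Woolf) method, which is an assumption here since the cards do not say how the CI is obtained; the counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc."""
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI on the log scale (Woolf method, an assumption)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: a = exposed cases, b = exposed non cases,
# c = unexposed cases, d = unexposed non cases
a, b, c, d = 30, 70, 15, 85
or_value = odds_ratio(a, b, c, d)
ci_low, ci_high = odds_ratio_ci(a, b, c, d)
significant = not (ci_low <= 1.0 <= ci_high)  # CI excluding 1 => stat sig
```

With these made-up counts the interval sits entirely above 1, so the association would be read as statistically significant.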
Continuous data
Data along an infinite or finite continuum that can be broken down into an infinite degree of detail (weight, temperature, etc)
When would you use kruskal wallis test?
Non parametric and ordinal data
PANSS represents which type of data
Continuous, even though made up of multiple ordinal scales
Central tendency stat for ordinal data (ranked in order)
Median (mean not appropriate since data are categorical and not to be treated as continuous)
Types of continuous variables with examples
Interval and ratio
Interval - eg temperature degrees Celsius - equal intervals and zero is arbitrary
Ratio - like interval but there is a true zero - ex: weight, blood pressure
Test to see if data are normally distributed
Kolmogorov-Smirnov
2 discrete probability distributions
Binomial - only two different outcomes like heads or tails
Poisson - another probability distribution when you count a number of events across times - ex: number of ADRs from drug x over a time
Kurtosis
How flat a distribution is - normal distribution = 3
Skewness - symmetry of distribution - is data clustered at low end positively or negatively skewed?
Low end - positively skewed - outliers on the high end pull mean in higher direction so mean is higher than median
High end - negatively skewed - low numbers pull mean down so mean is lower than median
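A quick illustration of the mean versus median rule above, using two hypothetical data sets (one with a high outlier, one with a low outlier).

```python
import statistics

# Hypothetical data clustered at the low end with one high outlier
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data clustered at the high end with one low outlier
negatively_skewed = [1, 17, 18, 18, 18, 19, 19, 20]

pos_mean = statistics.mean(positively_skewed)
pos_median = statistics.median(positively_skewed)
neg_mean = statistics.mean(negatively_skewed)
neg_median = statistics.median(negatively_skewed)

# Positive skew: mean pulled above the median
# Negative skew: mean pulled below the median
print(pos_mean, pos_median, neg_mean, neg_median)
```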
Standard error of the mean
Different than SD - doesn't tell you how values compare to the mean; tells you how this sample's mean compares to other samples from the same population
- for more than 1 sample studies
- is sd/sq root of n
Non parametric test criteria
Non normally distrib data
Eg nominal or ordinal variables with sample size under 30
Also, scales - ordinal - with less than 12 categories eg PANSS
Defn of beta
Probability of making a type II error
Usually < 0.2, pref < 0.10
Defn of alpha
Prob of type I error
Inversely related to beta
Continuous variable parametric test
- compare two means?
If independent samples - Student's t test
Paired or matched data - paired t test
Comparison of 3 or more groups
One way ANOVA - helps avoid type I error
- used instead of performing multiple t tests, which inflates the chance of a type I error
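A rough sketch of why running every pairwise t test is a problem: each extra test adds another chance of a false positive. The formula below assumes the comparisons are independent, which is only an approximation; the function name is hypothetical.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Chance of at least one type I error across all pairwise tests,
    assuming independent comparisons (an approximation)."""
    n_tests = comb(n_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

three_groups = familywise_error(3)  # 3 pairwise tests
four_groups = familywise_error(4)   # 6 pairwise tests
print(round(three_groups, 4), round(four_groups, 4))
```

Even with only 3 groups the overall chance of a false positive is well above the nominal 0.05, which is the motivation for a single ANOVA followed by a multiple comparison method.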
Anova detects what?
A difference among the 3 or more groups
- then, a multiple comparison method must be employed to detect which difference
- Dunnett, Bonferroni, Tukey, etc
- repeated measures anova - subjects in these are paired and serve as own control (participate in >1 treatment group)
Nominal variables (nonparametric tests)
Chi square test
- ex: test diff of baseline characteristics sex, smoking status, alcohol, yes/no variables like this
- tests observed vs expected frequencies
- must be larger samples
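A minimal sketch of the observed versus expected comparison for a 2x2 table, written without any continuity correction (an assumption); the counts and function name are hypothetical.

```python
def chi_square_2x2(table):
    """table = [[a, b], [c, d]] of observed counts.
    Returns (chi-square statistic, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell = row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return stat, expected

# Hypothetical baseline characteristic (eg smoker yes/no) in two groups
observed = [[20, 30], [30, 20]]
stat, expected = chi_square_2x2(observed)
# With 1 degree of freedom, a statistic above 3.841 corresponds to p < 0.05
```

The expected counts are also what the small-sample rule looks at: when any expected cell falls below 5, Fisher's exact is the usual alternative.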
Nominal variables (non parametric) besides chi square
Fisher's exact - when sample is <20 or the expected count in any cell of the 2x2 table is less than 5
McNemar - similar to chi sq but for paired or matched data
Mantel-Haenszel - to see if one factor is influencing the results - uses separate contingency tables
Ordinal data (non parametric test) for 2 groups
Mann-Whitney - nonparametric equivalent to Student's t
- no paired groups
Sign test - matched or paired data - tells whether pos or neg difference
Wilcoxon signed rank test
- determines magnitude of diff and rank order of differences
Ordinal non parametric test with 3 or more groups
Kruskal wallis one way Anova
- data not matched or paired
Friedman two way anova
- data are paired or matched
Correlation - which test is for parametric and which is for ordinal
Pearson corr coeff - for parametric (continuous) data, ranges from -1 through 0 to 1
Spearman rank corr coeff - for ordinal data; correlates the ranks of the values rather than the values themselves
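A short sketch contrasting the two coefficients on hypothetical data that rise steadily but not in a straight line. The ranking helper assumes no tied values, which keeps the example simple.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient on the raw values."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(x, y):
    """Spearman coefficient = Pearson coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # monotonic but not linear (hypothetical)
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because the ordering of y matches the ordering of x perfectly, the rank-based coefficient is 1 while the Pearson coefficient is somewhat lower.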
Regression - when to use logistic vs linear
Linear regression - continuous variables (parametric)
Logistic regression - ordinal or nominal data - non parametric
Survival analysis notes
Censoring - takes into acct that some subjects leave study for different reasons and can enter study at different time points
Actuarial method - counts number subjects who reach a certain point
- ex- pt who dies at 5 months 29 days isn’t included in the 6 month analysis
Kaplan Meier - measures time to endpoint
- produces life table and survival curve
Cox proportional hazards - allows researcher to adjust for differences in study groups (age, comorbidities)
- produces hazard ratio and CI
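A bare-bones sketch of the Kaplan Meier product-limit estimate with censoring, to show how subjects who leave early still count while they are at risk. The follow-up times and event flags are hypothetical, and this is an illustration rather than a full survival analysis.

```python
def kaplan_meier(times, events):
    """times: follow-up time per subject.
    events: 1 = endpoint reached, 0 = censored (left study without endpoint).
    Returns [(time, survival probability)] at each time an endpoint occurs."""
    data = list(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for tt, e in data if tt == t and e == 1)
        n_leaving = sum(1 for tt, _ in data if tt == t)
        if n_events:
            # Multiply by the fraction of at-risk subjects surviving this time
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t
        at_risk -= n_leaving
    return curve

# Hypothetical months of follow-up; subjects 2 and 5 are censored
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
```

The censored subject at month 3 is still counted in the denominator at month 3, then drops out of the risk set for later time points.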
Incidence
Number of new cases that occur in a popn in a specified time (number of new cases can trend over time)
Prevalence
Number of cases in the population who HAVE disease in a specific time frame
2x2 table
        Dz+   Dz-   Total
RF+     A     B     A+B
RF-     C     D     C+D
Relative risk
Actual or true risk
Used in prospective and experimental studies
RR = (A/(A+B)) / (C/(C+D))
Ex: prospective cohort study to evaluate subj taking antipsychotics and development of dm - take subj with and without antipsychotic use and calculate RR to see if dm associated with antipsychotic use
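A minimal sketch of the relative risk calculation for a cohort like the one in the example. The counts are hypothetical and are not taken from any real antipsychotic study.

```python
def relative_risk(a, b, c, d):
    """RR = (A / (A + B)) / (C / (C + D))
    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 200 exposed subjects, 200 unexposed subjects
rr = relative_risk(a=30, b=170, c=15, d=185)
print(round(rr, 2))
```

Here the risk is 15% in the exposed group and 7.5% in the unexposed group, so the exposed group has twice the risk.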
Odds ratio
Estimates Relative risk
Used in case control and cross sectional studies
OR =( A/C) / (B/D) = AD/BC
Study subjects are selected on basis of disease status so it is not possible to calculate the rate of development of the disease given presence or absence of exposure - thus, OR used to approximate RR or estimate risk
OR and RR interpretation
similarities
Both used to determine magnitude of association between exposure to risk factor and disease
Same scale - >1 means exposure is associated with development of dz, <1 means protection, and =1 means no association
If 95%CI includes 1 => not stat sig
Relative risk reduction
Estimates % of risk that is reduced by result of the intervention
= 1 - RR
OVERestimates the true risk reduction because the difference is divided by the control group outcome rate. So often the benefit of a treatment is overstated!
Absolute risk reduction
Absolute difference between the rate in the control group and the rate in the intervention group
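A small sketch comparing relative and absolute risk reduction on hypothetical event rates. The sign convention used here (control rate minus intervention rate, so a benefit comes out positive) is an assumption.

```python
def risk_reductions(control_rate, treatment_rate):
    """Returns (RR, RRR, ARR) for two event rates given as proportions."""
    rr = treatment_rate / control_rate
    rrr = 1 - rr                         # relative risk reduction
    arr = control_rate - treatment_rate  # absolute risk reduction
    return rr, rrr, arr

# Hypothetical common outcome: 20% vs 10%
rr_common, rrr_common, arr_common = risk_reductions(0.20, 0.10)
# Hypothetical rare outcome: 0.2% vs 0.1%
rr_rare, rrr_rare, arr_rare = risk_reductions(0.002, 0.001)
```

Both scenarios give the same 50% relative risk reduction, but the absolute reduction is 10 percentage points in the first and only 0.1 in the second, which is how a relative figure alone can overstate the benefit.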
2x2 table for diagnostic test accuracy
          Dz+   Dz-
Test+     TP    FP
Test-     FN    TN
Sensitivity
Probability that a true pos test occurs in an individual pos for dz
Sensitivity = TP/(TP+FN) *100
= true pos / people who have disease
SnNout - highly sensitive test, when negative, rules out dz
Specificity
Probability that a true negative test result occurs - neg test in neg pt
Specificity = TN/(FP+TN)
= true neg / people without the disease
SpPin - highly specific test, when positive, rules in (confirms) disease
Positive predictive value
PPV = TP/(TP+FP) *100
PPV = proportion of individuals who have disease when test is positive = likelihood a person with pos test has disease
Negative predictive value
NPV = TN/(FN + TN)
Proportion of persons who test negative who are disease free - likelihood that a person with negative test doesn’t have disease
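A minimal sketch that computes sensitivity, specificity, PPV, and NPV from the four cells of a diagnostic 2x2 table. The counts are hypothetical, and results are returned as proportions rather than percentages.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """tp/fp/fn/tn = true pos, false pos, false neg, true neg counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true pos / people with disease
        "specificity": tn / (fp + tn),  # true neg / people without disease
        "ppv": tp / (tp + fp),          # true pos / all positive tests
        "npv": tn / (fn + tn),          # true neg / all negative tests
    }

# Hypothetical screening of 1000 people, 100 of whom have the disease
metrics = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Sensitivity and specificity divide by the disease columns, while PPV and NPV divide by the test rows, so the predictive values shift with how common the disease is in the tested group.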
Case control
Analytical observational study
- retrospective -
Use in new diseases or outbreaks
Measure of association = odds ratio
Cohort study
Analytical observational study
Strongest observational study design
Usually prospective
- relative risk is measure of association
Cross sectional
Aka prevalence study
Descriptive obs study
- used to gather info on risk factors and outcomes of interest
- generate hypotheses
Case report / case series
Descriptive obs study
Generate a case defn
Determine adv effects, generate new info
Effect size
Effect size = d = cohens d
(mean experimental grp - mean control group) / st dev
Interpretation - tells us how many st dev of difference there are between exp and control. Eg if d=0.25, there is a quarter SD difference
- 0.2 = small effect size
- 0.5 medium
- 0.8 large
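A short sketch of Cohen's d. The card only says "st dev", so using the pooled standard deviation of the two groups is an assumption; the scores are hypothetical.

```python
import math
import statistics

def cohens_d(experimental, control):
    """d = (mean experimental - mean control) / pooled SD (assumption)."""
    n1, n2 = len(experimental), len(control)
    v1 = statistics.variance(experimental)  # sample variance
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd

# Hypothetical scores
experimental = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
d = cohens_d(experimental, control)
print(round(d, 3))
```

With these numbers d falls between the medium and large benchmarks.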
Analytic studies vs descriptive
Analytic - case control and cohort - involve more comprehensive data
Descriptive - cross sectional and case report/series - compare disease frequency in populations, generate hypotheses
Internal validity
Does the study measure what it was designed to measure?
Does it address biases, confounders?
** if you do not have internal validity, you won’t have external validity
External validity
Assumes internal validity (measures what intended, addresses biases confounders and outcomes)
- external validity means outcomes can be generalized to other groups or patients, including your clinic population
Selection bias
Selection of study participants
- includes sampling bias - researcher chooses study participants based on convenience rather than representativeness
Detection bias - individuals who have risk factors have more medical encounters, which increases the probability dz is identified
Admission rate bias (Berkson’s)
- specific to using inpatients as cases and controls - exposure and disease being studied lead to a higher exposure rate among hospital cases than controls. Example: OC use leads to DVT - higher referral rate to hospitals
Response bias - individuals who participate are different than those who decline to participate
How to minimize selection bias
- Define study cases in a detailed and objective manner
- enroll a representative study sample in the study
Information bias
Inaccuracy in collecting data
Recall bias- different memory of past events
- people w disease recall more detail than healthy people - case control and retrospective cohort are most vulnerable
Interviewer bias - differences in obtaining info from subjects
How to minimize recall bias
- Confirm pt response through medical records
- use a control group w disease other than that being studied
How to minimize interviewer bias (type of information bias)
Detailed training of interviewers
Directions to study staff conducting interviews and surveys
Supervision of data collection process
Follow up or attrition bias
- study participants lost to follow up
- prospective study most vulnerable
- difficult to minimize but assess reasons for loss
Misclassification bias
Inaccuracy in measurement or placement of study participants
- mismeasurement, or if someone was thought to have disease on study entrance but does not
Sources of misclassification bias:
- variation among study observers and instruments
- variation in underlying characteristics
- misunderstanding of questions by study subjects (interview or questionnaire)
- incomplete medical record data
Compliance or adherence bias
One treatment that pts adhere to better than another
How to address bias
Proper study design
Conduct of study - selection of pts, procedures, supervision and training
statistical analysis:
- difficult to accomplish because no stat test can correct for bias or fix study flaw
- using appropriate stat procedures for data analysis can help with bias
Confounding variables
- falsely conclude that a rf is associated with a disease without adjusting for rf that are either known or unknown
1) confounders have the potential to influence study results
2) researchers may not account for these, or even be aware of their existence!
Controlling for confounders
1) randomization - ensures confounders are evenly distributed
- not done in epidemiology studies like case control, retro, and cohort
2) restriction -
Restrict admission to study to certain category of confounders
- matching - equal representation of subjects with certain confounders among study groups
- over matching - strong association between variable and variable of interest that decreases ability to find a result. Do not match based on factors affected by disease or exposure eg signs and sx because this decreases ability to find a result
3) analysis - stratification - data are split into non-overlapping groups called strata where a specific factor is contained in separate strata to see if each may contribute to effect
- multivariate regression analysis - can control for a number of confounders at same time without losing power
Criteria to establish causality and not just association or relationship
Strength of association
Reproducibility - different populations, different times
Temporal sequence - exposure has to happen before the outcome
Biological plausibility
Dose response relationship - can be, but not necessarily
Coherence of relationship
- strength of association
A) stronger the association, the less likely it is due to chance alone
B) but, just because the magnitude is low doesn’t mean there is no cause and effect
Study design strength from strongest to weakest re what can be concluded for results and causality
RCT - strongest design for cause effect and differences in tx effect
Cohort
Case control
Case series
Case report - weakest causality
Observational study - appropriate?
Case control, cohort, cross sectional
- appropriate for studying natural history of disease, accuracy of dx test, or public health policy - program planning etc
Hypothesis evaluation
- is it an answerable question?
- sufficiently narrow and objective?
- use SMART criteria
- biological, temporal, and time frame plausibility
Subjective vs objective outcomes - what is PANSS?
It measures subjective - psych sx, but validation and standardization minimizes variability
Primary vs secondary data sources
Primary - measured directly by researcher for purpose of ongoing study - rct, cohort, case control, cross sectional
Secondary- from databases or pt medical records. Data is already collected, researcher gains permission to access for study (retrospective cohort, case control and cross sectional)
- advantage: not as costly and doesn’t take as much time to acquire data
- disadvantage: missing data can impact accuracy of results and data may be miscoded eg ICD codes done wrong
Data analysis and interpretation
Obs studies
Case control study - Odds ratio
Cohort study - relative risk w CI
Post hoc analyses - what is it good for
Generating hypotheses
Quality of evidence
US Preventive Services Task Force
Level 1: evidence obtained from at least one well designed RCT
Level 2-1: well designed controlled trials without randomization
Level 2-2: well designed cohort or case control studies, pref from >1 center
Level 2-3: evidence obtained with multiple time series with or without intervention
Level 3: opinion of experts, case reports or series
Efficacy vs effectiveness
Efficacy is narrow term used to describe outcomes in studies
Effectiveness is broad and defines a real world outcome
Survival analysis - psych trial issues
- usually presented graphically, without confidence intervals
- doesn't explain the impact of drop outs on power in studies
Generalizability of studies
One factor:
- would exclusion criteria for the study exclude pts in our practice?
Relative risk
Incidence in group a divided by incidence in group b
Odds ratio function
Estimates relative risk in retrospective studies
When does odds ratio overestimate risk?
When the incidence is >10% the odds ratio overestimates risk - over 10 overestimates
When incidence is <10% the odds ratio is a decent estimate
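A quick numeric check of the 10% rule of thumb, comparing OR and RR for a rare and a common outcome. The risks are hypothetical and chosen so that the true RR is 2 in both cases.

```python
def rr_and_or(risk_exposed, risk_unexposed):
    """Returns (relative risk, odds ratio) for two risks given as proportions."""
    def odds(p):
        return p / (1 - p)
    rr = risk_exposed / risk_unexposed
    odds_ratio = odds(risk_exposed) / odds(risk_unexposed)
    return rr, odds_ratio

# Rare outcome (incidence well under 10%): 2% vs 1%
rr_rare, or_rare = rr_and_or(0.02, 0.01)
# Common outcome (incidence over 10%): 40% vs 20%
rr_common, or_common = rr_and_or(0.40, 0.20)
print(round(or_rare, 2), round(or_common, 2))
```

For the rare outcome the OR stays close to the RR of 2, while for the common outcome the OR is noticeably larger than the RR.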
Odds ratio calc
Exposed cases / unexposed cases
Divided by
Exposed non-cases / unexposed non-cases
Relative risk calculation
For prospective study RR = A/(A+B) divided by C/(C+D)
Observer bias
Minimized by blinding esp double blinding
Allocation bias
Can occur in experimental studies at the point where patients are assigned to treatment groups; proper randomization minimizes it
Information bias
Occurs in observational studies where must rely on existing sources of information
Misclassification bias
Occurs when there are inaccuracies of measurement or placement of study patients in specific groups. Most vulnerable: case control and retrospective studies
Ordinal test equivalent to student t test
Mann Whitney U test
Used when a comparison is being made with 2 non paired groups which don’t have to be equal size - non paired means subjects don’t have to participate in all treatments (don’t have to serve as own controls)
Two different tests for comparing nonparametric nominal data
Chi square for large sample (Chicago = large)
Fisher's exact for sample size less than 20