Statistics Flashcards
Statistic of central tendency for nominal data
Mode
Yes or no data
Nominal
Standard error of mean from std dev
Standard error of mean = standard deviation / square root of sample size
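The formula on this card can be checked with a few lines of Python. This is a minimal sketch, assuming the sample standard deviation (n - 1 denominator) is the one intended; the function name and the sample values are made up for illustration.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / square root of sample size."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return sd / math.sqrt(n)

# Hypothetical sample, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(sample))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.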
Test for categorical (nominal) variables
Fisher's exact test
Odds ratio interpretation for OR of 1.18 (95% CI 1.04, 1.33)
Odds of the event are elevated by 4% to 33%, and the result is statistically significant (the CI doesn't include 1)
Type of study in which the odds ratio is used to measure significance
Case-control; sometimes cross-sectional or cohort with some modifications
Calculate odds ratio
OR = (a/c) / (b/d) = ad/bc
Where a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases
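The odds ratio calculation can be sketched in Python as below. The confidence interval uses the common log-based (Woolf) approximation, which is an addition not stated on the card, and the counts are hypothetical.

```python
import math

def odds_ratio(a, b, c, d):
    """OR = (a/c) / (b/d) = ad / bc.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI from the standard error of ln(OR).

    Assumption: Woolf (log) method; needs all four cells > 0.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts, for illustration only
a, b, c, d = 20, 80, 10, 90
low, high = odds_ratio_ci(a, b, c, d)
significant = not (low <= 1 <= high)  # CI excluding 1 means significant
print(odds_ratio(a, b, c, d), low, high, significant)
```

With these made-up counts the interval straddles 1, so the elevated odds would not count as statistically significant.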
Continuous data
Data along an infinite or finite continuum that can be broken down into an infinite degree of detail (weight, temperature, etc.)
When would you use the Kruskal-Wallis test?
Nonparametric, ordinal data (3 or more groups, not matched or paired)
PANSS represents which type of data
Continuous, even though made up of multiple ordinal scales
Central tendency stat for ordinal data (ranked in order)
Median (mean not appropriate since data are categorical and not to be treated as continuous)
Types of continuous variables with examples
Interval and ratio
Interval - eg temperature degrees Celsius - equal intervals and zero is arbitrary
Ratio - like interval but there is a true zero - ex: weight, blood pressure
Test to see if data are normally distributed
Kolmogorov-Smirnov
2 discrete probability distributions
Binomial - only two different outcomes like heads or tails
Poisson - counts the number of events over a period of time - ex: number of ADRs from drug X over a given time
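The two distributions can be written out directly from their textbook probability mass functions. A small sketch, with hypothetical parameter values (a fair coin, and an assumed average of 2 ADRs per period):

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials), success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(exactly k events) when the expected count per period is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Hypothetical values, for illustration only
print(binomial_pmf(3, 10, 0.5))  # 3 heads in 10 fair coin flips
print(poisson_pmf(0, 2.0))       # no ADRs when 2 are expected on average
```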
Kurtosis
How flat a distribution is - normal distribution = 3
Skewness - symmetry of distribution - are data clustered at the low end positively or negatively skewed?
Low end - positively skewed - outliers on the high end pull mean in higher direction so mean is higher than median
High end - negatively skewed - low numbers pull mean down so mean is lower than median
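The mean-versus-median rule on this card can be tried out quickly. This is only a heuristic sketch of that rule, not a formal skewness coefficient, and the data are invented.

```python
import statistics

def skew_direction(values):
    """Classify skew by comparing mean and median (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"   # high outliers pull the mean up
    if mean < median:
        return "negative"   # low outliers pull the mean down
    return "symmetric"

# Hypothetical data: clustered low with one high outlier
print(skew_direction([1, 2, 2, 3, 3, 4, 20]))
# Hypothetical data: clustered high with one low outlier
print(skew_direction([1, 17, 18, 18, 19, 19, 20]))
```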
Standard error of the mean
Different from SD - doesn't tell you how values compare to the mean; tells you how this sample's mean compares to other samples from the same population
- for studies with more than 1 sample
- is SD / square root of n
Non parametric test criteria
Non-normally distributed data
E.g., nominal or ordinal variables with sample size under 30
Also, ordinal scales with fewer than 12 categories, e.g., PANSS
Definition of beta
Probability of making a type II error
Usually < 0.20, preferably < 0.10
Definition of alpha
Probability of making a type I error
Inversely related to beta
Continuous variable parametric test
- compare two means?
If independent samples - Student's t test
Paired or matched data - paired t test
Comparison of 3 or more groups
One-way ANOVA - helps avoid type I error
- type I error would be inflated by performing multiple t tests instead
ANOVA detects what?
A difference among the 3 or more groups
- then, a multiple comparison method must be employed to detect which groups differ
- Dunnett, Bonferroni, Tukey, etc.
- repeated measures ANOVA - subjects in these are paired and serve as own control (participate in >1 treatment group)
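A short calculation shows why repeated t tests raise the type I error and what a Bonferroni adjustment does about it. This is a sketch that assumes the tests are independent (a simplification) and uses alpha = 0.05 as an example value.

```python
import math

def pairwise_comparisons(groups):
    """Number of two-group t tests needed to compare every pair."""
    return math.comb(groups, 2)

def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-comparison alpha after Bonferroni correction."""
    return alpha / num_tests

for groups in (3, 4, 5):
    tests = pairwise_comparisons(groups)
    print(groups, tests, familywise_error_rate(tests), bonferroni_alpha(tests))
```

With 3 groups there are 3 pairwise tests, and the chance of at least one false positive is already about 14% instead of 5%.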
Nominal variables (nonparametric tests)
Chi-square test
- ex: test difference in baseline characteristics such as sex, smoking status, alcohol use (yes/no variables like this)
- tests observed vs expected frequencies
- requires larger samples
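The observed-versus-expected comparison can be sketched for a 2x2 table as follows. No continuity correction is applied, the counts are hypothetical, and 3.841 is the standard critical value for 1 degree of freedom at alpha = 0.05.

```python
def chi_square_2x2(a, b, c, d):
    """Return (chi-square statistic, expected counts) for a 2x2 table.

    Expected count of a cell = row total * column total / grand total.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    expected = [
        [row_totals[i] * col_totals[j] / n for j in range(2)]
        for i in range(2)
    ]
    statistic = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return statistic, expected

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

# Hypothetical counts, e.g. smokers / non-smokers in two study arms
stat, expected = chi_square_2x2(10, 20, 20, 10)
smallest_expected = min(min(row) for row in expected)
print(stat, stat > CRITICAL_VALUE_DF1_ALPHA_05, smallest_expected)
```

Checking the smallest expected count is a quick way to see whether the sample is large enough for this test.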
Nominal variables (nonparametric) besides chi-square
Fisher's exact - when sample is <20 or the expected count in any cell of the 2x2 table is less than 5
McNemar - similar to chi-square but for paired or matched data
Mantel-Haenszel - to see if one factor is influencing the results - uses separate contingency tables
Ordinal data (non parametric test) for 2 groups
Mann-Whitney - nonparametric equivalent to Student's t test
- no paired groups
Sign test - matched or paired data - tells whether difference is positive or negative
Wilcoxon signed rank test
- matched or paired data - determines magnitude of difference and rank order of differences
Ordinal non parametric test with 3 or more groups
Kruskal-Wallis one-way ANOVA
- data not matched or paired
Friedman two-way ANOVA
- data are paired or matched
Correlation - which test is for parametric and which is for ordinal
Pearson correlation coefficient - for parametric data, ranges from -1 through 0 to 1
Spearman rank correlation coefficient - for ordinal data; calculated on the ranks of the values
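The difference between the two coefficients shows up clearly on data that rise steadily but not in a straight line. A minimal sketch, computing Spearman as Pearson applied to ranks (ties get the average rank); the data are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but not linearly
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

Here the rank order agrees perfectly, so Spearman reaches its maximum while Pearson falls slightly short of 1.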
Regression - when to use logistic vs linear
Linear regression - continuous variables (parametric)
Logistic regression - ordinal or nominal data - non parametric
Survival analysis notes
Censoring - takes into account that some subjects leave the study for different reasons and can enter the study at different time points
Actuarial method - counts number of subjects who reach a certain point
- ex: pt who dies at 5 months 29 days isn't included in the 6 month analysis
Kaplan-Meier - measures time to endpoint
- produces life table and survival curve
Cox proportional hazards - allows researcher to adjust for differences in study groups (age, comorbidities)
- produces hazard ratio and CI
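The Kaplan-Meier idea, including how censored subjects are handled, can be sketched with the standard product-limit estimate. The function name and the follow-up data are hypothetical; censored subjects count as at risk until they leave but never as events.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each subject
    events: True if the endpoint occurred, False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        leaving_at_t = 0
        # Group every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events_at_t += 1
            leaving_at_t += 1
            i += 1
        if events_at_t:
            survival *= (at_risk - events_at_t) / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at_t
    return curve

# Hypothetical follow-up in months; False marks a censored subject
months = [2, 3, 3, 5, 8]
reached_endpoint = [True, True, False, True, False]
for t, s in kaplan_meier(months, reached_endpoint):
    print(t, round(s, 3))
```

Each step of the survival curve falls only at a time when an endpoint occurs; censoring shrinks the at-risk count without producing a step.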
Incidence
Number of new cases that occur in a population in a specified time (number of new cases can trend over time)
Prevalence
Number of cases in the population who HAVE the disease in a specific time frame
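Expressed as proportions, the two measures differ only in what goes in the numerator. A small sketch with invented counts; the denominators (population at risk for incidence, total population for prevalence) follow the usual convention and are an assumption here.

```python
def incidence(new_cases, population_at_risk):
    """New cases during the period / population at risk."""
    return new_cases / population_at_risk

def prevalence(existing_cases, population):
    """All current cases (new and old) / total population."""
    return existing_cases / population

# Hypothetical counts for one year in a town of 10,000
print(incidence(50, 10_000))    # new diagnoses this year
print(prevalence(300, 10_000))  # everyone living with the disease
```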
2x2 table
        Dz+    Dz-    Total
RF+     A      B      A+B
RF-     C      D      C+D
Relative risk
Actual or true risk
Used in prospective and experimental studies
RR = [A/(A+B)] / [C/(C+D)]
Ex: prospective cohort study to evaluate subjects taking antipsychotics and development of DM - take subjects with and without antipsychotic use and calculate RR to see if DM is associated with antipsychotic use
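The relative risk formula maps directly onto the 2x2 table cells. A minimal sketch; the counts for the antipsychotic example are invented for illustration and do not come from any study.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)].

    A = exposed with disease,   B = exposed without disease,
    C = unexposed with disease, D = unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 100 subjects on antipsychotics, 100 not
dm_on_drug, no_dm_on_drug = 30, 70
dm_off_drug, no_dm_off_drug = 10, 90
print(relative_risk(dm_on_drug, no_dm_on_drug, dm_off_drug, no_dm_off_drug))
```

An RR above 1 would suggest DM is more common among exposed subjects, an RR of 1 no association, and an RR below 1 a lower risk with exposure.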