Statistics Flashcards
What are the types of variables?
Nominal
Ordinal
Interval
Ratio
It is a type of variable whose categories have no order or numeric value
Nominal (name)- gender, blood type
It is a variable with an order or rank, but the size of the difference between ranks is not defined
Ordinal (order)- stage of NEC
It is a variable with equal intervals but no true zero
Interval- body temperature (in °C or °F)
It is a variable with equal intervals and a meaningful zero
Ratio
Tests for normally distributed data
Parametric tests
- T-test
- ANOVA
- Pearson's correlation
Can only be used on interval and ratio data
Nominal data with only 2 groups
dichotomous or binary
it is a variable that is the outcome
Dependent variable
it is the variable that is the intervention
independent variable
Tests for skewed distributions and for ordinal and nominal variables
Non-parametric tests
- Wilcoxon rank sum test
- Kruskal-Wallis test
- Spearman's rank correlation coefficient
Measure of central tendency that is the sum of all observations divided by the number of observations
Mean “average”
- can be influenced by outlying values
Measure of central tendency that is the middle value of the ordered data
Median
- more appropriate for skewed data
- commonly used for ordinal data like the Apgar score
Measure of central tendency that is the most frequently occurring value
Mode
- commonly used with nominal data
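A quick check of the three measures, as a minimal Python sketch. The values are made up for illustration (40 is a deliberate outlier), and the standard `statistics` module is just one convenient way to compute them.

```python
import statistics

# Hypothetical data; 40 is an outlying value
values = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(values)      # sum of observations / number of observations
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean)      # pulled upward by the outlier
print("median:", median)  # not affected by how extreme the outlier is
print("mode:", mode)
```

Here the mean (9) is larger than the median (4) because of the single outlier, which is why the median suits skewed data better.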
It is the bell shaped frequency distribution where the mean, median, mode are the same
Gaussian distribution/ Normal distribution
It is how flat or peaked the curve is
Kurtosis (values below are excess kurtosis)
- peaked: > 0, leptokurtic
- normal: 0, mesokurtic
- lower and broad: < 0, platykurtic
Note: all are symmetric
In skewed data, what is the basis of the terminology?
It is based on the direction of the tail
- Right tail: positive skew
- Left tail: negative skew
If mean > median, skewed to the right
If mean < median, skewed to the left
It is the measure of dispersion which is the difference between the highest and lowest value
Range
- dependent on sample size
- influenced by extreme values
It is the measure of dispersion which is the difference between the median of the upper half and the median of the lower half of the data
Interquartile range
- between the 25th and 75th percentiles
- less influenced by extreme values
- comprises middle 50% of the data
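The contrast between the range and the interquartile range can be sketched in Python. This is an illustration under stated assumptions: the data are invented, the helper names `data_range` and `iqr_by_halves` are hypothetical, and the quartiles are taken as the medians of the lower and upper halves (other quartile conventions exist and can give slightly different numbers).

```python
import statistics

def data_range(data):
    """Difference between the highest and lowest value."""
    return max(data) - min(data)

def iqr_by_halves(data):
    """Median of the upper half minus median of the lower half.

    For an odd number of values the middle value is left out of both halves.
    Needs at least 2 values.
    """
    s = sorted(data)
    half = len(s) // 2
    lower, upper = s[:half], s[-half:]
    return statistics.median(upper) - statistics.median(lower)

values = [1, 2, 4, 5, 7, 8, 10, 11]        # hypothetical data
with_extreme = [1, 2, 4, 5, 7, 8, 10, 50]  # same data with one extreme value

print(data_range(values), iqr_by_halves(values))
print(data_range(with_extreme), iqr_by_halves(with_extreme))
```

Replacing 11 with 50 changes the range from 10 to 49, while the interquartile range stays at 6.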
It is the average of the squared deviations from the mean
Variance
It is the square root of the variance; it shows how closely the values cluster around the sample mean
Standard deviation
Meaning of standard deviation if the mean is known and the data have a normal distribution
Within 1 SD: 68.3% (34.1% on each side of the mean)
Within 2 SD: 95.4% (47.7% on each side)
Within 3 SD: 99.7% (49.9% on each side)
- know this: it lets you compute what percentage of the sample is included
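The coverage of a normal distribution within k standard deviations can be computed rather than memorized. A minimal sketch using the error function from Python's `math` module; the helper name `fraction_within` is made up for this example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    total = fraction_within(k) * 100
    # Half of the total lies on each side of the mean, because the curve is symmetric
    print(f"within {k} SD: {total:.1f}% total, {total / 2:.2f}% on each side")
```

The same function answers questions such as what share of the sample lies within 1.5 SD, by calling `fraction_within(1.5)`.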
It is the SD of the sampling error of the sample mean relative to the true mean of the total population
Standard error of the mean (SEM)
- shows how close the sample mean is to the population mean
- as the sample size increases, the SEM decreases
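Variance, standard deviation, and SEM are linked by simple arithmetic. The sketch below uses invented measurements and the usual formula SEM = SD / sqrt(n), which is assumed here because the cards do not state it; `sem_for` is a hypothetical helper.

```python
import math
import statistics

sample = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical measurements

variance = statistics.variance(sample)  # sample variance (n - 1 in the denominator)
sd = math.sqrt(variance)                # standard deviation
sem = sd / math.sqrt(len(sample))       # standard error of the mean

def sem_for(sd_value, n):
    """SEM for a given SD and sample size n."""
    return sd_value / math.sqrt(n)

print(variance, sd, sem)
# Same SD, larger samples: the SEM shrinks
print(sem_for(sd, 5), sem_for(sd, 20), sem_for(sd, 80))
```

Quadrupling the sample size halves the SEM, since n sits under a square root.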
It is a hypothesis with one predictor and one outcome
Simple hypothesis
It is a hypothesis with several predictor variables
Complex hypothesis
It is the hypothesis that proposes no difference between groups
null hypothesis
It is the hypothesis that proposes an association
alternative hypothesis
It is a parametric test to compare the means of 2 groups when the data are continuous and normally distributed
T-test
- Paired: each subject is his or her own control (before and after)
- Unpaired: two independent groups are compared
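The difference between the paired and unpaired forms shows up in how the t statistic is built. The sketch below computes only the statistic, using the textbook formulas (pooled variance for the unpaired case); the data and the function names `paired_t` and `unpaired_t` are invented for illustration, and turning t into a P-value would additionally need the t distribution from a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired data: mean difference / its standard error."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def unpaired_t(group1, group2):
    """Student's t statistic for two independent groups (pooled variance)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se

# Hypothetical before/after measurements on the same 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

print(paired_t(before, after))    # uses the within-subject differences
print(unpaired_t(after, before))  # treats the same numbers as two separate groups
```

With these numbers the paired statistic is much larger than the unpaired one, because pairing removes the variation between subjects.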
Extension of the T-test to three or more groups
Analysis of variance (ANOVA)
It is the comparison used to explore the data further after a significant overall effect is found
Post hoc comparison
Tests used for ordinal data
Wilcoxon rank sum test
Mann-Whitney U test
Tests for categorical data
Chi-squared test
Fisher's exact test
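For categorical data in a 2x2 table, the chi-squared statistic compares observed counts with the counts expected if there were no association. A minimal sketch, assuming the Pearson statistic without continuity correction; the counts and the function names are hypothetical. Fisher's exact test is usually preferred when expected counts are small, and it is not shown here.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table.

    Rows are groups, columns are outcome yes / no:
        group 1: a, b
        group 2: c, d
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    chi2 = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n  # count expected under no association
        chi2 += (obs - expected) ** 2 / expected
    return chi2

def p_value_df1(chi2):
    """Upper-tail P-value for a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: 10 of 30 with the outcome in group 1, 20 of 30 in group 2
chi2 = chi_squared_2x2(10, 20, 20, 10)
print(chi2, p_value_df1(chi2))
```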
Type of error that rejects the null hypothesis when it is true
Type I (false positive)
Reduced by a more stringent P-value threshold (the P-value itself is influenced by sample size, the difference between the control and experimental groups, and the variance)
Type of error that fails to reject the null hypothesis when it is false
Type II (false negative)
Reduced by increasing the sample size, which increases the power of the study
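It is the probability of getting a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true
P-value
A P-value of 0.05 means that, if the null hypothesis were true, a difference at least this large would occur by chance alone 5% of the time (just 5% luck). A small P-value is taken as evidence that the samples represent different populations (the groups really are different); it is not a 95% probability that they are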
True or false: a lower P-value means a stronger or more important association
False
Remember: the P-value only reflects how likely the result is by chance alone if the null hypothesis is true
What is the Bonferroni correction?
A way of interpreting the P-value when there are multiple comparisons: the threshold needs to be more stringent because of the higher likelihood of a type I error
- P-value threshold / number of comparisons
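The correction is a single division, sketched below. The P-values and the function name `bonferroni_threshold` are hypothetical, and 0.05 is assumed as the starting threshold.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison threshold: overall threshold / number of comparisons."""
    return alpha / n_comparisons

p_values = [0.004, 0.02, 0.03, 0.2]  # hypothetical results of 4 comparisons
threshold = bonferroni_threshold(0.05, len(p_values))

# A comparison counts as significant only if it passes the stricter threshold
significant = [p <= threshold for p in p_values]

print(threshold)
print(significant)
```

With 4 comparisons the threshold drops to 0.0125, so 0.02 and 0.03 no longer count as significant even though both are below 0.05.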
It is the range of values within which you expect the actual mean of the true population to lie
Confidence interval
It is the probability that the confidence interval includes the population mean
Level of confidence
A higher level of confidence widens the range (the lower the confidence, the narrower the range)
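How the level of confidence changes the width of the interval can be sketched as follows. This assumes a normal approximation (mean plus or minus z times the standard error); for small samples a t distribution is normally used and gives a somewhat wider interval. The sample values and the name `mean_confidence_interval` are invented.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    m = mean(sample)
    sem = stdev(sample) / math.sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * sem, m + z * sem

sample = [98, 100, 102, 101, 99, 100, 103, 97]  # hypothetical measurements

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(sample, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```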
Absolute risk
the proportion of subjects who develop the disease among the exposed
- a/(a+b), where a = exposed with the disease and b = exposed without the disease
Absolute risk reduction
the absolute effect of the exposure
% with the outcome in the exposed - % with the outcome in the non-exposed
- a/(a+b) - c/(c+d)
Also known as risk difference or attributable risk
Number needed to treat
the reciprocal of the absolute risk reduction
- 1/((c/(c+d)) - (a/(a+b)))
The larger the difference between groups, the lower the NNT
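Relative risk reduction
(control event rate - experimental event rate) / control event rate
- ((c/(c+d)) - (a/(a+b))) / (c/(c+d))
If it is positive: protective (fewer events in the experimental group)
If it is negative: harmful (more events in the experimental group)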
It is the probability of the outcome in the exposed relative to the unexposed
Relative risk or risk ratio
- (a/(a+b)) / (c/(c+d))
Interpretation:
> 1 positive association
= 1 no association
< 1 negative association
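The risk formulas can be worked through on one 2x2 table. This sketch assumes the usual layout (a = exposed or treated with the outcome, b = exposed without it, c = unexposed or control with the outcome, d = unexposed without it) and takes the absolute risk reduction as control risk minus exposed risk, so positive values mean fewer events in the exposed group. The trial counts and the name `risk_measures` are hypothetical, and the control risk is assumed to be non-zero.

```python
def risk_measures(a, b, c, d):
    """Risk measures from a 2x2 table.

    a = exposed (treated) with the outcome,   b = exposed without it
    c = unexposed (control) with the outcome, d = unexposed without it
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    arr = risk_unexposed - risk_exposed  # absolute risk reduction
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "arr": arr,
        "nnt": 1 / arr if arr != 0 else float("inf"),  # number needed to treat
        "rrr": arr / risk_unexposed,                   # relative risk reduction
        "rr": risk_exposed / risk_unexposed,           # relative risk
    }

# Hypothetical trial: 10 of 100 treated and 20 of 100 controls have the outcome
m = risk_measures(a=10, b=90, c=20, d=80)
for name, value in m.items():
    print(name, round(value, 3))
```

In this example the treated group has half the risk of the controls (RR 0.5, RRR 0.5), the absolute risk reduction is 0.1, and 10 patients need to be treated to prevent one event.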
It is the improvement in outcomes simply as a result of being involved in a study
Hawthorne effect
How does randomization reduce bias?
By creating two groups of individuals that have equal likelihood of having the outcome of interest
It is an observational study useful for rare diseases
Case control
The probability of rejecting the null hypothesis when the alternative hypothesis is true
Statistical power
(1-Type II error rate)
It is the probability of rejecting the null hypothesis when it is true
Type I error
It is the probability of failing to reject the null hypothesis when the alternative is true
Type II error
Statistical power increases with
- a less stringent significance criterion (a higher P-value threshold)
- a larger magnitude of effect (difference or change)
- a larger sample size
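The three influences on power can be seen in a rough calculation. The sketch below is an approximation for comparing two group means with a two-sided z-test and a known common SD; it ignores the negligible chance of rejecting in the wrong direction. The function name `approx_power` and all input numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the threshold
    se = sd * math.sqrt(2 / n_per_group)          # SE of the difference in means
    return NormalDist().cdf(abs(effect) / se - z_crit)

base = approx_power(effect=5, sd=10, n_per_group=64)
print("baseline:", round(base, 3))
print("higher threshold:", round(approx_power(5, 10, 64, alpha=0.10), 3))
print("larger effect:", round(approx_power(8, 10, 64), 3))
print("larger sample:", round(approx_power(5, 10, 128), 3))
```

Each change raises the power above the baseline of about 0.8, and power is 1 minus the type II error rate.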
Deaths that occur between 22 weeks’ gestation and 7 days of postnatal life
Perinatal mortality
deaths in the first 28 days of life
Neonatal mortality
Death occurring within the 1st year of life
Infant Mortality
It shows the difference in the rate of a condition between individuals with and without a specific exposure
Attributable risk
Best study or research design: Cohort study
Of the measures of central tendency, which should be equal to show a normal distribution?
Mean and median
Mean > median: skewed to the right
Mean < median: skewed to the left
Cumulative incidence = incidence rate
period of observation is short
disease prevalence is low
duration of disease is the same for the exposed and non-exposed
Effect of prevalence on the predictive value of a test
- increased prevalence: increased PPV, decreased NPV
- decreased prevalence: decreased PPV, increased NPV
It is the ability of a test to correctly identify individuals who have a condition
sensitivity
It is the ability of a test to correctly identify individuals who do not have a condition
specificity
It is the proportion of individuals with a positive test result who have the condition
Positive predictive value
- affected by disease prevalence; when a disease is more prevalent, the PPV is higher.
It is the proportion of individuals with a negative test result who do not have the condition
Negative predictive value
- affected by disease prevalence; when a disease is more prevalent, the NPV is lower.
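Sensitivity, specificity, and the predictive values can all be read from the four cells of a test-versus-disease table, and the effect of prevalence follows from Bayes' rule. A minimal sketch; the counts, the 90% sensitivity and specificity, and the function names are assumptions chosen for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Test characteristics from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # non-diseased who test negative
        "ppv": tp / (tp + fp),          # positive tests that are truly diseased
        "npv": tn / (tn + fn),          # negative tests that are truly disease-free
    }

def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Hypothetical counts: 100 diseased and 900 non-diseased individuals
print(diagnostic_metrics(tp=90, fp=45, fn=10, tn=855))

# Same test (90% sensitive, 90% specific) at low and high prevalence
ppv_low, npv_low = predictive_values(0.9, 0.9, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.9, 0.9, prevalence=0.30)
print(round(ppv_low, 3), round(npv_low, 3))
print(round(ppv_high, 3), round(npv_high, 3))
```

Sensitivity and specificity stay fixed, while the PPV rises and the NPV falls as prevalence goes from 1% to 30%.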
It is used to assess the accuracy of a diagnostic test whose results are continuous variables.
Receiver operating characteristic (ROC) curves
The best study design for evaluating an intervention for clinical practice
Randomized controlled trial (RCT)
What decreases bias in an RCT?
- randomization
- blinding
It is the statistical method in a systematic review in which data from individual studies are combined to give a summary of the effectiveness of an intervention, with a 95% confidence interval
Terms used: estimated relative risk (RR), RR reduction, or odds ratio
Meta-analysis
It increases the power and precision of estimates of treatment effects and exposure risks
- It is the variation among studies in a meta-analysis
- It is measured using the I² statistic
Heterogeneity
Interpretation of heterogeneity (I²):
0% to 40%: might not be important
30% to 60%: may represent moderate heterogeneity
50% to 90%: may represent substantial heterogeneity
75% to 100%: considerable heterogeneity
What are the definitions of the following?
a. fetal death
b. live birth
c. maternal death
a. fetal death- death prior to expulsion or extraction of the product of human conception, regardless of the duration of pregnancy
b. live birth- expulsion of the product of conception from the mother, irrespective of the duration of pregnancy, after which the baby shows evidence of life: beating heart, pulsation of the umbilical cord, or movement of voluntary muscles
c. maternal death- death of a woman while pregnant or within 42 days of the end of pregnancy
What is a SMART objective?
specific
measurable
achievable
relevant
time-bound
What are the types of validity?
- Internal validity: whether the results of the study are true, or are a result of the way the study was designed or conducted.
- External validity: generalizability of results to other settings or samples.
Odds ratio
OR = ad/bc
- odds in the exposed / odds in the non-exposed
OR = (a/b)/(c/d)
When are the odds ratio and the risk ratio approximately the same?
When the outcome is rare
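The rare-outcome approximation can be checked numerically. The sketch assumes the usual 2x2 layout (a = exposed with the outcome, b = exposed without it, c = unexposed with the outcome, d = unexposed without it); both tables are invented so that the risk ratio is 2 in each.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """RR = (a/(a+b)) / (c/(c+d))."""
    return (a / (a + b)) / (c / (c + d))

# Rare outcome: 2 of 1000 exposed vs 1 of 1000 unexposed
rare = (2, 998, 1, 999)
# Common outcome: 40 of 100 exposed vs 20 of 100 unexposed
common = (40, 60, 20, 80)

print(odds_ratio(*rare), risk_ratio(*rare))
print(odds_ratio(*common), risk_ratio(*common))
```

When the outcome is rare, b is close to a+b and d is close to c+d, so the odds and the risks are nearly equal; with a common outcome the odds ratio moves further from 1 than the risk ratio.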
Statistical tests