Epidemiology & Biostatistics Flashcards
Define Bias
Systematic error in the design, management or analysis of a study that causes a mistaken estimate of the exposure’s effect on the outcome.
Explain design bias
Wrongly chosen sampling strategy or study design.
Explain conduct bias
Case enrollment, follow-up or data collection is not carried out properly.
Explain analysis bias
The chosen statistical methods are inappropriate, variables are miscategorized, or the modelling assumptions are wrong.
What three types of bias are there?
Selection bias, confounding and information bias (also called measurement or misclassification bias)
Define confounding
A confounder is a variable that influences both the independent variable (exposure) and the dependent variable (outcome), causing a spurious association between them.
Define sensitivity
The proportion of positives that are correctly identified as such. (Also called the true positive rate, the recall, or probability of detection in some fields)
Define specificity
The proportion of negatives that are correctly identified as such. (Also called the true negative rate)
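As a small illustration of the two definitions above, here is a minimal sketch that computes both rates from the cells of a 2x2 table. The counts and the helper names `sensitivity` and `specificity` are hypothetical, chosen only for the example.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN), computed among people WITH the disease."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP), computed among people WITHOUT the disease."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only.
tp, fn = 90, 10   # diseased: 90 test positive, 10 test negative
tn, fp = 160, 40  # non-diseased: 160 test negative, 40 test positive
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```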
What are the four main aspects of infection control?
Surveillance (passive/active), patient contact, hygiene, education/awareness
How does the normal distribution relate to the standard deviation?
About 68% of values lie within 1 SD of the mean and about 95.4% within 2 SDs, leaving about 2.3% in each tail beyond 2 SDs.
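These percentages can be checked with a short sketch using only the standard library: for a normal variable, the probability of lying within k SDs of the mean equals erf(k / sqrt(2)). The helper name `within_k_sd` is illustrative.

```python
from math import erf, sqrt


def within_k_sd(k: float) -> float:
    """P(|X - mean| < k * SD) for a normally distributed X, via the error function."""
    return erf(k / sqrt(2))


for k in (1, 2):
    inside = within_k_sd(k)
    tail = (1 - inside) / 2  # probability in EACH tail beyond k SDs
    print(f"{k} SD: {inside:.1%} inside, {tail:.2%} in each tail")
```

This prints roughly 68.3% for 1 SD and 95.4% for 2 SDs, with about 2.28% in each tail beyond 2 SDs.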
Range of P-value and the usual significance level
Ranges from 0 to 1; the significance level is commonly set at 0.05.
What is Type II error?
Concluding there is no difference when in truth there is one, i.e. failing to reject a false null hypothesis.
What is Type I error?
Concluding there is a difference when in fact there is none, i.e. rejecting a true null hypothesis.
Explain power and what affects it
The probability that a test detects a difference when one really exists, i.e. a true positive (power = 1 - beta). Power is higher when the true difference (effect size) is large, the significance level (alpha) is higher, sampling variability is low and the sample size is large.
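To see how each factor moves power, here is a minimal sketch for one simple case, assuming a two-sided two-sample z-test of means with equal group sizes and a known common SD (a normal approximation that ignores the tiny chance of rejecting in the wrong direction). The function name `approx_power` and all numbers are illustrative.

```python
from statistics import NormalDist


def approx_power(delta: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test comparing means.

    Assumes equal group sizes and a known common SD; ignores rejection
    in the opposite direction, which is negligible for realistic effects.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)        # critical value for the chosen alpha
    se = sigma * (2 / n_per_group) ** 0.5    # standard error of the difference in means
    return z.cdf(abs(delta) / se - z_crit)


# Hypothetical design: difference of 0.5 SD with 63 subjects per group.
print(round(approx_power(delta=0.5, sigma=1.0, n_per_group=63), 2))
```

With these inputs power comes out at roughly 0.80; increasing `delta`, `alpha` or `n_per_group`, or decreasing `sigma`, each raises it.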
What can linear regression be used for, and how do you assess its significance and fit?
To explore the linear relationship between two continuous variables; it assumes the residuals are normally distributed with equal (constant) variance. Use the p-value of the slope to test significance and R-squared (between 0 and 1, often given as a percentage) to describe how well the model fits the data.
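A minimal sketch of simple (one-predictor) least-squares regression, returning the intercept, slope and R-squared described above. The p-value of the slope needs the t distribution and is left to a statistics package; the data and the function name are hypothetical.

```python
def simple_linear_regression(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (intercept a, slope b, R-squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
a, b, r2 = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"intercept={a:.2f}, slope={b:.2f}, R-squared={r2:.3f}")
```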
Uses of logistic regression and type of curve, results and significance?
To model the log of the odds of a binary outcome as a linear function of one or more predictors, X. The fitted probability follows a sigmoid (S-shaped) curve. Results are reported either as coefficients (betas) or as odds ratios calculated from them (exp(coefficient)), with confidence intervals or p-values.
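The link between a coefficient and an odds ratio can be checked numerically. This sketch assumes hypothetical fitted values `beta0` and `beta1` for a single predictor and shows that exp(beta1) equals the ratio of the odds at x + 1 versus x.

```python
from math import exp


def predicted_probability(beta0: float, beta1: float, x: float) -> float:
    """Logistic model: log-odds = beta0 + beta1 * x, mapped to a probability by the sigmoid."""
    return 1 / (1 + exp(-(beta0 + beta1 * x)))


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


beta0, beta1 = -2.0, 0.7  # hypothetical fitted coefficients
or_from_beta = exp(beta1)  # odds ratio for a one-unit increase in x
or_direct = odds(predicted_probability(beta0, beta1, 3)) / odds(predicted_probability(beta0, beta1, 2))
print(round(or_from_beta, 3), round(or_direct, 3))
```

Both values print as about 2.014, because the log-odds rise by exactly `beta1` per unit of x.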
What tests can you use for testing or modeling means of continuous data?
t-test, ANOVA and linear regression
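As one example from this list, here is a sketch of the two-sample t statistic in its Welch (unequal variances) form, together with the Welch-Satterthwaite degrees of freedom. Turning these into a p-value needs the t distribution; in practice a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` does the whole test. The data and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements in two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.8, 4.1, 4.6]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```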
What do you need to know to estimate the correct sample size?
Significance level (alpha, commonly 0.05; remember the type I versus type II error trade-off: the lower the level, the larger the sample needed), power (1 - beta, the test's sensitivity), variance (i.e. how precise your measurements are) and effect size (the smaller the effect, the larger the sample).
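The four ingredients above combine in the standard normal-approximation formula for comparing two means, n per group = 2 (z_(1-alpha/2) + z_(1-beta))^2 sigma^2 / delta^2. The sketch below assumes equal group sizes and a two-sided test; the function name and example numbers are illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for comparing two means (two-sided, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical: detect a difference of half an SD with 80% power at alpha = 0.05.
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

Lowering alpha, raising the power, increasing the variance or shrinking the effect size all increase the required n.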
Ten Steps of an Outbreak Investigation
- Determine the existence of the outbreak
- Confirm the diagnosis
- Define a case and count cases
- Orient the data in terms of time, place, and person
- Determine who is at risk of becoming ill
- Develop a hypothesis that explains the exposure that caused disease and test this hypothesis
- Compare the hypothesis with the established facts
- Plan a more systematic study
- Prepare a written report
- Execute control and prevention measures
Suppose we want to estimate the survival of premature infants born at 25 weeks of gestation, and we create a 95% CI that ranges from 64.3% to 89.5%. How can we interpret this interval?
“We are 95% confident that the interval 64.3% to 89.5% contains the true proportion of surviving infants in the population.” Or, equivalently: “We are 95% confident that the true proportion of survival for infants born at 25 weeks of gestation is between 64.3% and 89.5%.”
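For a sense of where such an interval comes from, here is a sketch of the simplest method, the normal-approximation (Wald) interval p ± z * sqrt(p(1 - p) / n). The counts are hypothetical and are not the data behind the 64.3% to 89.5% interval; the Wald interval can also behave poorly with small samples or proportions near 0 or 1, where methods such as the Wilson interval are usually preferred.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion, clipped to [0, 1]."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


low, high = wald_ci(successes=31, n=40)  # hypothetical: 31 of 40 infants survived
print(f"{low:.1%} to {high:.1%}")
```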
Describe nominal data
No natural order - gender, race, blood type
Describe ordinal data
Natural order - tumor scales, social class
Describe binary data
Two values coded 0/1 - disease status, diagnostic test result
Describe categorical data and the tests to use with it
Either nominal, ordinal or binary. Summarise with counts and proportions, plot with pie and bar charts, and analyse with confidence intervals, the chi-square test, McNemar's test (paired data) and (multiple) logistic regression.
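As an example of the chi-square test named above, here is a sketch of Pearson's test for a 2x2 table without continuity correction (SciPy's `scipy.stats.chi2_contingency` applies Yates' correction to 2x2 tables by default, so its result can differ). It assumes every expected count is reasonably large, commonly at least 5; the table values and function name are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p-value on 1 degree of freedom.
    """
    table = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 df, chi-square = Z**2, so P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical table: exposed 10 ill / 20 well, unexposed 20 ill / 10 well.
stat, p = chi_square_2x2(10, 20, 20, 10)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```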
What types of quantitative data are there?
Discrete (whole numbers, such as counts) and continuous (can take any real value within a range)
What methods can you use for discrete count data?
Summarise: rates.
Plot: trend plots, histograms.
Analyse: confidence intervals, Poisson regression.
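A minimal sketch of summarising a count as a rate with an approximate 95% confidence interval, assuming the event count follows a Poisson distribution and using the common log-scale approximation SE(log rate) = 1 / sqrt(events). It requires at least one event; the counts and person-time are hypothetical.

```python
from math import exp, sqrt


def rate_with_ci(events: int, person_time: float, z: float = 1.96) -> tuple[float, float, float]:
    """Event rate with an approximate CI, assuming Poisson counts (log-scale method).

    Requires events > 0.
    """
    rate = events / person_time
    se_log = 1 / sqrt(events)  # approximate SE of log(rate) for a Poisson count
    return rate, rate * exp(-z * se_log), rate * exp(z * se_log)


# Hypothetical: 30 events observed over 1500 person-years.
rate, low, high = rate_with_ci(events=30, person_time=1500)
print(f"{rate * 1000:.1f} per 1000 person-years (95% CI {low * 1000:.1f} to {high * 1000:.1f})")
```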
What methods can you use for continuous data?
Summarise: mean, SD, median, interquartile range.
Plot: histogram, scatter plot, box plot, dot plot
Analyse: confidence interval, t-test, ANOVA, correlation, simple and multiple linear regression
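The "Summarise" measures above map directly onto Python's standard `statistics` module. This sketch uses hypothetical values; note that `stdev` is the sample SD (n - 1 denominator) and that `quantiles` uses its default "exclusive" method, so other software may report a slightly different IQR.

```python
from statistics import mean, median, quantiles, stdev

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

q1, _, q3 = quantiles(values, n=4)  # quartile cut points (default 'exclusive' method)
summary = {
    "mean": mean(values),
    "sd": stdev(values),  # sample SD, n - 1 denominator
    "median": median(values),
    "iqr": q3 - q1,
}
print(summary)
```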
Time-to-Event methods?
Summary: median survival time, five-year survival, hazard ratio (HR)
Plot: Kaplan-Meier curves
Analyse: confidence interval of the HR, log-rank test, proportional hazards regression (Cox regression)
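The Kaplan-Meier curve is built by multiplying, at each event time, the fraction of at-risk subjects who survive that time. A minimal sketch, assuming each subject has a follow-up time and an event indicator (1 = event, 0 = censored), with censored subjects counted as still at risk at a tied event time. The data and function name are hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimate.

    times: follow-up time per subject; events: 1 if the event occurred, 0 if censored.
    Returns (time, estimated survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all subjects that share this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # conditional probability of surviving time t
            curve.append((t, survival))
        at_risk -= deaths + censored  # both events and censored subjects leave the risk set
    return curve


# Hypothetical follow-up data.
for t, s in kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]):
    print(t, round(s, 3))
```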
Factors that affect estimate precision
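Variability of the outcome, sample size and the desired confidence level (lower variability, a larger sample or a lower confidence level all give a narrower, more precise interval)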
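All three factors appear in the half-width of a normal-approximation confidence interval for a mean, z * SD / sqrt(n). This sketch (hypothetical function name and numbers) shows how each one changes precision.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd: float, n: int, level: float = 0.95) -> float:
    """Half-width of a normal-approximation CI for a mean: z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sd / sqrt(n)


print(ci_half_width(sd=10, n=100))              # baseline
print(ci_half_width(sd=10, n=400))              # 4x the sample size halves the width
print(ci_half_width(sd=20, n=100))              # doubling the SD doubles the width
print(ci_half_width(sd=10, n=100, level=0.99))  # higher confidence gives a wider interval
```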