Biostatistical Concepts Flashcards
Data Exists in What 2 Categories
Numerical variables (counts, measurements)
Categorical variables (classifying, ordinal data)
How can data be summarized
tables, graphs, pie charts, component band charts, spot maps, rate maps, line graphs, frequency distributions and histograms, normal distributions
The Mean
The sample average; the most prominent measure of central tendency
The Median
The middle after all the measurements have been put in order according to their value
The Mode
Value of the measurement in the sample that occurs most frequently
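A minimal sketch of the three measures above, using Python's standard `statistics` module. The sample values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of measurements (illustrative values only).
sample = [2, 3, 3, 5, 7, 8, 3, 9]

mean = statistics.mean(sample)      # sum of the values divided by their count
median = statistics.median(sample)  # middle value once the sample is sorted
mode = statistics.mode(sample)      # value that occurs most frequently

print("mean:", mean, "median:", median, "mode:", mode)
```

With an even number of measurements, as here, the median is the average of the two middle values.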
Measures of Variability
Variance, standard deviation, standard error.
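As a rough illustration, the variance, standard deviation, and standard error of a sample can be computed as follows. The data are hypothetical; the sample variance here uses the usual n - 1 denominator.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only).
sample = [4.0, 6.0, 8.0, 10.0, 12.0]

variance = statistics.variance(sample)        # sample variance, n - 1 denominator
std_dev = statistics.stdev(sample)            # square root of the variance
std_error = std_dev / math.sqrt(len(sample))  # standard error of the mean

print("variance:", variance)
print("standard deviation:", std_dev)
print("standard error:", std_error)
```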
Random Samples
The process of random sampling is essential to statistical inference. Every member of the population has an equal chance of being selected for the sample.
Confidence Intervals
To construct a confidence interval, a lower bound and an upper bound are calculated. Together they give a reasonable range for the population mean based on information from a sample.
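One way to sketch the lower and upper bounds is with a large-sample normal approximation. This is an assumption made for simplicity: with small samples a t quantile would normally replace the normal quantile. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes the normal approximation is adequate (illustrative only).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_error, mean + z * std_error

# Hypothetical data.
lower, upper = mean_confidence_interval([4.0, 6.0, 8.0, 10.0, 12.0])
print("lower bound:", lower, "upper bound:", upper)
```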
The P Value
Represents the probability that the mean of a random sample from this population would be at least as extreme as the value observed, assuming the null hypothesis is true
Statistical Power
The higher the power, the better. Power is affected by the sample size and the variance of the individual observations.
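The dependence of power on sample size and variance can be sketched with a normal approximation for comparing two group means. This is a simplified illustration, not an exact power calculation: it ignores the negligible opposite tail, and `approximate_power` is a hypothetical helper.

```python
import math
from statistics import NormalDist

def approximate_power(n_per_group, sigma, delta, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    sigma: assumed common standard deviation of individual observations
    delta: true difference in means we want to detect
    """
    se_diff = sigma * math.sqrt(2 / n_per_group)  # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / se_diff - z_crit)

# Power rises with sample size and falls as the variance grows.
for n in (20, 50, 100):
    print(n, round(approximate_power(n, sigma=10, delta=5), 3))
```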
T-tests
Use a statistic whose distribution under the null hypothesis is known (the t distribution) to test whether two means differ significantly
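A minimal sketch of the statistic for two independent samples, assuming equal variances in the two groups (the pooled-variance form). The function name `two_sample_t` is hypothetical; the resulting value would be compared with a t distribution on n1 + n2 - 2 degrees of freedom.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df

# Hypothetical measurements from two groups.
t_stat, dof = two_sample_t([4.0, 6.0, 8.0, 10.0, 12.0], [1.0, 3.0, 5.0, 7.0, 9.0])
print("t =", t_stat, "df =", dof)
```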
Correlation
Quantifies the degree to which two variables vary together. If the variables are independent, there is no relationship between them.
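One common way to quantify this is the Pearson correlation coefficient, sketched below from its definition. The card does not name a specific coefficient, so this choice is an assumption, and `pearson_r` is a hypothetical helper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: a perfectly increasing and a perfectly decreasing pair.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values near +1 or -1 indicate that the variables move closely together; values near 0 indicate little linear relationship.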
Chi-squared tests
Applied to contingency tables, which are tools for displaying numbers of participants according to two or more factors or variables. Close examination of the table leads to the inevitable question of whether there is evidence of an association between exposure and disease.
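For a 2x2 table of exposure by disease, the chi-squared statistic can be sketched as follows. The cell layout and the name `chi_squared_2x2` are assumptions for illustration; the p-value formula used here holds only for one degree of freedom, and no continuity correction is applied.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p-value for the 2x2 table [[a, b], [c, d]].

    Rows: exposed / unexposed. Columns: diseased / not diseased.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if exposure and disease were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts.
chi2, p_value = chi_squared_2x2(20, 10, 10, 20)
print("chi-squared:", chi2, "p-value:", p_value)
```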
Regression
Regression models are vital tools for data analysis and are used extensively:
Linear regression
Logistic Regression
Cox proportional hazards regression
Linear regression
The dependent variable needs to be a continuous variable whose frequency distribution is the normal distribution
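A minimal sketch of simple linear regression with one independent variable, fitted by ordinary least squares. The data are hypothetical and `fit_line` is an illustrative helper, not a full regression routine.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical continuous outcome measured at four values of x.
intercept, slope = fit_line([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0])
print("intercept:", intercept, "slope:", slope)
```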
Logistic regression
The dependent variable records the presence or absence of a characteristic, typically coded as 0 (absent) or 1 (present)
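Because the outcome is 0 or 1, the model works with the probability that the characteristic is present. A sketch of the logistic function that turns a linear predictor into that probability is shown below; the coefficient values are hypothetical, not fitted estimates.

```python
import math

def logistic_probability(intercept, slope, x):
    """Probability that the outcome equals 1 for a given value of x."""
    linear_predictor = intercept + slope * x
    return 1 / (1 + math.exp(-linear_predictor))

# Hypothetical coefficients: the probability rises with x but stays between 0 and 1.
for x in (-2, 0, 2):
    print(x, round(logistic_probability(0.0, 1.0, x), 3))
```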
Cox proportional hazards regression
The dependent variable represents the time from a baseline of some type to the occurrence of an event of interest.
Kaplan Meier survival curves
i. Display any type of time-to-event data
ii. Proportions range from 1.0 at the outset down to 0.0.
iii. Plot curves with survival time on the horizontal axis rather than calendar time
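The points above can be sketched with a small product-limit (Kaplan-Meier) calculation. The follow-up times and event indicators are hypothetical, and `kaplan_meier` is an illustrative helper rather than a library function.

```python
def kaplan_meier(times, events):
    """Return [(time, survival proportion)] at each time where an event occurs.

    times:  survival time for each participant, measured from their own baseline
    events: 1 if the event occurred at that time, 0 if the participant was censored
    """
    at_risk = len(times)
    survival = 1.0  # the curve starts at 1.0 at the outset
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored participants leave the risk set
    return curve

# Hypothetical data: five participants, two of them censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
for time, proportion in curve:
    print(time, round(proportion, 3))
```

Censored participants lower the number at risk without producing a step down in the curve.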
The two-sample t test
For the sample size calculation, specify the population variance, the values from the normal distribution, and the difference we want to detect.
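Reading this card as a sample size calculation (an interpretation, since the card does not say so explicitly), one common normal-approximation formula is n per group = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2. The sketch below assumes equal group sizes, a two-sided test, and a hypothetical helper name.

```python
import math
from statistics import NormalDist

def n_per_group_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    sigma: assumed population standard deviation (square root of the variance)
    delta: difference in means we want to detect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # normal value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # normal value for the desired power
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)

# Hypothetical inputs.
print(n_per_group_means(sigma=10, delta=5))
```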
The test comparing two proportions
Population proportions must be specified; it is prudent to complete the calculation several times.
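A sketch of why repeating the calculation is useful, assuming this card also concerns sample size. It uses one common normal-approximation formula with unpooled variances; the proportions tried are hypothetical, as is the helper name.

```python
import math
from statistics import NormalDist

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return math.ceil(n)

# Repeat the calculation for several plausible pairs of population proportions.
for p1, p2 in [(0.5, 0.3), (0.5, 0.35), (0.5, 0.4)]:
    print(p1, p2, n_per_group_proportions(p1, p2))
```

The required sample size changes sharply as the assumed proportions move closer together, which is one reason to try several scenarios.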
Meta-analysis
Statistical synthesis of the data from separate but similar studies leading to a quantifiable summary of the pooled results to identify the overall trend.
Steps for Meta-analysis
1) Formulate the problem and study design
2) Identify relevant studies
3) Exclude poorly conducted studies or those with major methodological flaws
4) Measure, combine, and interpret results
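The combining step can be sketched with fixed-effect, inverse-variance weighting, which is one common pooling approach. The cards do not specify a method, so this choice is an assumption, and the study estimates below are hypothetical.

```python
import math

def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted summary of study estimates.

    Returns the pooled estimate and its standard error.
    """
    weights = [1 / se ** 2 for se in standard_errors]  # more precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect estimates from two similar studies.
pooled, pooled_se = fixed_effect_pool([0.2, 0.4], [0.1, 0.2])
print("pooled estimate:", pooled, "standard error:", pooled_se)
```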