2 - Biostatistics 1 - Basic Principles Flashcards
Statistics
Encompasses methods of collecting, summarizing, analyzing & drawing conclusions from data
Biostatistics
The application of statistics to medical, biological and public health data
Descriptive Statistics
A means of organizing and summarizing observations
Statistical Inference
A process of drawing conclusions about a population from a sample
Population
A collection of all subjects of interest
Sample
A representative subset of the population that can be studied
Types of Sample
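Random (every subject has an equal chance of selection; taking every 10th person is systematic sampling, a common approximation)
Convenience (whoever is easiest to reach, e.g. one cluster all together)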
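A minimal sketch of how these sampling schemes differ, assuming a hypothetical population of 100 subject IDs. The names and the seed are illustrative only.

```python
import random

population = list(range(1, 101))  # hypothetical subject IDs 1..100

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simple random sample: every subject has the same chance of selection
random_sample = rng.sample(population, 10)

# Systematic sample: every 10th subject (10, 20, ..., 100)
systematic_sample = population[9::10]

# Convenience sample: whoever is easiest to reach (here, the first 10)
convenience_sample = population[:10]

print(random_sample)
print(systematic_sample)
print(convenience_sample)
```

The convenience sample only ever contains IDs 1 to 10, which is why it may not represent the population.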
Parameter
Numerical value that describes the population (usually unknown, e.g. the population mean)
Statistic
Numerical value calculated from the sample (used to estimate the parameter)
Variable
A characteristic or condition of an observation that can take on different values
Dependent Variable
Outcome, variable of interest
Independent Variable
Exposure, predictor variables
Types of Categorical (Qualitative) Data
Nominal
Ordinal
Nominal Data
Values fall into categories or classes that are mutually exclusive and are not ordered
Dichotomous/Binary - Only Two Possible Categories (Dead/Alive)
Multiple Categories - (Race, Blood Type)
Ordinal Data
Values fall into categories or classes where order matters (Disease stage, satisfaction level)
2 Types of Numerical (Quantitative) Data
Discrete
Continuous
Discrete Data
Data has a numerical value that takes only certain whole number values (# of kids in a family)
Continuous Data
Data has a numerical value that can have any value in a continuum (height, weight, time)
Frequency Distributions - Data Representations
Categorical Data - Pie Charts, Bar Charts
Continuous Data - Histogram
Continuous Data - Box Plot
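As a rough illustration of what sits behind these charts, the counts for a bar chart and for a histogram can be tabulated as below. The blood types, heights, and the 10 cm bin width are made-up values for this sketch.

```python
from collections import Counter

# Categorical data: one count per category (the bar or pie-slice sizes)
blood_types = ["O", "A", "A", "B", "O", "O", "AB", "A", "O"]  # hypothetical
category_counts = Counter(blood_types)

# Continuous data: group values into bins first (the histogram bar heights)
heights_cm = [152.0, 158.5, 161.2, 164.8, 167.3, 170.1, 171.9, 176.4, 183.0]  # hypothetical
bin_width = 10  # assumed bin width in cm
# Map each height to the lower edge of its bin, e.g. 158.5 -> 150
bin_counts = Counter(int(h // bin_width) * bin_width for h in heights_cm)

print(category_counts)
print(sorted(bin_counts.items()))
```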
Unimodal Frequency Distribution
One Peak
Bimodal Frequency Distribution
Two Peaks
Right Skew Frequency Distribution
Tail to the right (more low values than high values)
Left Skew Frequency Distribution
Tail to the left (more high values than low values)
Central Tendency Descriptors
Mean
Median
Mode
Mean
Arithmetic average (sum of the values divided by the number of values)
Pro - Uses all Data Values
Con - Distorted by outliers and skewed data
Median
Middle value of the ordered data set
Pro - Not distorted by outliers or skewed data
Con - Ignores most of the information
Mode
Most frequently occurring value
Pro - Easily determined for categorical data
Con - Ignores most of the information
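A small sketch of the three descriptors, using Python's standard `statistics` module on made-up values. It shows the mean moving when a single outlier is added while the median and mode barely change.

```python
import statistics

values = [2, 3, 3, 4, 5]        # hypothetical observations
with_outlier = values + [40]    # same data plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

mode_after = statistics.mode(with_outlier)

print(mean_before, mean_after)      # the mean is pulled toward the outlier
print(median_before, median_after)  # the median shifts only slightly
print(mode_after)                   # the most frequent value is unchanged
```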
Spread
Measures that describe the variability or dispersion of the data
Range
IQR
Variance
Standard Deviation
Range
Difference between largest and smallest values
Pro - Easily Determined
Con - Distorted by Outliers
IQR (Inter-Quartile Range)
Difference between the 25th and 75th percentiles
Pro - Unaffected by outliers
Pro - Appropriate for skewed data
Con - Ignores the values outside the middle 50% of the data
Variance
Average of the squared deviations of the observations from the mean
Standard Deviation
Square root of the variance; roughly the typical deviation of the observations from the mean, in the same units as the data
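The spread measures can be sketched with the standard `statistics` module. The data below are hypothetical. The cards do not say whether the variance divides by N (population) or n - 1 (sample), so both are shown; quartile values depend on the interpolation method, and the library default is assumed here.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)

# Quartile cut points (25th, 50th, 75th percentiles), library default method
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Squared deviations from the mean: 9, 1, 1, 1, 0, 0, 4, 16 (sum = 32)
pop_variance = statistics.pvariance(data)    # 32 / N
sample_variance = statistics.variance(data)  # 32 / (n - 1)

pop_sd = statistics.pstdev(data)             # square root of pop_variance

print(data_range, iqr, pop_variance, sample_variance, pop_sd)
```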
Inferential Statistics
The process of drawing conclusions about a population from a sample.
Starts with a Null Hypothesis and an Alternative Hypothesis
Null Hypothesis (H0)
Assumes no effect in the population
Alternative Hypothesis (H1)
Assumes effect in the population
Steps for Hypothesis Testing in Inferential Statistics
Assume Null Hypothesis to be true
Collect data from the sample and assess the evidence against the Null Hypothesis
Either reject H0 (if there is convincing/strong evidence against it) or fail to reject H0
Type 1 Error
Reject the null when the null is actually true
Probability - α
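One way to see that α is the Type 1 error rate is to simulate many studies in which the null is true and count how often it is rejected anyway. This is a sketch under assumed conditions: normally distributed data, a known SD of 15, a two-sided z-test, and arbitrary choices for the seed, sample size, and number of simulated studies.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(sample, null_mean, sigma):
    """Two-sided z-test p-value, assuming the population SD (sigma) is known."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)  # fixed seed for reproducibility
alpha = 0.05
n_experiments = 2000
false_rejections = 0

for _ in range(n_experiments):
    # H0 is true here: the data really come from a population with mean 100
    sample = [rng.gauss(100, 15) for _ in range(30)]
    if two_sided_p(sample, null_mean=100, sigma=15) < alpha:
        false_rejections += 1  # rejected a true null: Type 1 error

type1_rate = false_rejections / n_experiments
print(type1_rate)  # should land close to alpha
```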
Type 2 Error
Fail to reject the null when the null is false
Probability - β
Power
The probability of rejecting H0 when it is false (Not committing a Type 2 error) = 1 - β
Aim for 100%, settle for 80 - 90%
Factors influencing power
Sample Size
Variability
Effect of Interest
Significance Level
How does Sample Size influence Power?
Power increases with larger samples
How does Variability influence Power?
Power increases as variability decreases
How does Effect of Interest influence Power?
Power increases with larger effect size
How does Significance Level influence Power?
Power increases with larger α
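The four influences on power can be checked numerically. The sketch below assumes a two-sided one-sample z-test with known SD, which is a simplification chosen for illustration; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def power_two_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (population SD assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # true effect measured in standard errors
    # Probability that the test statistic falls in either rejection region
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

baseline = power_two_sided_z(effect=5, sigma=15, n=50)

larger_sample = power_two_sided_z(effect=5, sigma=15, n=100)
less_variability = power_two_sided_z(effect=5, sigma=10, n=50)
larger_effect = power_two_sided_z(effect=8, sigma=15, n=50)
larger_alpha = power_two_sided_z(effect=5, sigma=15, n=50, alpha=0.10)

print(round(baseline, 3))
print(round(larger_sample, 3), round(less_variability, 3),
      round(larger_effect, 3), round(larger_alpha, 3))
```

Each of the four variations should print a higher power than the baseline, matching the cards above.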
α
The chance of Type 1 Error we are willing to accept, decided prior to collecting data
Typically α = 0.05
Using a smaller α will increase your β
P-Value
The probability of obtaining our results or something more extreme given that the null hypothesis is true
P < α
Reject H0 and conclude that the results are statistically significant at the α level
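A worked sketch of a p-value and the decision rule, assuming a two-sided z-test with a known population SD. All numbers are hypothetical.

```python
from statistics import NormalDist

null_mean = 100    # value stated by H0 (hypothetical)
sigma = 15         # population SD, assumed known (hypothetical)
n = 100            # sample size
sample_mean = 103  # observed result (hypothetical)
alpha = 0.05       # chosen before collecting data

standard_error = sigma / n ** 0.5              # 15 / 10 = 1.5
z = (sample_mean - null_mean) / standard_error # 3 / 1.5 = 2.0

# Probability of a result at least this extreme, in either direction, if H0 is true
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```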
Confidence Interval
Estimated range of values likely to include the population parameter.
Point estimate, 95% CI (lower limit, upper limit)
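A minimal sketch of a 95% confidence interval for a mean, assuming a known population SD so that a normal (z) interval applies. The numbers are hypothetical.

```python
from statistics import NormalDist

sample_mean = 103  # point estimate (hypothetical)
sigma = 15         # population SD, assumed known (hypothetical)
n = 100
confidence = 0.95

standard_error = sigma / n ** 0.5
# Critical value: about 1.96 for a 95% interval
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error

print(f"{sample_mean}, 95% CI ({lower:.2f}, {upper:.2f})")
```

In this example the interval lies entirely above 100, so a null value of 100 would be rejected at α = 0.05.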
What do P-Values tell you about?
Statistical Significance
What do Confidence Intervals tell you?
Statistical Significance + Information about Size and Direction of the effect.
Statistical Significance
P < α; equivalently, the 95% Confidence Interval (for α = 0.05) does not include the null value
A very small difference that is not clinically meaningful can reach statistical significance if the sample size is large enough
Clinical Significance
Effect Estimate is above the threshold for clinical relevance
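The gap between statistical and clinical significance can be illustrated with a very large sample and a tiny effect. This sketch assumes a z-test with known SD; the 5-unit clinical threshold and every other number are hypothetical.

```python
from statistics import NormalDist

observed_difference = 0.5  # effect estimate (hypothetical units)
sigma = 15                 # population SD, assumed known
n = 10_000                 # very large sample
clinical_threshold = 5     # smallest difference considered clinically relevant (assumed)

standard_error = sigma / n ** 0.5  # 15 / 100 = 0.15
z = observed_difference / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

z_crit = NormalDist().inv_cdf(0.975)
ci_lower = observed_difference - z_crit * standard_error
ci_upper = observed_difference + z_crit * standard_error

statistically_significant = p_value < 0.05
clinically_significant = observed_difference >= clinical_threshold

print(round(p_value, 5), (round(ci_lower, 2), round(ci_upper, 2)))
print(statistically_significant, clinically_significant)
```

The p-value is well below 0.05 and the interval excludes 0, yet the whole interval sits far below the assumed clinical threshold.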