drug lit exam 1 - bio stats Flashcards
Learning objectives for variables:
Determine if a variable is nominal, ordinal, interval, or ratio
Recognize dichotomous endpoints
Variables
Variable:
- Anything that can be observed or measured in a clinical experiment
Dependent Variable:
- The outcome of interest
- What should change as a result of the researcher’s intervention
Independent Variable
- The researcher’s intervention
- What is being manipulated
Types of Variables
Discrete Data
- Can only be whole numbers
- Example: you can’t have 2.13 children
Continuous Data
- Can take any value, within a defined range
- Example: blood pressure in mmHg can be divided into tenths, hundredths, or even thousandths of a mmHg
Another way to think about variable types…
Nominal
- Different categories, in no particular order
Ordinal
- Ordered categories, where the distance between categories cannot be considered equal
Interval
- Equal distances between values, but the zero point is arbitrary (zero does not mean the quantity is absent)
Ratio
- Equal distances between values, with a meaningful zero point
Variable Examples
Nominal
- No category is “higher” or “better” than others
- Every study participant in a sample will be placed into one of the categories
Also referred to as dichotomous when there are 2 options
Examples
- Medical diagnoses (“Diabetes”; “No diabetes”)
- Race or Nationality (“Asian”; “African”; “European”)
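- Age groups (“< 18 years”; “18-44 years”; “> 44 years”), when the groups are treated only as labels; because age bands have a natural order, they are often classified as ordinal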
Variable Examples
Ordinal
- There is ordering of these values, but the distance between values is not equal
Examples:
- Excellent/Satisfactory/Unsatisfactory
- Likert Scales (strongly agree, agree, neutral, disagree, strongly disagree)
- Cancer Stages I – IV
- The order of finishing a race
Variable Examples
Interval
- Ordering of values and equal distance between values
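- The zero point is arbitrary: zero does not mean the quantity is absent, so ratios between values are not meaningful
Example
- Temperature in °C or °F (0 °C is not “no temperature”, and 20 °C is not “twice as hot” as 10 °C)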
Variable Examples
Ratio
- Ordering of values and equal distance between values
- The zero is meaningful
Examples
- Weight (kg/lbs)
- Height (cm/inches)
- Blood pressure (mm Hg)
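One way to see the difference between interval and ratio data is to check whether a ratio between two values survives a change of units. The sketch below is only an illustration: the function names and the example values (10 and 20 °C, 40 and 80 kg) are assumptions chosen for the demonstration, and the kg-to-lb factor is approximate.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


KG_TO_LB = 2.20462  # approximate conversion factor (assumption for illustration)


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds."""
    return kg * KG_TO_LB


# Interval scale: the ratio changes with the unit, so it is not meaningful.
temp_ratio_c = 20 / 10                      # 2.0 in Celsius
temp_ratio_f = c_to_f(20) / c_to_f(10)      # 68 / 50 in Fahrenheit

# Ratio scale: zero means "no weight", so the ratio is the same in any unit.
weight_ratio_kg = 80 / 40
weight_ratio_lb = kg_to_lb(80) / kg_to_lb(40)

print(temp_ratio_c, temp_ratio_f)
print(weight_ratio_kg, weight_ratio_lb)
```

The temperature ratio differs between Celsius and Fahrenheit, while the weight ratio stays at 2 in both units, which is what a meaningful zero buys you.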
Variable Type & Assumptions
Nominal
Named categories
Ordinal
Same as nominal plus ordered categories
Interval
Same as ordinal plus equal intervals
Ratio
Same as interval plus meaningful zero
Learning objective for descriptive stats
Given a mean and standard deviation of a normally distributed sample, calculate the range that 95% of the data points fall between
Two categories of statistics
1) Descriptive Statistics
- Used for presenting, organizing, summarizing data
- Can summarize your data set with just a few key numbers
What you need to know about:
Mean
Median
Mode
Interquartile range
Standard deviation
Two categories of statistics
2) Inferential Statistics
Used to generalize data from a sample to a larger population
Used to identify “statistically significant” differences
Examples
Student’s t-test
Chi-squared test
ANOVA
Understanding when and how to use these statistics won’t be a focus for the biostatistics lectures in this course
Measures of Central Tendency
Mean:
The average value
Sum all values and divide by number of values (N)
AKA the “typical value”
Only okay to use to describe interval and ratio data!
Affected by outliers
If a study reports the mean for ordinal data, critique that as bad statistics!
Median:
The middle value
The 50th percentile
Arrange all values from smallest to largest and pick the middle number
Used to describe ordinal data (interval and ratio are okay too)
Mode:
The most frequently occurring value or category
Used to describe nominal data (interval, ordinal, and ratio are ok too)
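The rules above about which measure of central tendency fits which variable type can be summarized as a small lookup. This is a minimal sketch of the rules as stated in these notes; the names `APPROPRIATE_MEASURES` and `is_appropriate` are hypothetical and exist only for this illustration.

```python
# Measures of central tendency considered appropriate for each variable type,
# following the rules stated in these notes.
APPROPRIATE_MEASURES = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}


def is_appropriate(measure: str, variable_type: str) -> bool:
    """Return True if the measure is appropriate for the given variable type."""
    return measure in APPROPRIATE_MEASURES[variable_type]


# A study reporting the mean of a Likert scale (ordinal) would be flagged.
print(is_appropriate("mean", "ordinal"))    # False
print(is_appropriate("median", "ordinal"))  # True
```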
Quick Quiz! Calculate the mean, median, and mode for this dataset
7, 4, 2, 4, 8
Mean = (7+4+2+4+8) / 5 (n) = 25/5 = 5
Median = 4 (the middle value of the sorted list 2, 4, 4, 7, 8)
Mode = 4 (most frequently occurring)
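The quiz answers can be checked with Python's standard `statistics` module. This is just one way to verify the arithmetic; the variable names are chosen for illustration.

```python
import statistics

data = [7, 4, 2, 4, 8]

mean_value = statistics.mean(data)      # sum of the values divided by n
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)  # 5 4 4
```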
Which data set has the largest mean? (answer choices not preserved)
Answer: none of the above
Measures of Dispersion
How closely the data cluster around the measure of central tendency
Range:
The difference between the highest and lowest value
Measures the variability of the data
Advantage: simple to calculate and understand
Disadvantage: affected by outliers
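To illustrate how a single outlier affects the range, the sketch below reuses the quiz dataset and appends one extreme value. The outlier (60) and the helper name `value_range` are assumptions made up for this example.

```python
def value_range(values):
    """Return the difference between the highest and lowest value."""
    return max(values) - min(values)


data = [7, 4, 2, 4, 8]
data_with_outlier = data + [60]  # hypothetical extreme value

print(value_range(data))               # 6
print(value_range(data_with_outlier))  # 58
```

One extra data point moves the range from 6 to 58, even though the other five values are unchanged.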
Interquartile Range
The interval between the 25th and 75th percentiles
The middle 50% of values
A measure of variability
Reported alongside the median (the median is the 50th percentile, which lies inside this interval)
Advantage: Not affected by outliers
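The interquartile range can be sketched with `statistics.quantiles` from the Python standard library. Note that several conventions exist for computing quartiles, so other software may give slightly different numbers; the "inclusive" method and the two small datasets here are assumptions chosen for illustration.

```python
import statistics


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using the 'inclusive' quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # hypothetical outlier

print(interquartile_range(data))               # (3.0, 7.0, 4.0)
print(interquartile_range(data_with_outlier))  # (3.0, 7.0, 4.0)
```

Replacing the largest value with an extreme one leaves the 25th and 75th percentiles, and therefore the IQR, unchanged in this example, whereas the range would jump from 8 to 89.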
Standard Deviation (SD)
Very common estimate of data variability
Estimates the scatter of data points about the sample mean
Often necessary when running inferential statistics
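For normally distributed data:
About 68% of values fall within 1 SD of the mean
About 95% of values fall within 2 SD of the mean
About 99.7% of values fall within 3 SD of the mean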
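For the learning objective of finding the range that holds 95% of normally distributed data, the usual shortcut is mean ± 2 SD. The sketch below applies that shortcut to a made-up example (mean 120, SD 10; these numbers are assumptions, not from the notes) and uses `statistics.NormalDist` to check how much of a normal distribution actually lies within 1, 2, and 3 SD.

```python
from statistics import NormalDist


def range_within_sd(mean, sd, k=2):
    """Return (low, high) for mean +/- k standard deviations."""
    return mean - k * sd, mean + k * sd


def coverage(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Hypothetical sample: mean 120, SD 10.
low, high = range_within_sd(120, 10)
print(low, high)  # 100 140

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
```

So for this hypothetical sample, roughly 95% of data points would be expected to fall between 100 and 140. The coverage check shows the rule of thumb is an approximation: about 68.3%, 95.4%, and 99.7% for 1, 2, and 3 SD.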