Biostats Modules 1 & 2 Flashcards
Categorical Data
- aka count, discrete or attribute data
- counted and not measured on a scale
- whole numbers
- two types: nominal level and ordinal level
Continuous Data
- aka variable or measurement data
- different values on a continuous scale
- as many decimal places as a measurement can read
- two types: interval and ratio
Nominal level data
- qualitative data (not numerical)
- variable that can be counted
- e.g. gender, race, tumor type, occupation, smokers vs. nonsmokers
Rules of Nominal level data
- order does not matter
- e.g. it doesn't make sense to say children > adults or M > F
- can only have a set number of discrete possible values
- any numbers assigned are only labels (codes) and carry no numeric meaning
Ordinal level data
- values represent rank order (1st, 2nd, 3rd, etc.)
- cannot measure the distance between ranks
- order matters
- example: mild, moderate, severe pain or pressure ulcer staging
Interval level data
- all of the features of ordinal measurements, except it's all numerical
- equal differences between measurements
- no natural zero
- examples: calendar years and temperature in °C and °F
- order matters
Ratio level data
- highest level of measurement
- order matters
- differences are measurable
- example: height, weight, and length
- addition, subtraction, multiplication and division friendly
Ratio level data rules
- distances between the intervals on the scale are numerically equal
- the variables have an absolute zero
statistical analysis: interval/ratio level
- mean
- standard deviation
- range
statistical analysis: nominal level
- mode
- frequency
- percentage
statistical analysis: ordinal level
- median
- range
- often frequencies and percentage
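A minimal Python sketch of the nominal and ordinal summaries listed above. The data, the `RANK` coding, and the function names are made up for illustration; they are not from the course material.

```python
from collections import Counter
from statistics import median_low

# Hypothetical nominal data: labels only, order is meaningless.
tumor_types = ["benign", "malignant", "benign", "benign", "malignant", "benign"]

# Hypothetical ordinal data with an assumed rank coding.
pain = ["mild", "severe", "moderate", "mild", "moderate"]
RANK = {"mild": 1, "moderate": 2, "severe": 3}


def nominal_summary(values):
    """Return (mode, frequencies, percentages) for nominal data."""
    counts = Counter(values)
    total = len(values)
    mode = counts.most_common(1)[0][0]
    percentages = {label: 100 * n / total for label, n in counts.items()}
    return mode, dict(counts), percentages


def ordinal_median(values, rank):
    """Return the median category, using ranks only to order the labels."""
    label_for_rank = {r: label for label, r in rank.items()}
    # median_low always returns a value that is present in the data.
    return label_for_rank[median_low(rank[v] for v in values)]


mode, freqs, pcts = nominal_summary(tumor_types)
print(mode, freqs, pcts)
print(ordinal_median(pain, RANK))
```

Note that the ranks are used only for ordering; averaging them would treat the gaps between categories as equal, which ordinal data does not guarantee.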
Power
capacity of the study to detect differences or relationships that actually exist in the population
power analysis
determining the sample size needed to obtain sufficient power for a study
4 elements of power analysis
- significance level or alpha
- effect size
- power
- sample size
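To see how the four elements relate, here is a rough sketch that solves for sample size given the other three. It assumes a two-sided comparison of two group means and uses the normal approximation n = 2((z_alpha + z_power)/d)^2, where d is a standardized effect size. The function name and defaults are illustrative assumptions; dedicated power software would give slightly different (usually slightly larger) numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    Assumption: normal approximation, equal group sizes, standardized
    effect size (difference in means divided by the common SD).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (0.5) at alpha = .05 and power = .80:
print(sample_size_per_group(0.5))  # 63 per group under this approximation
# A smaller effect needs a much larger sample:
print(sample_size_per_group(0.2))
```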
Normal Distribution
- statistical term that does not imply that the results are “normal”
- refers to a particular shape–a bell-shaped curve
Normal distribution assumptions
- sample mean equals population mean
- sample SD equals the population SD
- infinite # of values
Variability
attempt to describe or quantify the spread or range of data
Range
simplest measure of variability; the difference between the largest and smallest values (subtract smallest from largest)
Central Tendency
- middle of a distribution
- indicate locality or centrality of data
- mean, median, mode
Mode
- value that occurs most often
- only acceptable measure of central tendency for analyzing nominal data (non numeric)
Median
- middle number when lined up from greatest to least
- most precise central tendency for ordinal data and non-normally distributed (skewed) interval or ratio data
Mean
- the average
- sum of values in a sample divided by total # of values
- most accurate central tendency for interval/ratio data
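A short sketch of the three measures using Python's standard `statistics` module. The `scores` list is invented; it includes one extreme value to show why the median is preferred for skewed data.

```python
import statistics

# Hypothetical interval/ratio data with one outlier (40).
scores = [2, 3, 3, 4, 5, 5, 5, 40]

mean = statistics.mean(scores)      # sum of values / number of values = 67 / 8
median = statistics.median(scores)  # middle of the sorted values: (4 + 5) / 2
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)  # 8.375 4.5 5
```

The single outlier pulls the mean well above most of the scores, while the median and mode stay near the bulk of the data.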
Variance
- amount of dispersion or spread that exists among the values of a data set with respect to the mean
- large variance = more disparate scores
Standard Deviation
- most common measure used to establish how data are distributed
- measure of dispersion/spread of data around the mean
- can show by how much the data deviates from the mean
- Square root of variance
- flat/spread-out curve = large standard deviation
- steep rise-and-fall curve = small standard deviation (all scores similar)
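The spread measures on these cards (range, variance, standard deviation) can be sketched as follows. The data are made up, and the population formulas (divide by n) are used for the main example; the sample versions (divide by n - 1) are shown for comparison.

```python
import statistics

# Hypothetical data set with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)         # largest minus smallest
pop_variance = statistics.pvariance(values)    # mean squared deviation from the mean
pop_sd = statistics.pstdev(values)             # square root of the variance

sample_variance = statistics.variance(values)  # divides by n - 1 instead of n
sample_sd = statistics.stdev(values)

print(data_range, pop_variance, pop_sd)  # 7 4 2.0
print(sample_variance, sample_sd)
```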
Standard Deviation Curve
- in a normal curve: 68% will be within one SD above or below the mean
- 34% above the mean and 34% below
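The 68% figure can be checked numerically with the standard normal distribution from Python's `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between one SD below and one SD above the mean.
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)

# Area between the mean and one SD above it (half of the above, by symmetry).
above_mean = std_normal.cdf(1) - std_normal.cdf(0)

print(round(within_one_sd, 4))  # 0.6827
print(round(above_mean, 4))     # 0.3413
```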
Skewness
- measure of symmetry of distributions
- (mean-median)/SD
Skewness (+/-)
- positive: mode < median < mean (tail to the right)
- negative: mean < median < mode (tail to the left)
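A small sketch of the (mean-median)/SD formula from the Skewness card. The data sets and the function name are invented, and the population SD is assumed since the card does not say which SD to use. A positive result means the mean sits above the median (right tail); a negative result means the opposite.

```python
import statistics


def skew_mean_median(values):
    """(mean - median) / SD, using the population SD (an assumption)."""
    sd = statistics.pstdev(values)
    return (statistics.mean(values) - statistics.median(values)) / sd


right_tailed = [1, 2, 2, 3, 3, 3, 20]       # one large value drags the mean up
left_tailed = [-20, -3, -3, -3, -2, -2, -1]  # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

print(skew_mean_median(right_tailed))  # positive
print(skew_mean_median(left_tailed))   # negative
print(skew_mean_median(symmetric))     # 0.0
```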
Kurtosis
- measure of the shape of the curve
- normal, flat, peaked
Leptokurtic
- peaks sharply with fat tails
- less variability
- K>0
Mesokurtic
- normal distribution
- K=0
Platykurtic
- flattened
- highly dispersed
- K<0
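The K values on these cards are consistent with excess kurtosis (kurtosis minus 3, so a normal curve scores 0). A minimal sketch under that assumption, using population moments and invented data:

```python
def excess_kurtosis(values):
    """Fourth central moment / (second central moment)^2, minus 3.

    Assumption: population moments (divide by n), no bias correction.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


# Most values piled at the center with a few far out: K > 0 (leptokurtic).
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

# Values spread evenly with no peak: K < 0 (platykurtic).
flat = [1, 2, 3, 4, 5]

print(excess_kurtosis(peaked))  # 2.0
print(excess_kurtosis(flat))    # about -1.3
```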
Nonparametric methods
- used to summarize categorical data (nominal and ordinal)
- used in place of commonly used parametric methods for continuous level data
Advantages of nonparametrics
- not susceptible to outliers (unlike parametric methods)
- based on ranks
- easier to calculate
- trade-off: not as powerful as parametric methods
Chi-Square
- Chi-Square (χ²) is used to examine differences among groups with variables measured at the nominal level
- χ² compares frequencies observed with frequencies expected
- easier to calculate/understand
- nonparametric statistic
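The observed-versus-expected comparison can be sketched for a contingency table as below. The table values and function name are hypothetical, expected counts are computed from the row and column totals, and no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells.

    `observed` is a list of rows of counts. Expected count for a cell is
    row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


# Hypothetical 2x2 table: rows = smokers / nonsmokers, columns = outcome yes / no.
table = [[20, 30],
         [30, 20]]

chi2 = chi_square_statistic(table)
print(chi2)  # 4.0; every expected count is 25, so each cell contributes 1
```

With 1 degree of freedom, the critical value at alpha = .05 is 3.841, so a statistic of 4.0 would be judged significant at that level.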
Fisher’s exact test
- used to examine differences among groups with variables measured at the nominal level
- more accurate and more useful with small samples
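For a 2x2 table, the test computes exact probabilities from the hypergeometric distribution instead of relying on a large-sample approximation, which is why it suits small samples. A minimal sketch, assuming the common two-sided rule of summing every table that is no more probable than the observed one; the function name and example counts are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability of x in the top-left cell with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small sample: 8 subjects, 4 per group.
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
print(fisher_exact_two_sided(4, 0, 0, 4))  # 2/70, about 0.029
```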