Intro to Biostatistics Flashcards
what are the two types of discrete/categorical variables?
nominal and ordinal
what is a nominal variable
discrete groups with no inherent order
ex: male/female, smoker/non-smoker
what is an ordinal variable
ordered without meaningful intervals
ex: class rank
what are the two types of continuous variables?
interval and ratio
–> can take continuous data and turn it into categorical data
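A minimal Python sketch of turning continuous data into categorical data. The function name `cholesterol_category` and the cut points (200 and 240 mg/dL) are illustrative assumptions, not part of these flashcards.

```python
def cholesterol_category(mg_dl):
    """Map a continuous cholesterol value (mg/dL) to an ordered category."""
    # Cut points are assumed for illustration only.
    if mg_dl < 200:
        return "desirable"
    if mg_dl < 240:
        return "borderline high"
    return "high"


# Hypothetical continuous (ratio) measurements.
levels = [172, 205, 251, 199, 240]

# The same data expressed as an ordinal (categorical) variable.
categories = [cholesterol_category(x) for x in levels]
print(categories)
```

Note that the conversion only goes one way: the exact values cannot be recovered from the categories.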
what is an interval variable
ordered with meaningful intervals, without an absolute zero - rarely used in medical research
ex: temperature in celsius
what is a ratio variable?
interval data with an absolute zero
ex: age, weight, cholesterol levels
define frequency distribution
a systematic arrangement of numerical data from the lowest to highest
what is a systematic arrangement of numerical data from the lowest to highest
frequency distribution
what are the three ways grouped data can be presented?
frequency - absolute number in each category
relative frequency - percent in each category
cumulative frequency - cumulative percent
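The three presentations can be sketched in Python as follows. The sample of 10 subjects and the category labels are made up for illustration.

```python
from collections import Counter

# Hypothetical sample: 10 subjects grouped by cholesterol category.
sample = ["<200"] * 4 + ["200-239"] * 4 + [">=240"] * 2
order = ["<200", "200-239", ">=240"]  # lowest to highest

counts = Counter(sample)
n = len(sample)

table = []
running = 0
for category in order:
    freq = counts[category]            # frequency: absolute number
    running += freq
    relative = 100 * freq / n          # relative frequency: percent
    cumulative = 100 * running / n     # cumulative frequency: cumulative percent
    table.append((category, freq, relative, cumulative))

for row in table:
    print(row)
```

The cumulative column always ends at 100 percent, which is a quick check that no category was missed.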
what is a bar graph
grouped frequencies used to display nominal data
-ex: cholesterol by gender
how would you graph cholesterol by gender?
bar graph
how would you graph grouped frequencies used to display nominal data?
bar graph
what is a histogram
grouped frequencies generally used to display continuous variables
what is a frequency polygon
midpoints of each group joined by straight lines, used to display continuous variables
“line graph”
–> excellent for displaying distribution of sample
what type of graph would you use to show the distribution of a sample?
frequency polygon
what is a cumulative frequency polygon
displays the cumulative frequency for continuous variables
“100% of sample has cholesterol level below 260… 15% of sample has cholesterol level below 190”
what type of graph would you use to approximate percentiles?
cumulative frequency polygon
what is a survival curve
plots death or other endpoints over time
- Kaplan-Meier method preferred
why are simple survival curves not useful in actual studies, and what is the solution to these problems?
- patients are not enrolled at the same time
- patients drop out
- patients are followed for varying lengths of time
–> solution: Kaplan-Meier method
what is the Kaplan-Meier method and why does it solve the problems created by simple survival curves?
- used to plot survival in medical research
- adjusts data to reflect the patients who are not followed for the entire study
- patients with incomplete follow-up are "censored"; the data are referred to as censored survival data
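A minimal Python sketch of the Kaplan-Meier (product-limit) calculation, assuming the usual convention that a patient censored at time t still counts as at risk at time t. The function name `kaplan_meier` and the five-patient data set are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] for each time at which an endpoint occurred.

    times:  follow-up time for each patient
    events: True if the endpoint (e.g. death) was observed,
            False if the patient was censored (dropped out or study ended)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without lowering survival.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and outcomes for 5 patients.
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
```

Censored patients shrink the denominator for later time points instead of being counted as deaths or discarded, which is how the method handles unequal follow-up.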
what is a normal/gaussian distribution
frequency polygon with the appearance of a symmetrical bell-shaped curve; often generated by biologic/medical data
- many statistical manipulations and inferences are dependent on the assumption of this distribution
- more accurately approximated as sample size increases
positive and negative skewing refer to what part of the distribution: the mean or the tail?
the tail
what is mean
the average value
- sensitive to extreme scores
- can misrepresent a population dramatically
what is median
middle value when the data are ordered
what is mode
the value that occurs the most frequently
what is range
difference between the highest and lowest value
what is variance
quantifies the scatter present in the distribution of values
- average of the squared differences from the mean
- not a very intuitive measurement, won’t be asked to calculate
what is standard deviation (SD)
square root of the variance, much more intuitive
- most commonly used measure of variability
- the more spread out the sample distribution, the larger this SD
what is the most commonly used measure of variability
standard deviation
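The measures above can be checked with Python's standard `statistics` module. The data set is made up; `pvariance` and `pstdev` are used because they match the card's "average of the squared differences from the mean" (the sample versions, `variance` and `stdev`, divide by n - 1 instead).

```python
import statistics

# Hypothetical data set.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)            # average value
median = statistics.median(values)        # middle value of the ordered data
mode = statistics.mode(values)            # most frequent value
value_range = max(values) - min(values)   # highest minus lowest
variance = statistics.pvariance(values)   # average squared difference from the mean
sd = statistics.pstdev(values)            # square root of the variance

print(mean, median, mode, value_range, variance, sd)

# The mean is sensitive to extreme scores; the median much less so.
with_outlier = values + [100]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)
print(mean_with_outlier, median_with_outlier)
```

Adding a single extreme value roughly triples the mean here while the median barely moves.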
in a normal distribution, what percentage of the population falls within 1 standard deviation of the mean?
about 68%
in a normal distribution, what percentage of the population falls within 2 standard deviations of the mean?
about 95%
in a normal distribution, what percentage of the population falls within 3 standard deviations of the mean?
about 99.7%
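These percentages can be reproduced from the normal distribution with `statistics.NormalDist`. The helper name `fraction_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
std_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal population within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
```

The figures on the cards are rounded, and they apply only when the data are approximately normally distributed.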
what are z scores?
- the number of standard deviations a value lies above or below the mean: z = (value - mean) / SD
- used to convert standard deviations into percentile data
- used when considering the percent of a population above or below a specific level
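A short Python sketch of converting a value to a z score and then to a percentile, assuming the population is normally distributed. The function names and the cholesterol figures (mean 200, SD 30) are hypothetical.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile_below(x, mean, sd):
    """Percent of a normal population falling below x."""
    return 100 * NormalDist().cdf(z_score(x, mean, sd))


# Hypothetical population: cholesterol with mean 200 and SD 30.
z = z_score(260, mean=200, sd=30)
pct = percentile_below(260, mean=200, sd=30)
print(z, round(pct, 1))
```

The percent of the population above the same level is simply 100 minus the result.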
what is the multiplication rule?
used to calculate the probability of 2 (or more) independent events all occurring
- P(a and b) = P(a) x P(b)
how do you calculate the probability of 2 or more independent events both occurring?
multiplication rule
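The multiplication rule can be verified by enumerating a small sample space in Python. The two-dice example and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def first_is_six(o):
    return o[0] == 6


def second_is_even(o):
    return o[1] % 2 == 0


p_a = prob(first_is_six)      # 1/6
p_b = prob(second_is_even)    # 1/2
p_both = prob(lambda o: first_is_six(o) and second_is_even(o))

print(p_a, p_b, p_both, p_a * p_b)
```

The two dice do not influence each other, so counting the outcomes directly gives the same answer as multiplying the two probabilities.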
what is the addition rule?
used to calculate the probability of either of 2 or more events occurring (the events do not need to be independent)
- P(a or b) = P(a) + P(b) - P(a and b)
–> if mutually exclusive, P(a and b) = 0
how do you calculate the probability of either of 2 or more events occurring?
addition rule
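A small Python check of the addition rule on a single fair die. The events chosen and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
outcomes = range(1, 7)


def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def is_even(o):
    return o % 2 == 0


def over_four(o):
    return o > 4


# Overlapping events: a roll of 6 is both even and greater than 4.
p_a = prob(is_even)                                    # 1/2
p_b = prob(over_four)                                  # 1/3
p_and = prob(lambda o: is_even(o) and over_four(o))    # 1/6
p_or = prob(lambda o: is_even(o) or over_four(o))      # 2/3
print(p_or, p_a + p_b - p_and)

# Mutually exclusive events: a single roll cannot be both 1 and 2.
p_one = prob(lambda o: o == 1)
p_two = prob(lambda o: o == 2)
p_one_or_two = prob(lambda o: o in (1, 2))
print(p_one_or_two, p_one + p_two)
```

Subtracting the overlap avoids counting the roll of 6 twice; when the events cannot both happen, the overlap is zero and the probabilities simply add.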