Introduction to Biostatistics Flashcards
What is biostatistics?
The application of statistical tools to research-based problems in biology, medicine, and public health
What are the three types of descriptive statistics?
- Measures of central tendency
- Measures of spread
- Measures of relative position
What are the measures of central tendency?
How do we determine which to use?
- Mean
- Median
- Mode
- Geometric Mean
The mean is ideal for roughly symmetric data because it uses every data value and gives a precise summary of the center of the distribution.
However, the mean is strongly influenced by skewness and extreme values, so for skewed data the median is a better measure of the center.
The geometric mean is ideal for right-skewed data made up of positive values.
The mode does not tell us much unless the data is categorical.
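To see why the mean suits symmetric data but not skewed data, a minimal sketch can compute the mean, median, and mode of a small right-skewed sample. The values are hypothetical, chosen so that one large value creates a long right tail, and the sketch assumes Python's standard `statistics` module.

```python
import statistics

# Hypothetical right-skewed sample: the single large value (40) forms a long right tail.
values = [2, 3, 3, 4, 5, 6, 40]

mean_value = statistics.mean(values)      # pulled upward by the extreme value 40
median_value = statistics.median(values)  # middle value of the ordered data
mode_value = statistics.mode(values)      # most frequent value

# mean = 9, median = 4, mode = 3
print(mean_value, median_value, mode_value)
```

Six of the seven values are below the mean of 9, so here the median (4) describes a typical value better than the mean does.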
What are the benefits of the geometric mean?
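The geometric mean is good for right-skewed data, but not for left-skewed data. It is commonly used in public health.
It keeps the useful properties of the arithmetic mean (x-bar) while taming extreme high values. However, it is defined only for values greater than zero, because the log of zero or a negative number is undefined. Taking logs also compresses large values rather than small ones, so it does not help with left-skewed data.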
How do you get the geometric mean?
Take the log of each value x_i, add the logs together, and divide by n. Then take the antilog (reverse log) of that result. Any log base works as long as the antilog uses the same base.
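The log-then-antilog recipe can be sketched in Python as follows. The function name `geometric_mean_via_logs` and the sample values are hypothetical; the sketch uses base-10 logs and cross-checks the result against `statistics.geometric_mean` from the standard library (Python 3.8+).

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Hypothetical helper: average the log10 values, then take the antilog."""
    if any(x <= 0 for x in values):
        # log(x) is undefined for x <= 0, so the geometric mean is undefined too.
        raise ValueError("geometric mean requires all values > 0")
    mean_log = sum(math.log10(x) for x in values) / len(values)  # (sum of log x_i) / n
    return 10 ** mean_log  # antilog, base 10 to match log10

values = [1, 10, 100]  # hypothetical sample; logs are 0, 1, 2
gm = geometric_mean_via_logs(values)
print(gm)  # 10.0
print(statistics.geometric_mean(values))  # library version, equal up to rounding
```

The arithmetic mean of these values is 37, while the geometric mean is 10, which shows how the log step tames the large value.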
What are the measures of spread?
When do we use them?
- Interquartile Range (IQR)
- Range
- Standard Deviation
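The IQR is better for skewed data, while the standard deviation is better for roughly symmetric data.
The range is unreliable because it depends only on the smallest and largest values, so a single extreme value can inflate it.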
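A minimal sketch comparing the three measures of spread on a hypothetical right-skewed sample, using Python's `statistics` module. Quartiles are computed with `statistics.quantiles` (its default "exclusive" method); textbooks and software use slightly different quartile rules, so a hand calculation may give a different IQR.

```python
import statistics

# Hypothetical right-skewed sample; the value 40 is far from the rest.
values = [2, 3, 3, 4, 5, 6, 40]

data_range = max(values) - min(values)         # depends only on the two extremes
q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
iqr = q3 - q1                                  # spread of the middle 50% of the data
sd = statistics.stdev(values)                  # sample standard deviation (n - 1 denominator)

print(f"range={data_range}, IQR={iqr}, SD={sd:.2f}")
```

Here the one extreme value stretches the range to 38 and the standard deviation to about 13.7, while the IQR stays at 3.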
What is right skewed?
Also called positively skewed
Most of the data values fall to the left of the mean
The TAIL IS TO THE RIGHT.
What is left skewed distribution?
Also called negatively skewed
Most of the data values fall to the right of the mean
The TAIL IS TO THE LEFT.
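As a rough illustration (not a formal test of skewness), counting how many values fall on each side of the mean shows the pattern described in the two cards above. The helper `count_around_mean` and both samples are hypothetical.

```python
import statistics

def count_around_mean(values):
    """Hypothetical helper: return (count below the mean, count above the mean)."""
    m = statistics.mean(values)
    below = sum(1 for x in values if x < m)
    above = sum(1 for x in values if x > m)
    return below, above

right_skewed = [2, 3, 3, 4, 5, 6, 40]      # long tail to the right, mean = 9
left_skewed = [0, 34, 35, 36, 37, 37, 38]  # long tail to the left, mean = 31

print(count_around_mean(right_skewed))  # (6, 1): most values fall left of the mean
print(count_around_mean(left_skewed))   # (1, 6): most values fall right of the mean
```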
When is a distribution symmetrical?
When the data values are distributed evenly on both sides of the center, so that the left half is a mirror image of the right half.
If it is also unimodal, the mean, median, and mode are all equal and sit at the center of the distribution.
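A small check of the unimodal case, assuming a hypothetical sample whose halves mirror each other around 3 and using Python's `statistics` module:

```python
import statistics

# Hypothetical symmetric, unimodal sample: values mirror each other around 3.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean_value = statistics.mean(values)      # 27 / 9
median_value = statistics.median(values)  # 5th of 9 ordered values
mode_value = statistics.mode(values)      # 3 appears most often

print(mean_value, median_value, mode_value)  # 3 3 3
```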
How do we calculate a percentile?
- Order the data set from smallest to largest.
- Compute the position (c) of the k-th percentile: c = n(k)/100, where n is the number of data values.
- If c is not a whole number, round up to the next whole number; the data value at that position is the required percentile.
- If c is a whole number, the percentile is the value halfway between the values at positions c and c+1 in the ordered set.
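The position rule above can be sketched in Python. The function name `percentile_by_position` is hypothetical, and the sketch assumes k is an integer between 0 and 100 (exclusive). Other tools, such as `statistics.quantiles`, interpolate differently and can give slightly different answers.

```python
def percentile_by_position(values, k):
    """Hypothetical helper: k-th percentile via the position rule c = n*k/100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < k < 100:
        raise ValueError("k must be an integer between 0 and 100 (exclusive)")
    ordered = sorted(values)  # step 1: order smallest to largest
    n = len(ordered)
    if n * k % 100 != 0:
        # c is not a whole number: round up and take the value at that position.
        c = -(-n * k // 100)   # ceiling of n*k/100 using integer arithmetic
        return ordered[c - 1]  # positions are 1-based, list indices are 0-based
    # c is a whole number: take the value halfway between positions c and c+1.
    c = n * k // 100
    return (ordered[c - 1] + ordered[c]) / 2

print(percentile_by_position([40, 2, 6, 3, 5, 3, 4], 25))  # c = 1.75 -> position 2 -> 3
print(percentile_by_position(list(range(1, 11)), 30))     # c = 3 -> (3 + 4) / 2 = 3.5
```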
What is an outlier?
It is an extremely small or large data value in comparison to the remaining values.
How do you determine whether a value is an outlier?
A value is an outlier if it is less than Q1 - 1.5(IQR)
or
greater than Q3 + 1.5(IQR), where IQR = Q3 - Q1.
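A minimal sketch of the 1.5(IQR) fences in Python. The helper `find_outliers` and the sample are hypothetical, and quartiles come from `statistics.quantiles` with its default method; a different quartile rule can move the fences slightly.

```python
import statistics

def find_outliers(values):
    """Hypothetical helper: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile method is an assumption
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

values = [2, 3, 3, 4, 5, 6, 40]  # hypothetical sample
print(find_outliers(values))  # Q1 = 3, Q3 = 6, IQR = 3 -> fences -1.5 and 10.5 -> [40]
```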
To identify an extreme outlier