1 - Intro to Stats Flashcards
What are the 3 common scales of measurement for variables in medicine?
- Nominal
- Ordinal
- Numeric (interval or ratio)
Describe nominal measurement
- Simplest b/c data fits in categories in no particular order
- No actual measurement
- Often dichotomous or binary (yes/no, male/female)
- Can be multiple categories
- Generally described in percentages or proportions
Describe ordinal measurement
- Inherent order to the categories
- Appropriate summary statistic = median
- Often used in assessment of pt risk
- Difference between 2 adjacent categories isn’t necessarily the same throughout the scale
Describe numerical measurement
- Differences have meaning on numerical scale
- 2 types of numerical scales - interval and ratio
Describe the difference between interval and ratio
- Interval - equal differences mean the same thing anywhere on the scale (ex: temperature in °C or °F, going from 10 to 15 is the same change as going from 20 to 25), but no meaningful zero value
- Ratio - interval scale w/ meaningful zero value (ex: time)
What are the 2 types of numerical variables, and how are they summarized? Give examples of each
- Continuous (age, time)
- Discrete (number of houses on a street)
- Summary statistics for both = mean and SD
What are the 3 measures of middle?
Mean, median, and mode
Describe mean
- Arithmetic average
- Mean = sum of x/n (x = individual observation; n = number of observations)
- Used w/ numerical variables; shouldn’t be used w/ ordinal variables (but often is)
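The mean formula above can be sketched in a few lines of Python. The function name and the blood pressure readings are made up for illustration, not taken from the course material.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by n."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# Hypothetical systolic blood pressure readings (mmHg)
readings = [118, 122, 130, 126, 124]
print(mean(readings))  # 620 / 5 = 124.0
```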
Describe median
- Middle observation
- Arrange the observations from smallest to largest; count and find the middle
- For odd number of observations - median is the middle observation
- For even number - median is average of the values on either side of the middle
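A minimal sketch of the median steps above (sort, count, find the middle). The function name and sample values are illustrative assumptions.

```python
def median(values):
    """Middle observation of the sorted data."""
    ordered = sorted(values)          # arrange smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:                    # odd n: the single middle observation
        return ordered[mid]
    # even n: average of the two values on either side of the middle
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([4, 1, 3, 2]))  # 2.5
```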
Describe mode
- Value that occurs most frequently
- Data can have more than 1 mode (bimodal distribution)
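Because data can have more than one mode, a sketch of the mode should return every value tied for the highest frequency. The function name and data below are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most frequently (may be more than one)."""
    if not values:
        raise ValueError("modes requires at least one observation")
    counts = Counter(values)
    top = max(counts.values())        # highest frequency seen
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3] -> bimodal
print(modes([1, 1, 2]))           # [1]
```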
Which measure of middle should be used in skewed distributions?
- Use mean in symmetric (normal) distributions
- Use median for ordinal data or numerical data that is skewed (mean very sensitive to extreme values in small datasets)
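To see why the median is preferred for skewed data, the sketch below adds one extreme value to a small dataset and compares both measures. The lengths of stay are invented for illustration; `statistics.mean` and `statistics.median` come from the Python standard library.

```python
import statistics

# Hypothetical hospital lengths of stay (days)
stays = [2, 3, 3, 4, 5]
stays_with_outlier = stays + [40]     # one extreme value added

print(statistics.mean(stays), statistics.median(stays))
# mean 3.4, median 3

print(statistics.mean(stays_with_outlier), statistics.median(stays_with_outlier))
# mean 9.5, median 3.5 -> the mean moves far more than the median
```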
What are the 4 measures of spread (dispersion)?
- Range
- Standard deviation/variance
- Percentiles
- Interquartile range
Describe range
- Difference between the smallest and largest observation
- Minimum and maximum may also be given
Describe the standard deviation formula
- s = square root of [sum of (x - x̄)^2 / (n - 1)]
- x = individual observation
- x̄ = sample mean
- n = sample size
What is variance?
- s^2 = sum of (x - x̄)^2 / (n - 1)
- The quantity in the SD formula before the square root is taken (SD squared)
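A short sketch of the sample variance and standard deviation, assuming the usual sample formulas: variance = sum of squared deviations from the mean divided by (n - 1), and SD = square root of the variance. Function names and data are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation = square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [1, 2, 3, 4, 5]          # mean = 3, squared deviations sum to 10
print(sample_variance(data))    # 10 / 4 = 2.5
print(sample_sd(data))          # sqrt(2.5), about 1.58
```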
Describe percentiles
- Percentage of a distribution that is equal to or below a particular number
- Median = 50th percentile
- Common example = physical growth charts for children
What is interquartile range?
- Difference between the 25th and 75th percentiles (1st and 3rd quartiles)
- Describes the middle 50% of the distribution regardless of the shape
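Quartiles and the interquartile range can be sketched with the standard library. Several conventions exist for computing percentiles, so other software may give slightly different quartiles. This example assumes the "inclusive" method of `statistics.quantiles` (Python 3.8+), and the data are made up.

```python
import statistics


def quartiles_and_iqr(values):
    """Return (Q1, median, Q3, IQR) using the 'inclusive' quantile method."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles_and_iqr(data))  # (3.0, 5.0, 7.0, 4.0)
```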
When should standard deviation be used?
W/ mean w/ symmetric data
When should percentiles and interquartile range be used?
W/ median for ordinal data or skewed numerical data
Describe tabular presentations
- Nominal and ordinal data presented as proportions or percentages
- Summarized in frequency tables
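A frequency table for nominal data can be sketched as a count per category plus its proportion and percentage. The blood types and the function name below are hypothetical.

```python
from collections import Counter


def frequency_table(categories):
    """Map each category to (frequency, proportion, percentage)."""
    n = len(categories)
    counts = Counter(categories)
    return {c: (f, f / n, 100 * f / n) for c, f in counts.most_common()}


# Hypothetical blood types for 8 patients
blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
for category, (f, prop, pct) in frequency_table(blood_types).items():
    print(f"{category:>2}  n={f}  proportion={prop:.3f}  percent={pct:.1f}%")
```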
What is the purpose of contingency tables?
To facilitate simultaneous examination of multiple distributions (variables)
What are 4 ways to present numerical data?
- Stem-and-leaf plots
- Five number summary
- Boxplots
- Grouped frequency tables
Describe a boxplot
- Lower and upper hinges of the box are the 1st and 3rd quartiles
- Median is the line in the box
How is symmetry of a boxplot evaluated?
- Evaluated by symmetry of the hinges w/ respect to median
- If hinges equidistant from median = data is symmetrical
- If upper hinge further away from median = data positively skewed
- If lower hinge further away from median = data negatively skewed
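The symmetry check above amounts to comparing the distance from the median to each hinge. A rough sketch follows, assuming the hinges are the quartiles from `statistics.quantiles` with the "inclusive" method; the function name and data are illustrative.

```python
import statistics


def boxplot_skew(values):
    """Classify skew by comparing hinge distances from the median."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    upper = q3 - med                  # upper hinge to median
    lower = med - q1                  # median to lower hinge
    if upper > lower:
        return "positively skewed"
    if lower > upper:
        return "negatively skewed"
    return "symmetrical"


print(boxplot_skew([1, 2, 2, 3, 3, 4, 6, 9, 15]))  # positively skewed
print(boxplot_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))   # symmetrical
```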
Describe a box and whisker plot
Same as boxplot but whiskers drawn from upper and lower hinges to largest/smallest non-outlying values
Describe a modified boxplot
- Outliers are identified by an asterisk
- Boundary for outliers = 1.5x interquartile range beyond the hinges of the box
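The 1.5 x IQR rule can be sketched as two fences, one below the lower hinge and one above the upper hinge. Values outside the fences are flagged as outliers, and the whiskers end at the most extreme values inside them. This assumes quartiles from `statistics.quantiles` with the "inclusive" method; names and data are hypothetical.

```python
import statistics


def outlier_fences(values):
    """Lower and upper boundaries: 1.5 x IQR beyond each hinge."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def split_outliers(values):
    """Return (non-outlying values, outliers)."""
    lower, upper = outlier_fences(values)
    inside = [x for x in values if lower <= x <= upper]
    outliers = [x for x in values if x < lower or x > upper]
    return inside, outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
inside, outliers = split_outliers(data)
print(outlier_fences(data))       # (-3.0, 13.0)
print(outliers)                   # [30]
print(min(inside), max(inside))   # whisker ends: 1 8
```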
Describe how to construct a grouped frequency table
- Group observations on variable into contiguous, non-overlapping (preferably equal) class intervals (bins)
- Place each observation into only one bin
- Tabulate frequency of observations in each bin
- Can calculate relative frequency (as a proportion or percentage)
- Can also tabulate cumulative frequency and cumulative relative frequencies
- Must decide how many bins (k) and how wide (w)
What are some “rules” for grouped frequency tables?
- Poor grouping = loss of information (may emphasize or hide elements of the variable)
- Too few bins = loss of info
- Too many = cumbersome, data gaps
- General rule = 5-20 class intervals
Describe guidance for grouped frequency tables
- w = R/k
- w = width of the bin, k = # of bins, R = range
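A rough sketch of building a grouped frequency table from w = R/k. It assumes equal-width bins that start at the minimum, with the largest observation placed in the last bin. In practice the width is often rounded to a convenient number. The function name and ages are made up.

```python
def grouped_frequency_table(values, k):
    """Rows of (lower, upper, frequency, relative frequency, cumulative frequency)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("range is zero; cannot form bins")
    w = (hi - lo) / k                 # w = R / k
    counts = [0] * k
    for x in values:
        # each observation goes into exactly one bin; the maximum goes in the last
        idx = min(int((x - lo) / w), k - 1)
        counts[idx] += 1
    n = len(values)
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        rows.append((lo + i * w, lo + (i + 1) * w, f, f / n, cumulative))
    return rows


# Hypothetical patient ages; R = 60 - 20 = 40, k = 4, so w = 10
ages = [20, 22, 25, 31, 34, 38, 41, 45, 52, 60]
for row in grouped_frequency_table(ages, 4):
    print(row)
```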
What is the difference between histograms and bar charts?
- Histogram bars are generally joined b/c they represent a continuous (numerical) distribution
- Bar chart bars are separated b/c they represent distinct categories
What is a frequency polygon?
- Created by linking the mid-points of successive bins
- Polygon finished by joining to the x-axis at the mid-points of the zero-frequency bins just beyond each end of the data
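The points of a frequency polygon can be sketched as the bin mid-points paired with their frequencies, plus one zero-frequency point half a bin width beyond each end. This assumes equal-width bins; the function name and counts are hypothetical.

```python
def polygon_points(lower_edge, width, counts):
    """(x, frequency) points of a frequency polygon for equal-width bins."""
    k = len(counts)
    points = [(lower_edge - width / 2, 0)]            # zero-frequency bin below
    for i, f in enumerate(counts):
        points.append((lower_edge + (i + 0.5) * width, f))  # bin mid-point
    points.append((lower_edge + (k + 0.5) * width, 0))      # zero-frequency bin above
    return points


print(polygon_points(20, 10, [3, 3, 2, 2]))
# [(15.0, 0), (25.0, 3), (35.0, 3), (45.0, 2), (55.0, 2), (65.0, 0)]
```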
How can you use a histogram to estimate the mean?
Mean = [sum of (f * xmid)] / sum of f
- f = frequency of the bin
- xmid = midpoint of the bin
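A minimal sketch of the grouped-mean formula: each bin contributes its frequency times its midpoint. The result is an approximation because every observation is treated as if it sat at the midpoint of its bin. The bins below are hypothetical.

```python
def grouped_mean(bins):
    """Mean from (lower, upper, frequency) bins: sum(f * xmid) / sum(f)."""
    total_f = sum(f for _, _, f in bins)
    if total_f == 0:
        raise ValueError("no observations in any bin")
    weighted = sum(f * (lower + upper) / 2 for lower, upper, f in bins)
    return weighted / total_f


# Hypothetical age bins with frequencies 3, 3, 2, 2
bins = [(20, 30, 3), (30, 40, 3), (40, 50, 2), (50, 60, 2)]
print(grouped_mean(bins))  # (75 + 105 + 90 + 110) / 10 = 38.0
```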