Lecture 2 - Intro to Stats Flashcards
What are the 3 common scales of measurement for variables in medicine?
- Nominal
- Ordinal
- Numerical
Describe Nominal data
- Simplest - data fits in categories (no actual order)
- Often dichotomous or binary (yes/no or male/female)
- Could be multiple categories like blood groups
- We can just describe it - no way to rank it
- Just use proportion or percentages
What are nominal data also called?
- Qualitative Observations
- Categorical Observations
Describe Ordinal data
- Inherent order to the categories (ex. Cancer staging 0-4)
- Summary statistic = median
- Difference between 2 adjacent categories is not the same throughout the scale
Describe Numerical data
- Differences have meaning on a numerical scale
- Also called quantitative observations
What are the two types of numerical scales?
- Continuous scale - has a value on a continuum (ex. age)
- Discrete scale - values are integers (# of fractures, # of medications)
What summary statistics do you use for numerical data?
mean and SD
What type of data:
Nominal, ordinal, or continuous ?
Name
nominal
What type of data:
Nominal, ordinal, or continuous ?
Hair color
nominal
What type of data:
Nominal, ordinal, or continuous ?
Eye color
nominal
What type of data:
Nominal, ordinal, or continuous ?
Height
continuous
What type of data:
Nominal, ordinal, or continuous ?
Age
continuous
What type of data:
Nominal, ordinal, or continuous ?
Gender
nominal
What are the 3 “Measures of Middle”?
- Mean
- Median
- Mode
What is the mean?
- the arithmetic average (sum of the values divided by the number of values)
- used with numerical variables
What is the median?
The median is the middle observation
What is the mode?
The mode is the value that occurs most frequently
Can data have more than 1 mode ?
Yes, a bimodal distribution has 2 modes
ex. some diseases have 2 peaks
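A minimal sketch of the three measures of middle using Python's standard `statistics` module. The ages below are made up for illustration and are chosen to have two peaks.

```python
from statistics import mean, median, multimode

# Hypothetical ages at diagnosis (invented data with two peaks)
ages = [21, 22, 22, 23, 40, 61, 62, 62, 63]

avg = mean(ages)          # arithmetic average
mid = median(ages)        # middle observation of the sorted data
modes = multimode(ages)   # every value tied for most frequent

print(avg, mid, modes)
```

Here `multimode` returns both 22 and 62, which is what a bimodal data set looks like in practice.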
If the data is not skewed, you can use ____ and ___.
mean and SD
If the data is skewed, you should use ____ and ___.
median and IQR
Negatively skewed is ____ skewed (outlying small values)
left
Positively skewed is ____ skewed (outlying values are large)
right
How do you know if something is right/positively skewed?
Mean > Median
How do you know if something is left/negatively skewed?
Mean < Median
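The mean-versus-median rule on the cards above can be sketched as a small helper. This is only the rule of thumb from the cards, not a formal test of skewness, and the function name and data are hypothetical.

```python
from statistics import mean, median

def skew_direction(values):
    """Rule of thumb: compare the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right (positive)"
    if m < med:
        return "left (negative)"
    return "symmetric"

# Hypothetical lengths of stay in days, with one large outlying value
stays = [2, 3, 3, 4, 5, 30]
print(skew_direction(stays))
```

The single large value pulls the mean above the median, so the helper reports right (positive) skew.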
Use mean if data is _____
symmetric
Use _____ for ordinal data or numerical data that is skewed
median
What are some measures of spread?
- Range
- Standard deviation/variance
- Coefficient of variation
- Percentiles
- Interquartile range
What is the range?
difference between smallest and largest values
How is variance related to standard deviation?
Variance is the SD squared (SD is the square root of the variance)
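A short check of that relationship, assuming the sample versions (n - 1 in the denominator); the cards do not say which version the lecture used, and the values are invented.

```python
import math
from statistics import stdev, variance

# Hypothetical measurements
values = [4, 8, 6, 5, 7]

var = variance(values)  # sample variance, divides by n - 1
sd = stdev(values)      # sample standard deviation

print(var, sd, math.isclose(sd, math.sqrt(var)))
```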
What is the coefficient of variation?
Measure of relative spread
CoV = SD/mean x 100
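The CoV formula as a small function, assuming the sample SD is the one intended. The function name and data are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CoV = SD / mean x 100, expressed as a percentage."""
    return stdev(values) / mean(values) * 100

# Hypothetical measurements
values = [4, 8, 6, 5, 7]
cov = coefficient_of_variation(values)
print(round(cov, 1))
```

Because it is a ratio, the CoV lets you compare spread between variables measured in different units.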
What is a percentile?
It is the percentage of a distribution that is equal to or below a particular number
(median = 50th percentile)
What is IQR?
interquartile range
IQR = Q3 - Q1
What do you use SD with?
mean (with symmetrical data)
What do you use percentiles and IQR with?
median for ordinal data or skewed numerical data
List 4 ways we can express numerical data
- Stem and leaf plots
- Five number summary
- Boxplots
- Grouped Frequency Tables
Why are stem and leaf plots useful?
- get some idea about the centrality
- helps to see if it’s skewed or not
What is a 5 number summary and why is a 5 number summary useful?
- Min
- Q1
- Median
- Q3
- Max
*Helps to show the location and spread of the data
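A sketch of a five number summary using `statistics.quantiles`. Its default "exclusive" method places quartiles at position p(n+1) and interpolates between neighbours, which may differ slightly from hand rounding rules. The data are made up.

```python
from statistics import quantiles

# Hypothetical numbers of clinic visits per patient
visits = [2, 4, 4, 5, 7, 9, 12]

q1, med, q3 = quantiles(visits, n=4)  # the three quartile cut points
five = (min(visits), q1, med, q3, max(visits))

print(five)
```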
What is the formula for finding percentile that he gave us?
p(n+1)
So say you’re trying to find the 25th percentile out of 16 numbers, you would do:
(0.25)(17) = 4.25
You would round down and choose the 4th number.
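The same calculation as a sketch in code. The function name is hypothetical, and clamping the position to the range 1..n is an added assumption for very small or large p, not something from the lecture. Many software packages interpolate instead of rounding down, so their answers can differ.

```python
import math

def percentile_lecture_rule(values, p):
    """Pick the value at position p(n+1), rounded down (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = math.floor(p * (n + 1))
    position = min(max(position, 1), n)  # assumption: keep position inside 1..n
    return ordered[position - 1]

# 16 hypothetical values: 10, 20, ..., 160
data = list(range(10, 170, 10))

q1 = percentile_lecture_rule(data, 0.25)  # position 4.25, so the 4th value
q3 = percentile_lecture_rule(data, 0.75)  # position 12.75, so the 12th value
iqr = q3 - q1

print(q1, q3, iqr)
```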
Describe a box and whisker plot
- Lower and upper hinges of the box are Q1 and Q3
- Median is inside the box
Describe how symmetry can be interpreted from a box and whisker plot ?
- Hinges equidistant from median means that the data is symmetrical
- If upper hinge is further away from the median, data are positively skewed
- If lower hinge is further away, data are negatively skewed
What do the whiskers represent?
the largest/smallest non-outlying values
What are outliers identified with in a box and whisker plot?
asterisk
What is the boundary for outliers?
Upper boundary: Q3 + (1.5)(IQR)
Lower boundary: Q1 - (1.5)(IQR)
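A sketch of flagging outliers with the 1.5 x IQR rule. The lower boundary, Q1 - 1.5 x IQR, is the conventional mirror image of the upper one. The quartiles here are simply assumed to be known, and the data are invented.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) boundaries at 1.5 x IQR beyond the hinges."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical scores; Q1 = 10 and Q3 = 20 are assumed, not computed here
values = [3, 12, 15, 18, 22, 41]
lower, upper = outlier_fences(q1=10, q3=20)

# Anything outside the boundaries would be drawn as an outlier
outliers = [v for v in values if v < lower or v > upper]

print(lower, upper, outliers)
```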
Describe grouped frequency tables
- Group observations on variable - into contiguous, non-overlapping (preferably equal) class intervals (bins)
- Place each observation into only one bin
- Tabulate frequency of observations in each bin
- Can calculate relative frequency - proportion or percentage
- Can also tabulate cumulative frequency and cumulative relative frequency
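The steps above can be sketched as follows. The ages and bin edges are made up, and treating each bin as "lower edge included, upper edge excluded" is an assumption that guarantees every observation lands in exactly one bin.

```python
from itertools import accumulate

# Hypothetical ages and contiguous, equal-width bins [lower, upper)
ages = [23, 27, 31, 35, 36, 42, 44, 48, 49, 55]
bins = [(20, 30), (30, 40), (40, 50), (50, 60)]

freq = [sum(lo <= a < hi for a in ages) for lo, hi in bins]  # count per bin
rel = [f / len(ages) for f in freq]                          # relative frequency
cum = list(accumulate(freq))                                 # cumulative frequency
cum_rel = [c / len(ages) for c in cum]                       # cumulative relative frequency

for (lo, hi), f, r, c, cr in zip(bins, freq, rel, cum, cum_rel):
    print(f"{lo}-{hi}: f={f} rel={r:.1f} cum={c} cum_rel={cr:.1f}")
```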
Grouped frequency tables:
What does k represent?
how many bins
Grouped frequency tables:
What does w represent?
how wide
Grouped frequency tables:
What is the formula for determining the # of bins (k)?
k = the # of bins, n = the sample size
k = 1 + 3.322 x log10(n)
Grouped frequency tables:
What is the formula for determining the width (w) ?
w = width of bin, k = # of bins, R = range
w = R/k
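A worked sketch of the two formulas. Rounding k up to a whole number of bins is an assumption (the cards do not say how to round), and the sample size and range are invented.

```python
import math

def sturges_bins(n):
    """k = 1 + 3.322 x log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)

def bin_width(data_range, k):
    """w = R / k."""
    return data_range / k

n = 100                  # hypothetical sample size
k_raw = sturges_bins(n)  # 1 + 3.322 x 2 = 7.644
k = math.ceil(k_raw)     # assumption: round up to a whole number of bins
w = bin_width(60, k)     # hypothetical range R = 60

print(k_raw, k, w)
```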
How is a frequency polygon created?
by linking the mid-points of successive bins
How do you work backwards on a frequency polygon to find the mean?
Mean = Sum (f*mid)/ Sum (f)
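A small worked example of that formula, using a made-up grouped frequency table. The result is an estimate, because every observation in a bin is treated as if it sat at the bin mid-point.

```python
# Hypothetical grouped frequency table: bin edges and frequencies
bins = [(20, 30), (30, 40), (40, 50), (50, 60)]
freq = [2, 3, 4, 1]

mids = [(lo + hi) / 2 for lo, hi in bins]  # bin mid-points: 25, 35, 45, 55

# Mean = Sum(f * mid) / Sum(f)
grouped_mean = sum(f * m for f, m in zip(freq, mids)) / sum(freq)

print(grouped_mean)
```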
*not on formula sheet
How do you find the median from a frequency polygon?
On the cumulative relative frequency curve, go to 50% and look across to see where it hits the line
How do sample size and population affect the probability distribution?
As sample size gets bigger and bin width decreases, the underlying distribution becomes clearer and you get a smoother curve