histograms and quartiles Flashcards
quantitative variables
numerical values
categorical variables
names that aren’t necessarily logically ordered
- can be coded as numbers, e.g. female = 1
ordinal variables
categorical variables that have a logical ordering, such as the grades N, P, C, D, HD
- can't do arithmetic on them, e.g. compute an average
distributions
two sets of numbers: the values a variable takes and how often each occurs (frequency)
a histogram displays the frequency distribution (bars are drawn with no gaps between them)
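A minimal Python sketch of a frequency distribution behind a histogram. The function name `histogram_counts`, the sample data and the bin width of 5 are all made up for illustration. Each bin starts where the previous one ends, so there are no gaps.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each of `bins` adjacent intervals.

    Bin i covers start + i*width up to (but not including) start + (i+1)*width.
    """
    counts = [0] * bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < bins:
            counts[index] += 1
    return counts


# Hypothetical data set, not taken from the notes
data = [3, 7, 7, 9, 10, 12, 12, 12, 15, 20]
counts = histogram_counts(data, start=0, width=5, bins=5)

# Crude text histogram: one '#' per observation in the bin
for i, count in enumerate(counts):
    low = i * 5
    print(f"{low:2d} to <{low + 5:2d}: {'#' * count}")
```

The tallest row marks the mode, and the rows together show location, spread and shape.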
features of a histogram
location: where is the centre? - mean, median, mode
spread: how variable is the data? - deviation, gaps, outliers
shape: what does the distribution look like? - unimodal (one peak), multimodal, symmetric, skewed
median
the midpoint (50th percentile or Q2)
first quartile
25% of the data below, 75% above (25th percentile or Q1)
third quartile
75% of the data below, 25% above (75th percentile or Q3)
quantiles
a value that is greater than a given proportion of the data, e.g. the 0.1 quantile is above 10% of the data
cumulative distributions
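counts are added up as you move through the ordered values, giving a running total
as proportions, the final cumulative value should be one
answers questions like: what proportion of the data is less than 10?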
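A short Python sketch of a cumulative distribution, under assumed data. The function names `cumulative_proportions` and `proportion_below` are hypothetical. It keeps a running total of the counts, divides by the number of values, and then answers the "less than 10" question directly.

```python
from collections import Counter


def cumulative_proportions(values):
    """Return (value, proportion of data at or below that value) pairs."""
    counts = Counter(values)
    n = len(values)
    running = 0
    result = []
    for v in sorted(counts):
        running += counts[v]          # counts are added up
        result.append((v, running / n))
    return result


def proportion_below(values, threshold):
    """Proportion of the data strictly less than `threshold`."""
    return sum(1 for v in values if v < threshold) / len(values)


# Hypothetical data set, not taken from the notes
data = [3, 7, 7, 9, 10, 12, 12, 12, 15, 20]
for value, cumulative in cumulative_proportions(data):
    print(value, cumulative)
print("proportion less than 10:", proportion_below(data, 10))
```

The last cumulative proportion printed is 1.0, which is a quick check that nothing was missed.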
a rule for quartiles
- calculate 0.25n where n is the number of values
- if it is an integer, count off that many values in the ordered list; Q1 is halfway between that value and the next
- if it's not an integer, round up and count off that many values; Q1 is that value
- for Q3, do the same but count from the top of the ordered list
- the same rule works for any quantile, using its proportion in place of 0.25
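The rule above can be sketched in Python as follows. This is one reading of the rule, not a definitive one: it assumes that in the non-integer case the quartile is the value at the rounded-up position. The names `count_off` and `quartiles` and the sample data are made up for illustration. `Fraction` is used only so that a product like 0.1 × 30 is tested exactly for being a whole number.

```python
from fractions import Fraction
import math


def count_off(ordered, p):
    """Apply the counting rule to an already ordered list, for 0 < p < 1."""
    n = len(ordered)
    k = Fraction(str(p)) * n              # exact value of p * n
    if k.denominator == 1:                # p * n is an integer
        i = k.numerator
        # halfway between the i-th value and the next one
        return (ordered[i - 1] + ordered[i]) / 2
    # not an integer: round up and take that value (assumed reading)
    return ordered[math.ceil(k) - 1]


def quartiles(values):
    """Return (Q1, median, Q3) using the counting rule."""
    ascending = sorted(values)
    q1 = count_off(ascending, 0.25)
    q2 = count_off(ascending, 0.5)
    q3 = count_off(ascending[::-1], 0.25)  # same rule, counted from the top
    return q1, q2, q3


# Hypothetical data: n = 8, so 0.25n = 2 is an integer
print(quartiles([1, 3, 4, 6, 7, 8, 10, 12]))          # (3.5, 6.5, 9.0)
# Hypothetical data: n = 10, so 0.25n = 2.5 is rounded up to 3
print(quartiles([2, 4, 4, 5, 7, 8, 9, 11, 12, 15]))   # (4, 7.5, 11)
```

Other software (Excel, for example) may use a different interpolation rule, so its quartiles can differ slightly from these.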
measuring spread
interquartile range: IQR = Q3 - Q1, the width of the middle 50% of the data
range: max - min (not as useful as the IQR, since it depends only on the two most extreme values)
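A small Python sketch comparing the two measures of spread on made-up data. The helper `count_off_quarter` is a hypothetical name, and it assumes the quartile counting rule in which a non-integer 0.25n is rounded up and that value is taken. The second data set swaps the largest value for an outlier to show how each measure reacts.

```python
def count_off_quarter(ordered):
    """Quartile counting rule applied to an ordered list (ascending for Q1,
    descending for Q3)."""
    n = len(ordered)
    if n % 4 == 0:
        k = n // 4
        return (ordered[k - 1] + ordered[k]) / 2   # halfway to the next value
    return ordered[(n + 3) // 4 - 1]               # round 0.25n up (assumed reading)


def iqr_and_range(values):
    """Return (IQR, range) for a list of numbers."""
    ascending = sorted(values)
    q1 = count_off_quarter(ascending)
    q3 = count_off_quarter(ascending[::-1])        # counted from the top
    return q3 - q1, ascending[-1] - ascending[0]


# Hypothetical data, then the same data with the maximum replaced by an outlier
print(iqr_and_range([1, 3, 4, 6, 7, 8, 10, 12]))    # (5.5, 11)
print(iqr_and_range([1, 3, 4, 6, 7, 8, 10, 120]))   # (5.5, 119)
```

The IQR is unchanged by the outlier while the range jumps from 11 to 119.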
randomising variables in Excel
type out the values/names in column B
- type =RAND() in the first cell of column A and press Return
- drag =RAND() down to fill column A
- select columns A and B and sort ascending by column A
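The same idea can be sketched outside Excel. This Python version is only an illustration of the steps above: pair each name with a random number (column A), sort by that number, and read the names back off. The function name `shuffle_by_random_key`, the `seed` parameter and the sample names are all hypothetical.

```python
import random


def shuffle_by_random_key(items, seed=None):
    """Randomise the order of `items` by sorting on a random key."""
    rng = random.Random(seed)                        # seed only makes runs repeatable
    keyed = [(rng.random(), item) for item in items]  # (column A, column B)
    keyed.sort(key=lambda pair: pair[0])             # sort ascending by column A
    return [item for _, item in keyed]


# Hypothetical names, not taken from the notes
names = ["Ana", "Ben", "Chloe", "Dev", "Ella"]
print(shuffle_by_random_key(names))
```

In practice Python's built-in `random.shuffle` does the same job in one call; the sort-by-key version is shown because it mirrors the spreadsheet method.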