Missing values Flashcards
week 3
Function that generates summary statistics and visualizations of a dataset
skim()
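A minimal usage sketch, assuming `skim()` is the one from the skimr package. It runs on R's built-in `airquality` dataset, which has missing values in `Ozone` and `Solar.R`. The returned summary table includes per-variable missing-value columns such as `n_missing` and `complete_rate`.

```r
library(skimr)

# airquality is a built-in R dataset with NAs in Ozone and Solar.R
summary_tbl <- skim(airquality)

# Printing shows one row per variable, with n_missing, complete_rate,
# mean, sd, percentiles and a small inline histogram
summary_tbl
```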
Function that adds rows for missing combinations of variables, turning implicit missing values into explicit NAs
complete()
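A small sketch of `complete()`, assuming it refers to tidyr's function. The `sales` data are hypothetical: store B has no row at all for quarter 2, so that value is missing implicitly. `complete()` adds the absent store/quarter combination and fills `revenue` with `NA`.

```r
library(tidyr)

# Hypothetical data: store B has no row for quarter 2 (implicitly missing)
sales <- data.frame(
  store   = c("A", "A", "B"),
  quarter = c(1, 2, 1),
  revenue = c(10, 12, 7)
)

# Add every store x quarter combination; revenue is NA where no row existed
full_sales <- complete(sales, store, quarter)
full_sales
```

If a missing combination should mean zero rather than unknown, the `fill` argument can supply a value, e.g. `complete(sales, store, quarter, fill = list(revenue = 0))`.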
Function to visualize missing data
vis_miss()
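A minimal sketch, assuming `vis_miss()` comes from the visdat package. It returns a ggplot object showing each cell as present or missing, with the percentage missing per column in the labels. `sort_miss = TRUE` orders the columns by how much they are missing.

```r
library(visdat)

# Heatmap of present vs missing cells in the built-in airquality data
p <- vis_miss(airquality, sort_miss = TRUE)
p
```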
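There are three main types of missing values:
- Missing Completely at Random (MCAR): whether a value is missing is unrelated to any of the data. “The dog eats homework.”
- Missing at Random (MAR): missingness depends on other, observed variables. “The dog eats a particular student’s homework.”
- Missing Not at Random (MNAR): missingness depends on the missing value itself. “The dog only eats bad homework.”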
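The three mechanisms can be simulated in base R to see how they differ. This is an illustrative sketch with made-up data (the `homework` data frame, the class means and the missingness rates are all assumptions). Under MNAR, every score below 55 goes missing, so the mean of the observed scores is pushed upward.

```r
set.seed(1)
n <- 500
homework <- data.frame(class = sample(c("A", "B"), n, replace = TRUE))
# Hypothetical scores: class B scores lower on average
homework$score <- round(rnorm(n, mean = ifelse(homework$class == "A", 70, 60), sd = 10))

# MCAR: every score has the same 20% chance of going missing
mcar <- homework$score
mcar[runif(n) < 0.2] <- NA

# MAR: missingness depends on an observed variable (class B loses more homework)
mar <- homework$score
mar[runif(n) < ifelse(homework$class == "B", 0.4, 0.05)] <- NA

# MNAR: missingness depends on the (unseen) score itself; low scores vanish
mnar <- homework$score
mnar[homework$score < 55] <- NA

round(c(full = mean(homework$score),
        MCAR = mean(mcar, na.rm = TRUE),
        MAR  = mean(mar,  na.rm = TRUE),
        MNAR = mean(mnar, na.rm = TRUE)), 1)
```

In the MAR case the overall observed mean is also biased (class B is under-represented), but within each class the remaining scores are still a random sample, so conditioning on `class` can correct it. No observed variable can do that for the MNAR case.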
Function useful alongside filter() to keep only the complete rows (rows with no missing values)
complete.cases()
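A short sketch with a hypothetical `scores` data frame. `complete.cases()` (base R) returns one logical per row, `TRUE` when that row has no `NA` in any column, and `filter()` (assumed to be dplyr's) keeps those rows.

```r
library(dplyr)

# Hypothetical data with one NA in each of two rows
scores <- data.frame(
  student = c("Ann", "Ben", "Cal", "Dee"),
  week1   = c(70, NA, 58, 81),
  week2   = c(65, 72, NA, 77)
)

complete.cases(scores)   # TRUE FALSE FALSE TRUE

# Keep only the fully observed rows (base R: scores[complete.cases(scores), ])
complete_rows <- filter(scores, complete.cases(scores))
complete_rows
```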
Mean value imputation
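For numeric variables, replace missing values with the mean of the available (observed) data.
For categorical variables, replace missing values with the modal (i.e. most common) category (level).
Very simple to implement.
Very crude: it can distort the structure of the dataset, e.g. by shrinking the variance and weakening relationships between variables.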
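Both rules can be written in a few lines of base R. The helper names `impute_mean_vec()` and `impute_mode_vec()` are hypothetical, not from any package. The standard-deviation comparison shows the distortion: every imputed value sits exactly at the mean, so the spread shrinks.

```r
# Hypothetical helper: replace NAs in a numeric vector with the observed mean
impute_mean_vec <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Hypothetical helper: replace NAs in a factor/character vector with the mode
impute_mode_vec <- function(x) {
  counts <- table(x)                                # table() ignores NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]   # most common level
  x
}

height <- c(160, NA, 170, 180, NA)
height_imp <- impute_mean_vec(height)   # 160 170 170 180 170

sd(height, na.rm = TRUE)   # 10
sd(height_imp)             # about 7.07: spread shrinks after imputation

eye <- factor(c("blue", "brown", NA, "brown", "green"))
impute_mode_vec(eye)       # the NA becomes "brown"
```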
Function that performs mean imputation
impute_mean()
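A usage sketch, assuming `impute_mean()` here is the naniar package's function, which works on a vector. It is applied to a single vector and then to one column of `airquality` inside dplyr's `mutate()`.

```r
library(naniar)
library(dplyr)

x <- c(4, NA, 6, 8)
impute_mean(x)   # the NA becomes mean(c(4, 6, 8)) = 6

# Impute one column of a data frame
aq <- airquality |> mutate(Ozone = impute_mean(Ozone))
```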
Nearest neighbour imputation
Impute a missing value using the records most similar to the one with missing data, e.g. by taking the value from the nearest record or averaging the values of the k nearest records. Similarity (or rather dissimilarity) is measured by calculating a distance between records, such as Euclidean (straight-line) distance, or some other distance measure.
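A minimal base-R sketch of k-nearest-neighbour imputation with Euclidean distance. `knn_impute_col()` is a hypothetical helper. It assumes all columns are numeric and that the columns used for the distance have no NAs. Standardising the predictors first and averaging the k donors' values are design choices made here, not part of the card.

```r
# Hypothetical helper: fill NAs in column `target` using the k nearest records
knn_impute_col <- function(data, target, k = 1) {
  predictors <- setdiff(names(data), target)
  # Standardise predictors so no single variable dominates the distance
  X <- scale(as.matrix(data[predictors]))
  missing <- which(is.na(data[[target]]))
  donors  <- which(!is.na(data[[target]]))
  for (i in missing) {
    # Euclidean distance from record i to every donor record
    diffs <- sweep(X[donors, , drop = FALSE], 2, X[i, ])
    d <- sqrt(rowSums(diffs^2))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    # Impute with the mean of the nearest donors' values
    data[[target]][i] <- mean(data[[target]][nearest])
  }
  data
}

# Hypothetical data: the last person's age is missing
people <- data.frame(
  height = c(150, 152, 180, 182, 151),
  weight = c(50, 52, 85, 88, 51),
  age    = c(20, 22, 40, 42, NA)
)
result <- knn_impute_col(people, "age", k = 2)
result$age[5]   # 21, the mean age of the two most similar people
```

Swapping the distance line for another measure (e.g. Manhattan distance, `rowSums(abs(diffs))`) covers the "some other distance measure" option.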