DATA Flashcards
Box and whisker plot
Graphical representation of data showing the middle range of the data (the “box”), typical ranges of variability (the “whiskers”), and individual points outside those ranges (possible outliers).
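The card doesn't say how far the whiskers reach. A common convention (Tukey's) ends them at the most extreme data points within 1.5 × IQR (interquartile range) of the box and marks anything beyond as a possible outlier. Here is a minimal sketch under that assumption. The helper name `box_plot_summary` and the multiplier `k` are illustrative, not part of any plotting library:

```python
import statistics

def box_plot_summary(data, k=1.5):
    """Compute the box, whisker ends, and flagged points of a box-and-whisker plot.

    Assumes Tukey's convention: whiskers stop at the most extreme data points
    that lie within k * IQR of the box (k = 1.5 is a common default).
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "box": (q1, median, q3),
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

summary = box_plot_summary([2, 3, 3, 4, 5, 5, 6, 7, 30])
print(summary)  # {'box': (3.0, 5.0, 6.0), 'whiskers': (2, 7), 'outliers': [30]}
```

Other quartile methods or whisker rules (for example, whiskers at fixed percentiles) give slightly different boxes. That is why two tools can draw different plots from the same data.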
Collective outlier
A set of data points that, taken together, is (uncommonly) different from others, even if no single point is unusual on its own – for example, a missing heartbeat in an electrocardiogram: we don’t know exactly which millisecond it should’ve happened in, but collectively there’s a set of milliseconds that it’s missing from.
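The heartbeat example can be sketched from beat timestamps. No single timestamp is wrong. What is anomalous is a whole span of time with no beat in it. The sketch assumes a roughly regular rhythm and treats any gap longer than 1.5 times the median beat-to-beat interval as a missing beat. The function `find_missing_beats`, the `tolerance` parameter, and the timestamps are all hypothetical:

```python
import statistics

def find_missing_beats(beat_times_ms, tolerance=1.5):
    """Return (start, end) spans where a beat appears to be missing.

    The anomaly is the whole gap between two beats, not either endpoint,
    which is what makes it a collective outlier. Assumes a roughly regular
    rhythm, so a gap longer than `tolerance` times the median interval is flagged.
    """
    pairs = list(zip(beat_times_ms, beat_times_ms[1:]))
    typical = statistics.median(end - start for start, end in pairs)
    return [(start, end) for start, end in pairs if end - start > tolerance * typical]

beats = [0, 800, 1600, 2400, 4000, 4800, 5600]  # the beat near 3200 ms is missing
print(find_missing_beats(beats))  # [(2400, 4000)]
```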
Contextual outlier
A data point that is (uncommonly) far from other data points related to it – for example, in Atlanta, a 90-degree (Fahrenheit) day in winter is an outlier, but a 90-degree day in summer is not.
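One way to make "related to it" concrete is to compare a reading only against readings that share its context (here, the season). The sketch below flags a value that lies more than 3 standard deviations from its context's mean. The temperature history is made up for illustration, and both the helper `is_contextual_outlier` and the threshold `k = 3` are assumptions:

```python
import statistics

# Made-up daily highs (°F) for Atlanta, grouped by context (season).
history = {
    "winter": [50, 52, 55, 48, 57, 53, 51, 54],
    "summer": [88, 91, 90, 93, 87, 92, 89, 90],
}

def is_contextual_outlier(value, context, history, k=3.0):
    """Flag `value` if it is more than k standard deviations from the mean
    of past readings that share its context (k = 3 is an assumed cutoff)."""
    baseline = history[context]
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return abs(value - mean) > k * sd

print(is_contextual_outlier(90, "winter", history))  # True
print(is_contextual_outlier(90, "summer", history))  # False
```

The same 90-degree reading is flagged or not depending only on which group it is compared against.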
Covariate
A characteristic or measurement that can be used to estimate the value of something – for example, a person’s height or the color of a car. Also called a “feature” or “attribute”; in the standard tabular format, each covariate is a column of data.
Eigenvalue
Factor by which an eigenvector gets rescaled by a linear transformation: if A v = λv, then λ is the eigenvalue of A for the eigenvector v.
Eigenvector
Non-zero vector whose direction does not change (or is exactly reversed, if the eigenvalue is negative) when a linear transformation is applied to it; it only gets rescaled by its eigenvalue.
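A small worked check of the two cards above, using the illustrative matrix A = [[2, 1], [1, 2]] (chosen for this example, not taken from the source). Its eigenvectors [1, 1] and [1, -1] only get scaled, by 3 and 1 respectively. A vector that is not an eigenvector, such as [1, 0], changes direction. The helper `mat_vec` is hypothetical and written out by hand to keep the sketch dependency-free:

```python
def mat_vec(A, v):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[2, 1],
     [1, 2]]

# Each eigenvector v satisfies A v = lambda * v for its eigenvalue lambda.
for v, lam in [([1, 1], 3), ([1, -1], 1)]:
    Av = mat_vec(A, v)
    print(v, "->", Av, "=", lam, "*", v)
    assert Av == [lam * x for x in v]

# A non-eigenvector is rotated as well as stretched: [1, 0] -> [2, 1].
print([1, 0], "->", mat_vec(A, [1, 0]))
```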
Principal component analysis (PCA)
Transformation of data into new orthogonal dimensions (principal components), ranked by how much of the data’s variance each one captures.
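For two-dimensional data, PCA can be done by hand. Center the data, form the 2×2 covariance matrix, and take its eigenvectors (the principal components) and eigenvalues (the variance along each component). The sketch below uses the closed-form eigendecomposition of a symmetric 2×2 matrix instead of a linear-algebra library. The function `pca_2d` and the sample points are illustrative assumptions:

```python
import math

def pca_2d(points):
    """PCA for 2-D points using only the standard library.

    Returns (variances, components, scores): the variance along each principal
    axis (largest first), the unit-length axes, and each point's coordinates
    in the new, rotated system.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    mid = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    variances = [mid + spread, mid - spread]

    if b == 0:  # already uncorrelated: principal axes are the coordinate axes
        components = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        components = []
        for lam in variances:
            vx, vy = lam - c, b  # solves b*vx + (c - lam)*vy = 0
            norm = math.hypot(vx, vy)
            components.append((vx / norm, vy / norm))

    # Project each centered point onto the principal axes.
    scores = [tuple(x * ux + y * uy for ux, uy in components) for x, y in centered]
    return variances, components, scores

points = [(-2, -1), (-1, -1), (0, 0), (1, 1), (2, 1)]
variances, components, scores = pca_2d(points)
print(variances)  # about [3.43, 0.07]
```

The total variance (3.5 here) is unchanged, but almost all of it lands on the first component. This ranking is what lets PCA drop low-variance dimensions. For more than two dimensions, a library eigen- or singular-value decomposition would replace the closed form.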
Point outlier
A data point that is (uncommonly) far from other data points – for example, an outdoor temperature reading of 200 degrees Fahrenheit.
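A plain mean-and-standard-deviation test can miss a lone extreme reading in a small sample, because that reading inflates the standard deviation itself. One robust alternative is the modified z-score, which uses the median and the median absolute deviation (MAD) with a common cutoff of 3.5. This is a swapped-in technique, not one named on the card. The function `point_outliers` and the readings are hypothetical:

```python
import statistics

def point_outliers(readings, threshold=3.5):
    """Return readings whose modified z-score exceeds `threshold`.

    modified z = 0.6745 * (x - median) / MAD, where MAD is the median absolute
    deviation. The median and MAD barely move when one wild reading is added,
    unlike the mean and standard deviation.
    """
    med = statistics.median(readings)
    mad = statistics.median(abs(x - med) for x in readings)
    if mad == 0:
        # Degenerate case: most readings are identical, so anything else stands out.
        return [x for x in readings if x != med]
    return [x for x in readings if abs(0.6745 * (x - med) / mad) > threshold]

temps_f = [68, 70, 71, 69, 72, 70, 200, 71]
print(point_outliers(temps_f))  # [200]
```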
Standardization
Transforming data by subtracting the mean and then dividing by the standard deviation, so that it has mean 0 and variance 1.
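A minimal sketch of the card's two steps, with a hypothetical helper `standardize`. It divides by the population standard deviation (dividing by n), so the result has population variance exactly 1. Dividing by the sample standard deviation instead would make the sample variance 1. Constant data is rejected because its standard deviation is 0:

```python
import statistics

def standardize(values):
    """Subtract the mean, then divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("constant data cannot be standardized")
    return [(x - mean) / sd for x in values]

heights_cm = [150, 160, 170, 180, 190]
z = standardize(heights_cm)
print(z)  # roughly [-1.41, -0.71, 0.0, 0.71, 1.41]
```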