Exploratory Data Analysis Flashcards
Exploratory Data Analysis
Describes the preliminary investigation of a dataset using numerical and graphical summaries
‘Highlights’
Univariate
-Analyzing a variable by itself
-Mean, Variance, Quantiles, Frequency
–summary(): for a numeric variable, outputs the min, 1st quartile, median, mean, 3rd quartile, and max; for a factor, outputs the frequency of each level
–table(d): outputs the frequency of each level of a factor d
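As a rough illustration of what these summaries compute, here is a minimal Python sketch using only the standard library. The function names `numeric_summary` and `factor_table` are hypothetical. `method="inclusive"` is assumed to mirror R's default quantile rule (type 7), which is what summary() reports for numeric data.

```python
from collections import Counter
import statistics


def numeric_summary(values):
    """Rough analogue of R's summary() for a numeric vector."""
    # "inclusive" interpolates like R's default quantile type 7
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


def factor_table(levels):
    """Rough analogue of R's table() for a factor: frequency of each level."""
    return dict(sorted(Counter(levels).items()))


print(numeric_summary(list(range(1, 11))))
print(factor_table(["a", "b", "a", "c", "a"]))
```

For 1 to 10, the quartiles come out as 3.25, 5.5, and 7.75. These match R's default quantile() for the same data.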
Histogram
-Shows how a numeric variable is distributed; useful for detecting skewness
-Divides a numeric variable into equal-length bins and plots bin frequency
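The binning step behind a histogram can be sketched as follows. This is a minimal Python version under stated assumptions: `histogram_counts` is a hypothetical name, bins are left-closed, and the maximum value goes into the last bin. R's hist() picks its own "pretty" breakpoints and uses right-closed bins by default, so its counts can differ at bin edges.

```python
def histogram_counts(values, num_bins):
    """Split the range of values into num_bins equal-length bins and count each bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum sits on the right edge of the last bin, so clamp the index
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = histogram_counts([0, 1, 1, 2, 2, 2, 3, 10], num_bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(counts)  # [3, 4, 0, 0, 1]
```

The empty middle bins followed by a lone count on the right are the long right tail that signals right skewness.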
Bar chart
-Visualizes a factor; similar to a histogram but without bins
-Plots frequency of each possible value
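A text-only bar chart shows the "no bins" idea: each distinct value of the factor gets its own bar, and the bar's length equals that value's frequency. This is a minimal Python sketch. The name `text_bar_chart` is hypothetical, and it stands in for a real plotting function.

```python
from collections import Counter


def text_bar_chart(levels):
    """One bar per distinct value; the bar length equals the value's frequency."""
    counts = Counter(levels)
    rows = []
    for level in sorted(counts):
        rows.append(f"{level:<8}{'#' * counts[level]} ({counts[level]})")
    return "\n".join(rows)


print(text_bar_chart(["red", "blue", "red", "green", "red", "blue"]))
```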
Box plot
-Depicts a numerical variable's distribution; useful for detecting skewness
-Shows the 1st quartile, median, 3rd quartile, and outliers
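Below is a minimal Python sketch of the numbers a box plot draws. It assumes the common convention that points more than 1.5 × IQR beyond the quartiles are outliers, and `box_plot_stats` is a hypothetical name. R's boxplot() computes the box from Tukey's hinges (fivenum()), so its box edges can differ slightly from these type-7 quartiles.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, 1.5*IQR whisker limits, and outliers, as a box plot would show them."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers reach the most extreme points still inside the fences
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(outliers),
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

The single high outlier (40) and the longer upper side are what make right skewness visible in the plot.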
Bivariate
-Analyzing two variables together
–Used to show how the target relates to a predictor
–A predictor is considered predictive when a change in its value suggests that the target changes as well, for example:
–A numeric predictor that is strongly correlated with the target
–A factor whose different levels produce dissimilar box plots of the target
Numerical
-Correlation measures the strength and direction of the linear relationship between two numeric variables
-Univariate statistics of a numeric variable computed by levels of a factor
-Tally the frequency of each pair of possible values between two factors
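The three numerical bivariate summaries above can be sketched in plain Python. The function names `pearson`, `mean_by_level`, and `cross_tab` are hypothetical. In R, cor(x, y), tapply(values, levels, mean), and table(f1, f2) produce roughly the same results.

```python
import math
import statistics
from collections import Counter, defaultdict


def pearson(x, y):
    """Pearson correlation: strength and direction of a linear relationship."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_by_level(values, levels):
    """Mean of a numeric variable within each level of a factor."""
    groups = defaultdict(list)
    for v, lvl in zip(values, levels):
        groups[lvl].append(v)
    return {lvl: statistics.mean(vs) for lvl, vs in sorted(groups.items())}


def cross_tab(f1, f2):
    """Frequency of each (level of f1, level of f2) pair."""
    return dict(Counter(zip(f1, f2)))


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(mean_by_level([10, 20, 30, 40], ["a", "b", "a", "b"]))
print(cross_tab(["m", "f", "m", "f"], ["y", "y", "n", "y"]))
```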
Graphical
Side-by-side Histogram - plots by levels of a factor
Side-by-side Bar chart - plots by levels of a factor
Side-by-side Box plot - plots by levels of a factor
Scatterplot - plots two numeric variables against each other; each point represents an observation. A dot plot makes the visual more granular by plotting every observation, so that similar values stack into a line of dots
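The "line of dots" idea in a dot plot can be imitated in text form. In this minimal Python sketch, the hypothetical `text_dot_plot` rounds each observation to `precision` decimal places (an assumption made here for illustration) and prints one dot per observation, so values that round to the same number stack into a row.

```python
from collections import Counter


def text_dot_plot(values, precision=0):
    """One row per rounded value, with one dot per observation."""
    counts = Counter(round(v, precision) for v in values)
    return "\n".join(f"{value:>6}: {'o' * counts[value]}" for value in sorted(counts))


print(text_dot_plot([1.2, 1.4, 2.1, 2.2, 2.4, 3.9]))
```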
For two non-target variables:
–Look for correlation and other patterns that suggest they are related in some way; a strong relationship between two predictors is a potential sign of collinearity.
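One way to screen predictors for this is to compute the correlation of every pair and flag the strongly correlated ones. Below is a minimal Python sketch. The names `pearson` and `flag_correlated_pairs` are hypothetical, and the 0.8 cutoff is an illustrative assumption, not a standard threshold.

```python
import itertools
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two numeric sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in itertools.combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],  # nearly a multiple of x1
    "x3": [5, 1, 4, 2, 3],
}
print(flag_correlated_pairs(predictors))
```

Only the x1/x2 pair is flagged. A pair like that carries largely the same information, which is the collinearity concern described above.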
Faceting and Multivariate Analyses
-Faceting creates subgroups in the dataset so that a chosen (usually bivariate) plot can be constructed for each subgroup
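-These graphs can incorporate more than two variables and are used to detect the presence of an interaction, i.e., when the relationship between a predictor and the target differs across subgroups defined by another variable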
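Faceting can be mimicked numerically by splitting the data on a facet variable and summarizing the x–y relationship within each subgroup. This minimal Python sketch fits a least-squares slope per facet. The names `slope` and `slopes_by_facet` are hypothetical, and the slope is an illustrative stand-in for drawing one scatterplot per subgroup.

```python
import statistics
from collections import defaultdict


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_facet(x, y, facet):
    """Split (x, y) pairs by facet level and fit a slope within each subgroup."""
    groups = defaultdict(lambda: ([], []))
    for xi, yi, f in zip(x, y, facet):
        groups[f][0].append(xi)
        groups[f][1].append(yi)
    return {f: slope(xs, ys) for f, (xs, ys) in sorted(groups.items())}


x = [1, 2, 3, 1, 2, 3]
y = [2, 4, 6, 3, 3, 3]
facet = ["A", "A", "A", "B", "B", "B"]
print(slopes_by_facet(x, y, facet))  # {'A': 2.0, 'B': 0.0}
```

In subgroup A, y rises with x; in subgroup B, it stays flat. A per-facet difference like this is the kind of pattern that suggests an interaction between x and the facet variable.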