Exploratory Data Analysis Flashcards

1
Q

Exploratory Data Analysis

A

Describes the preliminary investigation of a dataset using numerical and graphical summaries

‘Highlights’

2
Q

Univariate

A

-Analyzing a variable by itself
-Mean, Variance, Quantiles, Frequency
–summary() - for a numeric variable, outputs min, quartiles, median, mean, and max; for a factor, outputs frequency by level
–table(d) - for a factor d, outputs frequency by level
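
A minimal sketch of the univariate summaries above, assuming R's built-in iris data (not part of the original card), where Sepal.Length is numeric and Species is a factor:

```r
# Assumption: the built-in iris data frame stands in for the dataset being explored.
num_summary <- summary(iris$Sepal.Length)  # min, 1st quartile, median, mean, 3rd quartile, max
fac_summary <- summary(iris$Species)       # frequency of each factor level
freq <- table(iris$Species)                # the same frequencies as a table object

# The individual statistics named on the card
avg <- mean(iris$Sepal.Length)
v   <- var(iris$Sepal.Length)
q   <- quantile(iris$Sepal.Length, probs = c(0.25, 0.5, 0.75))

print(num_summary)
print(freq)
```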

Histogram
-Show how a variable is distributed; useful for detecting skewness
-Divides the range of a numeric variable into equal-width bins and plots the frequency of each bin
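
As a rough illustration of the binning, the sketch below assumes the built-in mtcars data (an assumption, not from the card). Note that in hist() a single number passed to breaks is only a suggestion for the bin count:

```r
# Assumption: mtcars$mpg stands in for the numeric variable of interest.
h <- hist(mtcars$mpg, breaks = 5, main = "mpg", xlab = "Miles per gallon")

print(h$breaks)  # bin edges; the widths are equal
print(h$counts)  # number of observations falling in each bin

# A mean noticeably above the median is one numeric hint of right skew.
print(c(mean = mean(mtcars$mpg), median = median(mtcars$mpg)))
```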

Bar chart
-Visualize a factor; similar to histogram but no bins
-Plots frequency of each possible value
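
A bar chart can be sketched by tallying a factor and plotting the counts. The example assumes the built-in mtcars data, with cyl treated as a factor:

```r
# Assumption: the number of cylinders in mtcars is treated as a categorical variable.
counts <- table(factor(mtcars$cyl))  # frequency of each level
barplot(counts, main = "Cars by cylinder count", xlab = "cyl", ylab = "Frequency")
print(counts)
```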

Box plot
-Depicts a numerical variable's distribution; useful for detecting skewness
-Shows the 1st quartile, median, 3rd quartile, and outliers
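
The quantities a box plot draws can be inspected directly. This sketch assumes the built-in mtcars data. boxplot() uses hinges, which can differ slightly from the quartiles returned by quantile():

```r
# Assumption: mtcars$hp stands in for the numeric variable of interest.
b <- boxplot(mtcars$hp, horizontal = TRUE, main = "hp")

print(b$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$out)    # observations beyond 1.5 * IQR from the hinges, drawn as outliers
```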

3
Q

Bivariate

A

-Analyzing two variables together
–Used to show how the target relates to a predictor
–A predictor is considered predictive when a change in its value suggests that the target should change as well
–Example: a numeric predictor that is strongly correlated with the target
–Example: a factor whose levels produce dissimilar box plots of the target

Numerical
-Correlation measures the strength of the linear relationship between two numeric variables
-Univariate statistics by levels of a factor
-Tally frequency of each pair of possible values between two factors
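
The three numerical summaries above might look like this in base R. The sketch assumes the built-in mtcars data, with mpg playing the role of the target (an illustrative choice, not from the card):

```r
# Assumption: mpg is the target; wt is a numeric predictor; cyl and gear act as factors.
r <- cor(mtcars$wt, mtcars$mpg)                           # Pearson correlation, between -1 and 1
by_level <- tapply(mtcars$mpg, mtcars$cyl, mean)          # mean of the target by factor level
pairs_tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)  # frequency of each pair of levels

print(r)
print(by_level)
print(pairs_tab)
```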

Graphical
Side-by-side Histogram - plots by levels of a factor
Side-by-side Bar chart - plots by levels of a factor
Side-by-side Box plot - plots by levels of a factor
Scatterplot - plots two numeric variables against each other; each point represents an observation
Dot plot - gives a more granular view by plotting every observation, so that similar values form a line of dots
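
A sketch of some of these plots in base R graphics, again assuming the built-in mtcars data with mpg as the target:

```r
# Assumption: mpg is the target, cyl is a factor predictor, wt is a numeric predictor.
bp <- boxplot(mpg ~ cyl, data = mtcars,
              main = "mpg by cyl", xlab = "cyl", ylab = "mpg")  # side-by-side box plots

plot(mtcars$wt, mtcars$mpg,
     main = "mpg vs wt", xlab = "wt", ylab = "mpg")             # scatterplot, one point per car

stripchart(mpg ~ cyl, data = mtcars, method = "stack", pch = 16,
           main = "mpg by cyl", xlab = "mpg", ylab = "cyl")     # dot plot, equal values stack up
```

If the boxes for the different levels sit at clearly different heights, the factor is a candidate predictor of the target.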

For two non-target variables:
–Look for correlation and other patterns that suggest the two variables are related in some way. Strongly related predictors may indicate collinearity.
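
One way to screen predictors against each other is a correlation matrix and a scatterplot matrix. The sketch assumes three numeric columns of the built-in mtcars data as the predictors:

```r
# Assumption: disp, hp, and wt are the non-target variables being compared.
predictors <- mtcars[, c("disp", "hp", "wt")]
cm <- cor(predictors)   # pairwise Pearson correlations
pairs(predictors)       # scatterplot for every pair of predictors
print(round(cm, 2))
```

Off-diagonal entries close to 1 or -1 flag pairs worth a closer look for collinearity.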

4
Q

Faceting and Multivariate Analyses

A

-Faceting creates subgroups in the dataset so that a chosen (usually bivariate) plot can be constructed for each subgroup
-These graphs can incorporate more than two variables; used to detect the presence of an interaction
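
Faceting can be approximated in base R by splitting the data and drawing one panel per subgroup. The sketch below assumes the built-in mtcars data and facets a scatterplot of mpg against wt by cyl. Packages with dedicated faceting functions would be an alternative.

```r
# Assumption: mpg is the target, wt a numeric predictor, cyl the faceting variable.
groups <- split(mtcars, mtcars$cyl)            # one data frame per level of cyl
old_par <- par(mfrow = c(1, length(groups)))   # one panel per subgroup, side by side
for (level in names(groups)) {
  g <- groups[[level]]
  plot(g$wt, g$mpg, main = paste("cyl =", level), xlab = "wt", ylab = "mpg",
       xlim = range(mtcars$wt), ylim = range(mtcars$mpg))  # shared axes across panels
  abline(lm(mpg ~ wt, data = g))               # fitted line within the subgroup
}
par(old_par)                                   # restore the previous layout

# Slope of mpg on wt within each subgroup; clearly different slopes
# hint at an interaction between wt and cyl.
slopes <- sapply(groups, function(g) coef(lm(mpg ~ wt, data = g))[["wt"]])
print(slopes)
```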
