CHAPTER 10: Exploratory data analysis Flashcards
What is the subfield of applied statistics that investigates collected or transformed data to reveal patterns, peculiarities, and relationships?
Exploratory Data Analysis (EDA)
What is EDA often used for as a preliminary step in data analysis?
To determine if the planned method for analysis is appropriate for the collected data.
What are the four major themes that describe the methods used in EDA?
Revelation, resistance, reexpression, and residuals.
What features of the dataset does EDA often reveal through graphical displays?
Distribution, center, quantiles, spread, symmetry, and kurtosis.
What kind of statistical measure is said to be resistant?
One that is not adversely affected when a few values in the dataset are replaced or when all values change slightly.
Name two statistics that are not resistant and are seldom used in EDA.
Mean and variance.
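As a rough illustration of resistance, the sketch below replaces one value in a small made-up dataset with a wild one and compares the median with the mean and variance. The numbers and the helper name `summarize` are assumptions chosen for illustration, not taken from the chapter.

```python
from statistics import mean, median, pvariance

# Made-up sample; the last value is then replaced by a wild one.
data = [12, 15, 15, 17, 18, 20, 21]
contaminated = data[:-1] + [2100]

def summarize(values):
    """Return the median, mean, and (population) variance of the values."""
    return {
        "median": median(values),
        "mean": mean(values),
        "variance": pvariance(values),
    }

clean = summarize(data)
dirty = summarize(contaminated)

# The median is unchanged, while the mean and variance shift dramatically.
for name in ("median", "mean", "variance"):
    print(f"{name:>8}: {clean[name]:.2f} -> {dirty[name]:.2f}")
```

The median stays put because only the ordering of the middle value matters to it, which is why resistant summaries are preferred in EDA.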
What is the stem-and-leaf display (SALD)?
A histogram-like display of data where digits replace bars to represent frequencies.
How can a stem-and-leaf display be split when there are too many leaves?
Each stem can be divided into two groups (0–4 and 5–9) or five groups (0–1, 2–3, 4–5, 6–7, 8–9).
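A minimal sketch of a stem-and-leaf display, assuming non-negative integers with the tens as stems and the units digit as leaves. The function name `stem_and_leaf` and its `split` flag are hypothetical; `split=True` divides each stem into the two groups 0–4 and 5–9. Stems with no leaves are simply omitted here, which a fuller display would normally show as empty rows.

```python
from collections import defaultdict

def stem_and_leaf(values, split=False):
    """Return display lines; stem = tens and above, leaf = units digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # With split stems, leaves 0-4 go on the first line, 5-9 on the second.
        half = leaf // 5 if split else 0
        rows[(stem, half)].append(leaf)
    lines = []
    for stem, half in sorted(rows):
        leaves = "".join(str(d) for d in rows[(stem, half)])
        lines.append(f"{stem:>2} | {leaves}")
    return lines

sample = [12, 15, 21, 23, 23, 38]
print("\n".join(stem_and_leaf(sample)))
print()
print("\n".join(stem_and_leaf(sample, split=True)))
```

Because each leaf occupies one character, the length of a row plays the role of a histogram bar while the individual digits remain readable.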
What is the depth of a data value?
The smaller of its two ranks, one counted from each end of the ordered array.
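The definition of depth can be sketched in a few lines. The function name `depths` is an assumption for illustration; it sorts the data, ranks each value from the low end and from the high end, and keeps the smaller rank.

```python
def depths(values):
    """Pair each value in the ordered array with its depth."""
    ordered = sorted(values)
    n = len(ordered)
    # Rank i counts from the low end; n + 1 - i counts from the high end.
    return [(x, min(i, n + 1 - i)) for i, x in enumerate(ordered, start=1)]

for value, depth in depths([7, 3, 9, 1, 5]):
    print(value, depth)
```

The extremes always have depth 1, and depth grows toward the middle of the array.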
A statistic defined by its depth and tagged with a letter.
Letter value.
Two data values in the array with depths calculated based on the median’s depth.
Fourths, or hinges.
What is the tag used for the fourths?
F
A collection of letter values: the median, the fourths, and the extremes.
Five-number summary.
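The sketch below assembles these letter values from depths. It assumes the conventional Tukey rule, which the cards do not spell out: the median has depth (n + 1) / 2, and the fourths have depth (⌊median depth⌋ + 1) / 2. A half-integer depth is taken to mean the average of the two neighbouring values. The function names are hypothetical.

```python
def value_at_depth(ordered, depth, from_low):
    """Value at a whole or half-integer depth, counted from one end."""
    seq = ordered if from_low else ordered[::-1]
    lower = int(depth)  # depth truncated to a whole rank
    upper = lower if depth == lower else lower + 1
    # For a whole depth both ranks coincide, so the average is the value itself.
    return (seq[lower - 1] + seq[upper - 1]) / 2

def five_number_summary(values):
    """Extremes, fourths, and median of a non-empty collection of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    d_median = (n + 1) / 2             # assumed Tukey depth of the median
    d_fourth = (int(d_median) + 1) / 2  # assumed Tukey depth of the fourths
    return {
        "min": ordered[0],
        "lower_fourth": value_at_depth(ordered, d_fourth, True),
        "median": value_at_depth(ordered, d_median, True),
        "upper_fourth": value_at_depth(ordered, d_fourth, False),
        "max": ordered[-1],
    }

print(five_number_summary(range(1, 10)))
print(five_number_summary(range(1, 9)))
```

Fourths computed this way can differ slightly from quartiles returned by percentile-based library functions, since those use other interpolation rules.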
What features of the data are displayed in a boxplot?
Location, spread, symmetry, extremes, and outliers.
What do the sides of the boxplot rectangle indicate?
The fourths (or quartiles); the rectangle drawn between them encloses the middle 50% of observations.