CHAPTER 10: Exploratory data analysis Flashcards

1
Q

What is the subfield of applied statistics that investigates collected or transformed data to reveal patterns, peculiarities, and relationships?

A

Exploratory Data Analysis (EDA)

2
Q

What is EDA often used for as a preliminary step in data analysis?

A

To determine if the planned method for analysis is appropriate for the collected data.

3
Q

What are the four major themes that describe the methods used in EDA?

A

Revelation, resistance, reexpression, and residuals.

4
Q

What features of the dataset does EDA often reveal through graphical displays?

A

Distribution, center, quantiles, spread, symmetry, and kurtosis.

5
Q

When is a statistical measure said to be resistant?

A

When its value is not adversely affected by replacing a small part of the dataset (even with extreme values) or by making minor changes to all values.

6
Q

Name two statistics that are not resistant and are seldom used in EDA.

A

Mean and variance.
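
To make the contrast concrete, here is a minimal sketch using Python's standard `statistics` module. The dataset and the helper name `summaries` are made up for illustration; the point is only that replacing a single value with a gross error moves the mean and variance a great deal while the median stays put.

```python
from statistics import mean, median, pvariance

def summaries(values):
    """Return the mean, median, and population variance of a dataset."""
    return {
        "mean": mean(values),
        "median": median(values),
        "variance": pvariance(values),
    }

# Hypothetical data: five well-behaved observations.
clean = [10, 12, 13, 15, 16]
# Same data with the largest value replaced by a gross error.
contaminated = clean[:-1] + [160]

print(summaries(clean))
print(summaries(contaminated))
```

The median is the same for both lists, which is why it is preferred in EDA as a measure of center.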

7
Q

What is the stem-and-leaf display (SALD)?

A

A histogram-like display in which the digits of the data values (the leaves) replace bars, so the length of each row represents the frequency.

8
Q

How can a stem-and-leaf display be split when there are too many leaves?

A

Each stem can be divided into two groups (0–4 and 5–9) or five groups (0–1, 2–3, 4–5, 6–7, 8–9).
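
The splitting rule above can be sketched in Python. This is an illustrative helper, not a standard library function: the name `stem_and_leaf`, the `split` parameter, and the restriction to non-negative integers with the last digit as the leaf are all assumptions made for the example. Stems with no leaves are simply skipped.

```python
from collections import defaultdict

def stem_and_leaf(values, split=1):
    """Build the lines of a stem-and-leaf display.

    split=1 gives one line per stem, split=2 splits each stem into
    leaves 0-4 and 5-9, split=5 into leaves 0-1, 2-3, 4-5, 6-7, 8-9.
    Assumes non-negative integers; the last digit is the leaf.
    """
    if split not in (1, 2, 5):
        raise ValueError("split must be 1, 2, or 5")
    width = 10 // split  # how many leaf digits share one line
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // width)].append(leaf)
    lines = []
    for stem, part in sorted(rows):
        leaves = "".join(str(d) for d in rows[(stem, part)])
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical data for illustration.
for line in stem_and_leaf([12, 15, 21, 23, 24, 27, 28, 29, 35], split=2):
    print(line)
```

With `split=2`, the crowded stem 2 is shown on two lines, one for leaves 0 to 4 and one for leaves 5 to 9.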

9
Q

What is the depth of a data value?

A

The smaller of its two ranks: one counted from the lower end of the ordered array and one counted from the upper end.
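
A minimal sketch of the depth definition in Python follows. The function name `depth` is hypothetical, and the sketch assumes the values are distinct so that each value has a single rank.

```python
def depth(values, x):
    """Depth of x: the smaller of its ranks counted from each end.

    Assumes x occurs in values and that the values are distinct.
    """
    ordered = sorted(values)
    n = len(ordered)
    rank_from_low = ordered.index(x) + 1    # 1-based rank counting upward
    rank_from_high = n + 1 - rank_from_low  # 1-based rank counting downward
    return min(rank_from_low, rank_from_high)

# Hypothetical data: the extremes have depth 1, the median has the largest depth.
data = [7, 3, 9, 1, 5]
print([(x, depth(data, x)) for x in sorted(data)])
```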

10
Q

A statistic defined by its depth and tagged with a letter.

A

Letter value.

11
Q

Two data values in the array with depths calculated based on the median’s depth.

A

Fourths (or hinges).

12
Q

What is the tag used for the fourths?

A

F

13
Q

A collection of letter values: the median, the fourths, and the extremes.

A

Five-number summary.
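
The five-number summary can be sketched from depths alone. The cards do not state the depth formulas, so this example assumes the conventional rule: the median has depth (n + 1) / 2, and the fourths have depth (floor(median depth) + 1) / 2, where a depth ending in .5 means averaging the two neighbouring values. The function names are hypothetical.

```python
from math import floor

def value_at_depth(ordered, d, from_low=True):
    """Value at depth d in a sorted list; a depth ending in .5
    averages the two neighbouring values."""
    lo, hi = floor(d), floor(d + 0.5)  # equal when d is a whole number
    if from_low:
        return (ordered[lo - 1] + ordered[hi - 1]) / 2
    return (ordered[-lo] + ordered[-hi]) / 2

def five_number_summary(values):
    """Extremes, fourths, and median, located by depth (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    d_median = (n + 1) / 2
    d_fourth = (floor(d_median) + 1) / 2
    return {
        "min": ordered[0],
        "lower_fourth": value_at_depth(ordered, d_fourth, from_low=True),
        "median": value_at_depth(ordered, d_median, from_low=True),
        "upper_fourth": value_at_depth(ordered, d_fourth, from_low=False),
        "max": ordered[-1],
    }

# Hypothetical data: the integers 1 through 8.
print(five_number_summary(range(1, 9)))
```

Fourths computed this way can differ slightly from quartiles returned by statistical software, which often interpolates differently.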

14
Q

What features of the data are displayed in a boxplot?

A

Location, spread, symmetry, extremes, and outliers.

15
Q

What do the sides of the boxplot rectangle indicate?

A

The fourths (or quartiles), where the sides are plotted; the rectangle between them spans the middle 50% of observations.

16
Q

What does the line inside the boxplot rectangle represent?

A

The location of the median, a measure of central tendency.

17
Q

A graphical display of the five-number summary.

A

Box-and-whisker plot (boxplot).