Chapter 3 Flashcards

1
Q

Definition of an Outlier

A

An outlier is a value that lies significantly outside the pattern of the data.

Standard formula to detect outliers:
Outlier < Q₁ - 1.5(Q₃ - Q₁)
or
Outlier > Q₃ + 1.5(Q₃ - Q₁)

Alternative method:
Outlier < 𝑥̄ - 2σ
or
Outlier > 𝑥̄ + 2σ
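
A minimal sketch of both rules, for illustration only. The function names and the sample data are made up, and the quartiles use Python's "inclusive" interpolation convention, which may differ slightly from the rule in your textbook.

```python
import statistics

def iqr_outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    # Assumption: "inclusive" quartile convention; other conventions
    # can give slightly different Q1 and Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def two_sigma_outliers(data):
    """Values more than 2 standard deviations from the mean."""
    mean = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    return [x for x in data if x < mean - 2 * sigma or x > mean + 2 * sigma]

sample = [2, 3, 4, 5, 5, 6, 7, 8, 30]  # hypothetical data
print(iqr_outliers(sample))
print(two_sigma_outliers(sample))
```

For this sample both rules flag only the value 30.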

2
Q

Removing Anomalies (Cleaning Data)

A

The process of removing errors or irrelevant data is called cleaning the data.
Steps in cleaning data:
1. Identify outliers using the IQR rule or the standard deviation rule.
2. Decide whether to keep or remove each one: keep it if it is a valid observation, remove it if it is an error.
3. Justify your decision.

3
Q

Box Plots

A

A box plot represents:

Minimum (non-outlier)
Lower Quartile (Q₁)
Median (Q₂)
Upper Quartile (Q₃)
Maximum (non-outlier)
Outliers plotted separately
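
The values a box plot needs can be computed as in the sketch below. The function name and data are hypothetical, the 1.5 × IQR rule is assumed for outliers, and the quartiles again use Python's "inclusive" convention.

```python
import statistics

def box_plot_summary(data):
    """Five-number summary with whiskers ending at non-outlier values."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "min": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(inside),
        "outliers": outliers,
    }

marks = [1, 12, 14, 15, 16, 18, 19, 21, 40]  # hypothetical data
print(box_plot_summary(marks))
```

Here the whiskers run from 12 to 21, and the values 1 and 40 are plotted separately as outliers.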

4
Q

Comparing Box Plots

A

Compare the medians (center of data).
Compare the IQRs (spread of middle 50%).
Check for skewness.
Look for outliers.

5
Q

Cumulative Frequency & Quartiles

A

A cumulative frequency graph helps estimate:

Median at n/2
Lower Quartile (Q₁) at n/4
Upper Quartile (Q₃) at 3n/4
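
Reading a value off a cumulative frequency graph corresponds to linear interpolation within a class. The sketch below assumes data are spread evenly within each class; the class boundaries, frequencies, and function name are made up for illustration.

```python
def value_at_position(boundaries, frequencies, position):
    """Estimate the data value at a cumulative frequency position."""
    cumulative = 0
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class, scaled by its width.
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds the total frequency")

boundaries = [0, 10, 20, 30, 40]   # hypothetical class boundaries
frequencies = [5, 15, 20, 10]      # hypothetical frequencies, n = 50
n = sum(frequencies)

q1 = value_at_position(boundaries, frequencies, n / 4)
median = value_at_position(boundaries, frequencies, n / 2)
q3 = value_at_position(boundaries, frequencies, 3 * n / 4)
print(q1, median, q3)
```

With these figures the estimates are Q₁ = 15, median = 22.5, and Q₃ = 28.75.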

6
Q

Interquartile Range (IQR)

A

IQR = Q₃ - Q₁

Measures spread of middle 50% of data.
Largely unaffected by outliers, because it ignores the lowest 25% and highest 25% of the data.

7
Q

Percentile Range

A

10th to 90th percentile range:
P₉₀ - P₁₀

Better for comparing spread than the full range, because it ignores the lowest 10% and highest 10% of values and so is less affected by extremes.
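
A short sketch comparing the two measures on made-up data containing one extreme value. The function name is hypothetical and the percentiles use Python's "inclusive" interpolation convention, which is an assumption here.

```python
import statistics

def percentile_range(data):
    """10th to 90th percentile range, P90 - P10."""
    # quantiles(n=10) returns the nine cut points P10, P20, ..., P90.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return deciles[-1] - deciles[0]

sample = list(range(1, 21)) + [100]  # hypothetical data: 1..20 plus one extreme value
print(max(sample) - min(sample))     # full range
print(percentile_range(sample))      # 10th to 90th percentile range
```

The single value 100 stretches the full range to 99, while the 10th to 90th percentile range stays at 16.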

8
Q

Histograms

A

A histogram represents continuous data.

Formula for Frequency Density:
Frequency Density = Frequency ÷ Class Width
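
A small worked example of the formula with unequal class widths. The class boundaries and frequencies are invented for illustration.

```python
boundaries = [0, 10, 20, 40, 70]  # hypothetical class boundaries
frequencies = [8, 12, 10, 6]      # hypothetical frequencies

# Frequency density = frequency / class width, one value per class.
densities = [
    freq / (upper - lower)
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies)
]
print(densities)
```

The widest class (40 to 70) has the lowest bar even though it is not empty, because its 6 values are spread over a width of 30.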

9
Q

Properties of Histograms

A

No gaps between bars.
Area of bar is proportional to frequency.
Used for continuous data.

10
Q

Estimating from a Histogram

A

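The area of each bar is proportional to the frequency of its class, so the total area represents the total frequency.
Estimate the frequency of a sub-interval from the area of the histogram above it, assuming the data are evenly spread within each class.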
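
A sketch of the estimate, assuming area equals frequency (scale factor 1) and that values are evenly spread within each class. The function name and figures are hypothetical.

```python
def estimate_frequency(boundaries, frequencies, low, high):
    """Estimate how many values lie between low and high."""
    total = 0.0
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        # Width of the part of this class that falls inside [low, high].
        overlap = min(high, upper) - max(low, lower)
        if overlap > 0:
            total += freq * overlap / (upper - lower)
    return total

boundaries = [0, 10, 20, 40, 70]  # hypothetical class boundaries
frequencies = [8, 12, 10, 6]      # hypothetical frequencies
print(estimate_frequency(boundaries, frequencies, 15, 30))
```

Between 15 and 30 the estimate takes half of the 10 to 20 class (6 values) plus half of the 20 to 40 class (5 values), giving 11.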

11
Q

Comparing Histograms

A

When comparing histograms:

Compare areas (frequency density × class width), not just bar heights, especially when class widths differ.
Compare spread.
Compare peak values.

12
Q

Skewness

A

Symmetric (mean ≈ median ≈ mode)
Positive skew (mean > median > mode)
Negative skew (mode > median > mean)
Formula for Skewness (Pearson's second coefficient):
Skewness = 3 × (𝑥̄ - Median) ÷ σ
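
The formula can be checked on a small data set, as in this sketch. The function name and data are made up, and the population standard deviation is assumed for σ.

```python
import statistics

def pearson_skewness(data):
    """Skewness = 3 * (mean - median) / standard deviation."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    sigma = statistics.pstdev(data)  # assumption: population standard deviation
    return 3 * (mean - median) / sigma

sample = [2, 3, 4, 5, 5, 6, 7, 8, 30]  # hypothetical data with a long right tail
print(round(pearson_skewness(sample), 2))
```

The large value 30 pulls the mean above the median, so the result is positive (about 1.03), indicating positive skew.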

13
Q

Comparing Distributions

A

Compare location (median, mean).
Compare spread (IQR, range).
Look for skewness.
Comment on outliers.

14
Q

Exam Tips for Data Representation

A

Use correct formulas.
Check for outliers.
Use linear interpolation to estimate quartiles from grouped data.
For histograms, use frequency density.
Use box plots for comparisons.
Cumulative frequency graphs estimate medians and quartiles.
