exam 3 - chapter 8 Flashcards

1
Q

The Role of Data Understanding
•What it is NOT
[3]

A
The Role of Data Understanding
•What it is NOT
  •Mindless calculation of statistics
  •Creating pretty graphs
  •Reporting the obvious
2
Q

The Role of Data Understanding
•What it is
[3]

A

The Role of Data Understanding
•What it is
•Calculation and interpretation of statistics
•Creating graphs that help you understand your data set
•Reporting the anomalous

3
Q

The EDA Process

A
  1. When the dataset is more or less ready for analysis, start applying the standard techniques to get a basic understanding of the features.
  2. You will begin to form a hypothesis about some aspects of the dataset (from the context of the problem).
  3. Apply EDA techniques to begin confirming/rejecting your hypothesis and preconceived ideas.
  4. You will start to understand the dataset. New questions will come to mind.
  5. Apply EDA techniques to try answering these new questions. You will gain more understanding, and further new questions will pop into your head.
  6. Repeat Step 4 and Step 5 a few times.
  7. Stop when you feel comfortable with the understanding you’ve got, and you think that you can move on to the modeling stage.
4
Q

•What can we learn from EDA using

[2]

A
  • What can we learn from EDA using
    • Numerical Calculations
    • Visualizations
5
Q

Types of Analysis

[5]

A
Types of Analysis
•Univariate Numerical
•Univariate Categorical (Nominal)
•Bivariate Numerical
•Bivariate Categorical (Nominal)
•Combinations
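
As a rough illustration of the five types listed above, the sketch below applies one simple calculation to each. The toy data, the feature names, and the `pearson` helper are all hypothetical and chosen only for the example; the card itself does not prescribe any particular statistic.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical toy data: two numerical and two categorical (nominal) features.
height = [150, 160, 170, 180, 190]
weight = [50, 60, 65, 80, 90]
color = ["red", "blue", "red", "red", "blue"]
size = ["S", "M", "M", "L", "L"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Univariate numerical: the center of a single numeric feature.
height_summary = {"mean": mean(height), "median": median(height)}

# Univariate categorical: how often each label occurs.
color_counts = Counter(color)

# Bivariate numerical: strength of the linear relationship between two features.
height_weight_r = pearson(height, weight)

# Bivariate categorical: cross-tabulation of label pairs.
color_size_counts = Counter(zip(color, size))

# Combination: a numerical feature summarized within each category.
height_by_color = {}
for c, h in zip(color, height):
    height_by_color.setdefault(c, []).append(h)
mean_height_by_color = {c: mean(hs) for c, hs in height_by_color.items()}

print(height_summary, color_counts, round(height_weight_r, 3))
print(color_size_counts, mean_height_by_color)
```

Each variable name matches the analysis type it illustrates, so the block can be read top to bottom alongside the list.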
6
Q

Never trust summary statistics alone; always ____ your data.

A

Never trust summary statistics alone; always visualize your data.
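
A minimal sketch of why this matters: the two hypothetical data sets below were constructed to share the same mean and (population) standard deviation, yet a crude text histogram shows very different shapes. The data and the `text_histogram` helper are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean, pstdev

# Two made-up data sets with identical mean (5) and population stdev (2).
a = [2, 4, 4, 4, 5, 5, 7, 9]  # one hump with a tail to the right
b = [3, 3, 3, 3, 7, 7, 7, 7]  # two separate clusters


def text_histogram(values):
    """Return one line per integer value, with a '#' for each occurrence."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>2} | {'#' * counts[v]}")
    return "\n".join(lines)


for name, data in (("a", a), ("b", b)):
    print(f"{name}: mean={mean(data)}, stdev={pstdev(data)}")
    print(text_histogram(data))
```

The summary lines printed for `a` and `b` agree, while the histograms underneath them do not.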

7
Q

Distributions
- Symmetric
[2]

A

Distributions
- Symmetric
•Easiest to interpret.
•More likely to be a normal distribution

8
Q

Distributions
•Skewed
[3]

A
Distributions
•Skewed
  •Averages (means) do not represent typical values
  •Common with COUNT variables
  •Can be flattened (e.g., with a log transformation)
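
A small sketch of the points above, using a hypothetical count variable (`visits`) with one very large value. It assumes that "flattened" refers to applying a transformation such as a logarithm, which is one common reading rather than something the card states.

```python
import math
from statistics import mean, median

# Hypothetical count data: most values are small, one is very large.
visits = [1, 1, 2, 2, 3, 3, 4, 50]

# In skewed data the mean is pulled toward the long tail,
# so it sits far from the "typical" (median) value.
raw_gap = mean(visits) - median(visits)

# Assumed flattening step: log1p(x) = log(1 + x) compresses large values.
logged = [math.log1p(v) for v in visits]
log_gap = mean(logged) - median(logged)

print(f"raw:    mean={mean(visits)}, median={median(visits)}")
print(f"logged: mean={mean(logged):.3f}, median={median(logged):.3f}")
```

After the transformation the mean and median sit much closer together, which is what a less skewed distribution looks like numerically.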
9
Q

Distributions
•Multimodal
[2]

A

Distributions
•Multimodal
•Several common values
•Clusters might emerge
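
To make "several common values" concrete, the sketch below finds the most frequent values in a hypothetical list and then assigns each observation to its nearest mode. The nearest-mode grouping is a naive illustration of how clusters might show up, not a real clustering algorithm.

```python
from statistics import multimode

# Hypothetical data with two equally common values.
ages = [1, 2, 2, 3, 7, 8, 8, 9]

# multimode returns every value tied for the highest frequency.
modes = multimode(ages)

# Naive grouping: attach each observation to the closest mode.
clusters = {m: [] for m in modes}
for value in ages:
    nearest = min(modes, key=lambda m: abs(value - m))
    clusters[nearest].append(value)

print(modes)
print(clusters)
```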

10
Q

Distributions
Symmetric
[1]

A

Distributions
Symmetric
•Little variance means limited insights

11
Q
Bivariate Feature Analysis
• 
•Can be
  • 
  • 
  •
A
Bivariate Feature Analysis
•Analyzing one feature in terms of another or against another.
•Can be
  •2 numerical features
  •2 categorical features
  •One of each (Combination)
12
Q

Combination analysis - Box and Whisker

[4]

A
Combination analysis - Box and Whisker
•Based on medians, not means
•Visualize the size of each quartile
•Range from the top to the bottom of the box (Q1 to Q3) is called the interquartile range (IQR)
•Outliers (High and Low)
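
The numbers behind a box-and-whisker plot can be sketched as follows for a numerical feature split by a categorical one. The groups and scores are hypothetical, and the 1.5 × IQR rule used to flag outliers is a common convention that is assumed here; the card does not say which rule is used.

```python
from statistics import quantiles


def box_summary(values):
    """Quartiles, IQR, and outliers under the (assumed) 1.5 * IQR rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical numerical feature (score) grouped by a categorical feature.
scores_by_group = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 30],
    "B": [4, 5, 5, 6, 6, 7, 7, 8],
}

summaries = {group: box_summary(v) for group, v in scores_by_group.items()}
for group, summary in summaries.items():
    print(group, summary)
```

Group A has one high outlier (30) under this rule, while group B has none, so the two boxes would look quite different when drawn side by side.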
13
Q

Don’t just calculate and draw

[2]

A

Don’t just calculate and draw
•Try to gain data understanding
•Think about how these data behaviors might impact models

14
Q

Use the right tool for the right job

A

Use the right tool for the right job

•Different techniques depending on what we want to understand

15
Q

If an EDA activity is not helping to better understand your data set, why do it?

A

-
