Module 2B Exploring Data Visually Flashcards

1
Q

What is Exploratory Data Analysis (EDA)?

A

EDA is the process of understanding data through the heavy use of descriptive statistics and visualization.

2
Q

What are the objectives of Exploratory Data Analysis?

A

The objectives include detecting errors, missing values, and unusual values; characterizing the distribution of values for each variable; and identifying patterns and relationships between variables.

3
Q

What are the challenges associated with Exploratory Data Analysis?

A

Challenges include dealing with tall data (many rows) and wide data (many columns).

4
Q

What are ordinal variables?

A

Ordinal variables are categorical variables that have a natural ordering.

5
Q

How can the strength of a correlation between two variables be visually gauged?

A

The strength can be gauged by how closely data points cluster around the linear trendline.
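The visual impression of how tightly points hug a trendline can be checked numerically with the Pearson correlation coefficient. The sketch below is only an illustration in Python (the cards themselves work in Excel); the data and the helper name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: 'tight' lies exactly on a line, 'loose' scatters around one.
x = [1, 2, 3, 4, 5]
tight = [2, 4, 6, 8, 10]
loose = [2, 7, 3, 9, 6]

r_tight = pearson_r(x, tight)  # points on the line: r at its maximum of 1
r_loose = pearson_r(x, loose)  # points scattered: r noticeably smaller
print(r_tight, r_loose)
```

The closer the points sit to the trendline, the closer the coefficient is to 1 (or to -1 for a downward-sloping line).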

6
Q

What is a spurious relationship in data analysis?

A

A spurious relationship occurs when two variables appear to be related but are not actually connected, often because of a lurking variable or sample bias.

7
Q

How can data be organized to facilitate exploratory analysis?

A

Techniques include creating and using Excel tables, applying filters, sorting values, and creating new variables for summary statistics.

8
Q

What does univariate analysis focus on?

A

It focuses on the distribution of values within a single variable using methods like frequency tables, histograms, and box-and-whisker plots.
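As a rough illustration of these univariate methods, the sketch below builds a frequency table and the five-number summary behind a box-and-whisker plot. It uses Python's standard library rather than Excel, the values are hypothetical, and the 1.5 × IQR rule for flagging unusual values is a common convention assumed here, not something the cards prescribe.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical values of a single numeric variable.
values = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12]

# Frequency table: how often each distinct value occurs.
freq = Counter(values)

# Quartiles; method="inclusive" treats the data as the whole population.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
five_number = (min(values), q1, q2, q3, max(values))

# Assumed convention: points beyond 1.5 * IQR from the quartiles are unusual.
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

print(freq.most_common(3))
print(five_number)
print(outliers)
```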

9
Q

What is crosstabulation used for in data analysis?

A

Crosstabulation is used to compare two or more variables using PivotTables and PivotCharts.
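A PivotTable with one variable on the rows and another on the columns is, at its simplest, a table of counts for each combination of values. The sketch below reproduces that idea in plain Python as an illustration only; the region and channel records are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (region, sales channel) for each order.
records = [
    ("North", "Online"), ("North", "Store"), ("South", "Online"),
    ("South", "Online"), ("North", "Online"), ("South", "Store"),
]

# Count each (row value, column value) combination.
counts = Counter(records)
regions = sorted({region for region, _ in records})
channels = sorted({channel for _, channel in records})

# Nested dict shaped like a crosstab: rows are regions, columns are channels.
table = {r: {c: counts[(r, c)] for c in channels} for r in regions}

# Print the crosstab as a small grid.
print("Region".ljust(8) + "".join(c.ljust(8) for c in channels))
for r in regions:
    print(r.ljust(8) + "".join(str(table[r][c]).ljust(8) for c in channels))
```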

10
Q

What is legitimately missing data?

A

Missing data are deemed legitimate when they occur naturally. In that case, no remedial action is taken.

11
Q

What features can be distinguished in time-series data using line charts?

A

Features include trends, variability, and seasonality.
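One way to separate these features numerically, sketched here under assumptions, is a moving average whose window spans one full season: the seasonal swings cancel out and the trend remains. The quarterly series below is hypothetical and deliberately constructed from a linear trend plus a repeating four-quarter pattern.

```python
# Hypothetical quarterly series: upward trend plus a repeating 4-quarter pattern.
seasonal = [2, -1, -2, 1]
series = [10 + t + seasonal[t % 4] for t in range(12)]

def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# A window of one full season (4 quarters) averages the seasonal effect away.
trend = moving_average(series, window=4)

# Period-to-period change in the smoothed series exposes the trend's slope.
steps = [b - a for a, b in zip(trend, trend[1:])]

print(series)
print(trend)
print(steps)
```

On a line chart, the raw series would show the zigzag of seasonality, while the smoothed series would show the steady upward trend.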

12
Q

What are the two general types of geographic visualizations?

A

Choropleth maps, which use colors, shading, or symbols to represent data values for geographic regions, and cartograms, which resize or distort regions so that their area reflects a data value such as density or frequency rather than true land area.

13
Q

What is considered illegitimately missing data?

A

When the missing data do not occur naturally, they are deemed illegitimate.

14
Q

What are the options to address illegitimately missing data?

A

Discard the observations, estimate the missing values, or, for a categorical variable, treat "missing" as a separate category.

15
Q

What are the categories of illegitimately missing data?

A

Missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

16
Q

What is the remedial action for the categories of illegitimately missing data?

A

MCAR = discard the observations, or replace the missing values with the mean, median, or mode
MAR = estimate the missing value from the other values in the observation
MNAR = consider removing the variable
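The two MCAR remedies can be sketched in a few lines. This is an illustration in Python under assumptions, not the course's Excel procedure: the `ages` column is hypothetical, and `None` stands in for a missing cell.

```python
from statistics import mean, median

# Hypothetical numeric column; None marks a missing value assumed to be MCAR.
ages = [34, None, 29, 41, None, 40]

# Option 1: discard the observations with a missing value.
observed = [a for a in ages if a is not None]

# Option 2: replace each missing value with a summary of the observed values.
fill_mean = mean(observed)
fill_median = median(observed)
mean_filled = [a if a is not None else fill_mean for a in ages]
median_filled = [a if a is not None else fill_median for a in ages]

print(observed)
print(mean_filled)
print(median_filled)
```

Discarding shrinks the dataset, while filling keeps every observation at the cost of understating the variable's spread, so the choice depends on how many values are missing.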