Module 2B Exploring Data Visually Flashcards

Question 1

Q

What is Exploratory Data Analysis (EDA)?

Answer

A

EDA is the process of understanding data through the heavy use of descriptive statistics and visualization.

Question 2

Q

What are the objectives of Exploratory Data Analysis?

Answer

A

The objectives include detecting errors, missing values, and unusual values; characterizing the distribution of values for each variable; and identifying patterns and relationships between variables.
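As a rough illustration of the first objective, the sketch below counts missing entries in one column and flags unusual values. The 1.5 x IQR rule, the function name `summarize_column`, and the sample `ages` data are all assumptions made for this example; the flashcards do not prescribe a particular rule.

```python
from statistics import quantiles


def summarize_column(values):
    """Count missing entries and flag unusual values (assumed 1.5 * IQR rule)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the observed values only.
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread

    unusual = [v for v in present if v < low or v > high]
    return {"missing": missing, "unusual": unusual}


# Hypothetical column: None marks a missing entry, 250 is a likely data-entry error.
ages = [34, 29, None, 41, 38, 30, 250, 36, None, 33]
report = summarize_column(ages)
print(report)
```

A flagged value is only a candidate for review; whether it is an error or a genuine extreme still needs a judgment call.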

Question 3

Q

What are the challenges associated with Exploratory Data Analysis?

Answer

A

Challenges include dealing with tall data (many rows) and wide data (many columns).

Question 4

Q

What are ordinal variables?

Answer

A

Ordinal variables are categorical variables that have a natural ordering.

Question 5

Q

How can the strength of a correlation between two variables be visually gauged?

Answer

A

The strength can be gauged by how closely data points cluster around the linear trendline.
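The visual impression of "tight clustering" has a numeric counterpart in the Pearson correlation coefficient. The sketch below computes it by hand for two made-up series; `pearson_r` and the sample data are illustrative names and values, not part of the course material.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]   # points hug a straight line
loose = [5.0, 1.0, 9.0, 3.0, 12.0, 6.0]   # points scatter widely

r_tight = pearson_r(hours, tight)
r_loose = pearson_r(hours, loose)
print(round(r_tight, 3), round(r_loose, 3))
```

Values of r near 1 or -1 correspond to points hugging the trendline; values near 0 correspond to a diffuse cloud.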

Question 6

Q

What is a spurious relationship in data analysis?

Answer

A

A spurious relationship is when two variables appear to be related but are not actually connected; the apparent association is often due to a lurking variable or sample bias.

Question 7

Q

How can data be organized to facilitate exploratory analysis?

Answer

A

Techniques include creating and using Excel tables, applying filters, sorting values, and creating new variables for summary statistics.
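The flashcard describes these steps in Excel. For readers working in code, a minimal Python sketch of the same three ideas (a derived variable, a filter, and a sort) might look like the following; the `orders` records and field names are hypothetical.

```python
# Hypothetical order records, one dictionary per row.
orders = [
    {"id": 1, "region": "East", "units": 5, "unit_price": 20.0},
    {"id": 2, "region": "West", "units": 3, "unit_price": 50.0},
    {"id": 3, "region": "East", "units": 10, "unit_price": 8.0},
    {"id": 4, "region": "West", "units": 1, "unit_price": 30.0},
]

# New variable: revenue derived from existing columns.
for order in orders:
    order["revenue"] = order["units"] * order["unit_price"]

# Filter: keep only rows from one region.
east = [order for order in orders if order["region"] == "East"]

# Sort: largest revenue first.
by_revenue = sorted(orders, key=lambda order: order["revenue"], reverse=True)

print([order["id"] for order in by_revenue])
```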

Question 8

Q

What does univariate analysis focus on?

Answer

A

It focuses on the distribution of values within a single variable using methods like frequency tables, histograms, and box-and-whisker plots.
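The three methods named above can be approximated with the standard library alone. This is a sketch under assumed data: `grades` and `scores` are invented, the bin width of 10 is arbitrary, and the quartile method ("inclusive") is one of several conventions.

```python
from collections import Counter
from statistics import quantiles

# Frequency table for a categorical variable.
grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A"]
frequency = Counter(grades)

# Text histogram for a numeric variable, using bins of width 10.
scores = [52, 67, 71, 74, 78, 81, 83, 88, 90, 95]
bin_width = 10
histogram = Counter((s // bin_width) * bin_width for s in scores)
for start in sorted(histogram):
    print(f"{start}-{start + bin_width - 1}: {'#' * histogram[start]}")

# Five-number summary, the basis of a box-and-whisker plot.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
five_number = (min(scores), q1, q2, q3, max(scores))
print(five_number)
```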

Question 9

Q

What is crosstabulation used for in data analysis?

Answer

A

Crosstabulation is used to compare two or more variables by tabulating their values against each other, for example with PivotTables and PivotCharts.
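Outside Excel, the same idea can be sketched by counting how often each pair of values occurs. The `records` data and the region/channel variables below are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (region, sales channel) for each sale.
records = [
    ("North", "Online"), ("North", "Store"), ("South", "Online"),
    ("North", "Online"), ("South", "Store"), ("South", "Store"),
    ("South", "Online"),
]

# Count each (row value, column value) combination.
counts = Counter(records)
regions = sorted({region for region, _ in records})
channels = sorted({channel for _, channel in records})

# Nested dictionary laid out like a PivotTable: rows are regions, columns are channels.
table = {r: {c: counts[(r, c)] for c in channels} for r in regions}

for region, row in table.items():
    print(region, row)
```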

Question 10

Q

What is legitimately missing data?

Answer

A

Missing data are deemed legitimate when they occur naturally. No remedial action is taken.

Question 11

Q

What features can be distinguished in time-series data using line charts?

Answer

A

Features include trends, variability, and seasonality.
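One common way to separate trend from seasonality numerically is a moving average whose window matches the seasonal period. This technique is an addition for illustration, not something the flashcard specifies; the quarterly `sales` series is invented.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical quarterly sales: a repeating within-year pattern on top of steady growth.
sales = [10, 14, 8, 12, 14, 18, 12, 16, 18, 22, 16, 20]

# A 4-quarter window averages out the seasonal swings and leaves the trend.
trend = moving_average(sales, 4)
print(trend)
```

Plotting `sales` and `trend` on the same line chart would show the seasonal zigzag around a steadily rising line.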

Question 12

Q

What are the two general types of geographic visualizations?

Answer

A

Choropleth maps, which use colors or symbols to represent data for each region, and cartograms, which draw regions out of proportion to their true size so that area conveys data density or frequency.

Question 13

Q

What is considered illegitimately missing data?

Answer

A

When the missing data do not occur naturally, they are deemed illegitimate.

Question 14

Q

What are the options to address illegitimately missing data?

Answer

A

Discard the observations, estimate the missing values, or treat them as a separate category for a categorical variable.

Question 15

Q

What are the categories of illegitimately missing data?

Answer

A

Missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Question 16

Q

What are the remedial actions for each category of illegitimately missing data?

Answer

A

MCAR = discard the observations or replace the missing values with the mean, median, or mode
MAR = estimate the missing values by using other values in the observation
MNAR = consider removing the variable
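The simplest of these remedies (discarding, mean or median replacement, and a separate category for a categorical variable) can be sketched as follows. The `incomes` and `segments` data and the "Unknown" label are assumptions for illustration, and `None` stands in for a missing cell.

```python
from statistics import mean, median

# Hypothetical numeric variable with two missing entries.
incomes = [52000, None, 61000, 48000, None, 75000]
observed = [v for v in incomes if v is not None]

# Option 1: discard observations with a missing value.
dropped = observed

# Option 2: replace missing values with the mean or median of the observed values.
mean_filled = [v if v is not None else mean(observed) for v in incomes]
median_filled = [v if v is not None else median(observed) for v in incomes]

# Option 3 (categorical variable): treat missing as its own category.
segments = ["retail", None, "wholesale", "retail"]
labeled = [s if s is not None else "Unknown" for s in segments]

print(mean_filled)
print(median_filled)
print(labeled)
```

Estimating a MAR value from other variables in the same observation typically needs a model (for example a regression), which is beyond this sketch.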