Week 6 - exploring data visually Flashcards

1
Q

What is exploratory data analysis (EDA)?

A

The process of understanding the data through the heavy use of descriptive statistics and visualisation.

2
Q

What are the objectives of EDA?

A
  1. Detection of errors, missing values, and unusual observations.
  2. Characterisation of the distribution of values for individual variables.
  3. Identification of patterns and relationships between variables.
3
Q

What is tall data?

A

Data in which the number of records (rows) is large.

4
Q

What is wide data?

A

Data in which the number of variables (columns) is large.

5
Q

What is correlation?

A

A statistical measure of the strength of the linear relationship between two variables.

6
Q

What is the range of correlation values?

A

Correlation is always between -1 and +1.
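As a rough illustration of the range and sign of correlation, the sketch below computes the sample Pearson correlation coefficient in plain Python. The function name `pearson_r` and all of the data are hypothetical and chosen only for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
score_up = [52, 54, 56, 58, 60]     # rises exactly linearly with hours
score_down = [60, 58, 56, 54, 52]   # falls exactly linearly with hours
score_noisy = [55, 51, 60, 54, 59]  # loosely increasing, with scatter

r_up = pearson_r(hours, score_up)        # perfect positive relationship
r_down = pearson_r(hours, score_down)    # perfect negative relationship
r_noisy = pearson_r(hours, score_noisy)  # weaker positive relationship

print(r_up, r_down, round(r_noisy, 2))
```

The two exactly linear series sit at the ends of the range, while the scattered series falls strictly between 0 and +1; the sign follows the direction of the relationship.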

7
Q

What does the sign of the correlation indicate?

A

The direction of the relationship between the variables.

8
Q

How is the strength of correlation visually gauged?

A

By how close the data points cluster around the linear trendline.

9
Q

What is a spurious relationship?

A

An apparent relationship between two variables that is not the result of a cause-and-effect relationship between them.

10
Q

What can cause a spurious relationship?

A
  1. Both variables being affected by a lurking variable.
  2. Sample data containing bias and not being representative of the population.
  3. Insufficient strength of the relationship to distinguish it from random coincidence.
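The first cause, a lurking variable, can be sketched with invented numbers. Here a hypothetical `temperature` series drives both `ice_cream` sales and `sunburns`; the two outcomes look strongly correlated, but the correlation nearly vanishes once the straight-line effect of temperature is removed from each. All names and values are assumptions made for this example, and comparing residuals is only one simple way to check for a lurking variable.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data: temperature is the lurking variable behind both series.
temperature = [20, 22, 25, 28, 30, 33]
ice_cream = [203, 217, 253, 277, 303, 327]
sunburns = [41, 45, 49, 55, 61, 67]

# Raw correlation between the two outcomes: strong and positive.
r_raw = pearson_r(ice_cream, sunburns)

# Correlation after removing the linear effect of temperature from each.
r_adjusted = pearson_r(residuals(ice_cream, temperature),
                       residuals(sunburns, temperature))

print(round(r_raw, 3), round(r_adjusted, 3))
```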
11
Q

What does it mean when there is a missing association between variables?

A

The data points poorly fit the linear trendline and the correlation is near zero.

12
Q

What is a table lens?

A

An alternative to a scatter-chart matrix for visualising relationships between different pairs of variables.

13
Q

What is illegitimate missing data?

A

Missing data that does not occur naturally, that is, a value that should have been recorded but was not.

14
Q

What are options to address illegitimate missing data?

A
  1. Discard observations with missing values.
  2. Fill in missing entries with estimated values.
  3. Treat missing data as a separate category.
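The three options above can be sketched in plain Python. The records, field names, and the choice of the mean as the estimated value are all assumptions made for illustration; other estimates (median, a model-based prediction) are equally possible.

```python
from statistics import mean

# Hypothetical records; None marks a missing entry.
records = [
    {"id": 1, "age": 34, "region": "North"},
    {"id": 2, "age": None, "region": "South"},
    {"id": 3, "age": 51, "region": None},
    {"id": 4, "age": 29, "region": "North"},
]

# Option 1: discard observations with any missing value.
complete = [r for r in records if None not in r.values()]

# Option 2: fill in missing numeric entries with an estimate
# (here, the mean of the observed ages).
observed_ages = [r["age"] for r in records if r["age"] is not None]
mean_age = mean(observed_ages)
imputed = [
    {**r, "age": r["age"] if r["age"] is not None else mean_age}
    for r in records
]

# Option 3: treat missing categorical entries as their own category.
categorised = [
    {**r, "region": r["region"] if r["region"] is not None else "Missing"}
    for r in records
]

print(len(complete), imputed[1]["age"], categorised[2]["region"])
```

Discarding keeps only fully observed records, so it shrinks the sample; the other two options keep every record but change what the filled-in values mean.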
15
Q

What are the types of missing values?

A
  1. MCAR - Missing completely at random. (Whether a value is missing is not related to any variable or variable values.)
  2. MAR - Missing at random. (Whether a value is missing for one variable is related to the value of another variable.)
  3. MNAR - Missing not at random. (Whether a value is missing for a variable is related to the value of that variable itself.)
16
Q

What is time series data?

A

Observations collected at several different points in time.

17
Q

What is cross-sectional data?

A

Observations collected at a single point in time.

18
Q

What features can line charts distinguish in time series data?

A
  1. Trend: the long-run pattern in values.
  2. Variability: the difference in values from period to period.
  3. Seasonality: a pattern that recurs periodically.
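A line chart shows these three features visually; the sketch below pulls rough numeric counterparts out of a small quarterly series. The data is invented, and the specific summaries (a four-quarter moving average for trend, period-to-period differences for variability, per-quarter averages for seasonality) are simple illustrative choices rather than the only way to measure each feature.

```python
# Hypothetical quarterly sales over three years (Q1..Q4 repeated).
sales = [110, 97, 94, 111, 118, 105, 102, 119, 126, 113, 110, 127]
season_length = 4  # assumed: one seasonal cycle is four quarters

# Trend: averaging over one full cycle smooths out the seasonal swings.
moving_avg = [
    sum(sales[i:i + season_length]) / season_length
    for i in range(len(sales) - season_length + 1)
]

# Variability: change in value from each period to the next.
diffs = [later - earlier for earlier, later in zip(sales, sales[1:])]

# Seasonality: average value for each position within the cycle.
quarter_means = [
    sum(sales[q::season_length]) / len(sales[q::season_length])
    for q in range(season_length)
]

print(moving_avg[:3])   # steadily rising values indicate an upward trend
print(diffs[:4])        # period-to-period changes
print(quarter_means)    # recurring highs and lows by quarter
```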
19
Q

What is temporal frequency?

A

The rate at which the data is displayed on the horizontal axis of a time series chart.

20
Q

What is a sparkline?

A

A minimalist type of line chart directly placed into a spreadsheet cell.
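To give a feel for how compact a sparkline is, the sketch below builds a one-line text approximation from Unicode block characters. This is only a stand-in: a spreadsheet sparkline is a drawn line, whereas this prints small bars, and the function name `sparkline` is hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest

def sparkline(values):
    """Return a one-line text sketch of the series, scaled to its own range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series: draw every point at the lowest height.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    # Scale each value to 0..top and round to the nearest block height.
    return "".join(BARS[round((v - lo) / span * top)] for v in values)

# Hypothetical monthly figures.
print(sparkline([3, 5, 4, 8, 12, 9, 7]))
```

Like a spreadsheet sparkline, the output carries no axes or labels; it shows only the overall shape of the series.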

21
Q

What is geospatial data?

A

Data that includes information on the geographic location of each record.

22
Q

What are examples of geospatial data visualisations?

A

Choropleth maps and cartograms.

23
Q

What is a choropleth map?

A

A geographic visualisation that uses shades of colour to indicate the values of a variable associated with a geographic region.
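The core step behind a choropleth map, turning each region's value into a shade, can be sketched without any mapping library. The region names, rates, colour codes, and the equal-width binning rule below are all assumptions for illustration; real maps often use other classification schemes such as quantiles.

```python
def shade_for(value, lo, hi, shades):
    """Map a value in [lo, hi] to one of the ordered shades (equal-width bins)."""
    if hi == lo:
        return shades[0]
    fraction = (value - lo) / (hi - lo)
    # The maximum value would index one past the end, so clamp it.
    index = min(int(fraction * len(shades)), len(shades) - 1)
    return shades[index]

# Illustrative light-to-dark blues; darker means a higher value.
SHADES = ["#deebf7", "#9ecae1", "#3182bd"]

# Hypothetical values of a variable for three regions.
rates = {"Region A": 2.0, "Region B": 5.0, "Region C": 8.0}

lo, hi = min(rates.values()), max(rates.values())
colours = {region: shade_for(v, lo, hi, SHADES) for region, v in rates.items()}

print(colours)
```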

24
Q

What are cartograms?

A

Map-like diagrams that use relative geographic positioning but in which the size of each region does not necessarily correspond to its land area.

25
Q

What is an equal area cartogram?

A

A cartogram that draws every region (for example, each state) at the same size, giving a balanced visual representation of each while maintaining fidelity to relative geographic positioning.