L2: Exploratory Data Analysis Flashcards

After this deck you should be able to:

  • differentiate discrete and continuous variables
  • understand the basic statistics used to characterise distributions
  • perform exploratory data analysis on different types of data with various R packages

1
Q

What are the differences between quantitative and qualitative variables?

A

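Quantitative: numerical values that measure an amount, such as height or a count. They can be continuous (any value within a range) or discrete (countable values, such as integers).

Qualitative: descriptive values such as categories or types. They can be ordinal (categories with a logical order, e.g. low/medium/high) or nominal (categories with no order, e.g. eye colour).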

2
Q

How is variance calculated?

A

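Variance measures spread: how far the data deviate from the mean, on average, in squared units. For a sample of n values with mean x̄, the sample variance is:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

Dividing by n − 1 rather than n (Bessel's correction) makes it an unbiased estimate of the population variance; R's var() uses this n − 1 form.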
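
A minimal sketch of the sample variance formula in Python (the helper name `sample_variance` is hypothetical), checked against the standard library's `statistics.variance`, which also divides by n − 1, and `statistics.pvariance`, which divides by n:

```python
from statistics import mean, pvariance, variance

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data: mean 5, squared deviations sum to 32
print(sample_variance(data))      # 32 / 7, about 4.571
print(variance(data))             # same value from the standard library
print(pvariance(data))            # population variance: 32 / 8 = 4
```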

3
Q

What is the IQR?

A

Interquartile range (IQR): the distance between the third and first quartiles, IQR = Q3 − Q1.

It spans the middle 50% of the data and, unlike the full range, is not affected by extreme values.
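
One way to compute it in Python, as a sketch (the helper name `iqr` is hypothetical). `statistics.quantiles(..., method="inclusive")` interpolates the same way as R's default `quantile()` (type 7), so the result should match R's `IQR()`; other quartile conventions give slightly different values for small samples.

```python
from statistics import quantiles

def iqr(xs):
    """Interquartile range Q3 - Q1, with linearly interpolated quartiles."""
    q1, _median, q3 = quantiles(xs, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(quantiles(data, n=4, method="inclusive"))  # [2.75, 4.5, 6.25]
print(iqr(data))                                  # 3.5
```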

4
Q

What are some single-variable visualisations?

A

Numerical:
Histogram
Boxplot

Categorical:
Pie chart
Bar plot

5
Q

What are some two-variable visualisations?

A

Numerical:
Scatter plot
Line plot

Categorical:
Segmented bar plot
Mosaic plot
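
Both categorical plots start from a contingency table of counts; a segmented bar plot then shows the proportions of one variable within each level of the other. A minimal Python sketch on made-up data, roughly what R's `table()` and `prop.table(..., margin = 1)` produce (the helper names are hypothetical):

```python
from collections import Counter

def contingency_table(xs, ys):
    """Cross-tabulate two categorical variables: {x_level: {y_level: count}}."""
    counts = Counter(zip(xs, ys))
    x_levels, y_levels = sorted(set(xs)), sorted(set(ys))
    return {x: {y: counts[(x, y)] for y in y_levels} for x in x_levels}

def row_proportions(table):
    """Divide each count by its row total: one segmented bar per row."""
    return {x: {y: c / sum(row.values()) for y, c in row.items()}
            for x, row in table.items()}

# Made-up example data
smoker  = ["yes", "no", "no", "yes", "no"]
disease = ["yes", "no", "yes", "yes", "no"]
table = contingency_table(smoker, disease)
print(table)                  # {'no': {'no': 2, 'yes': 1}, 'yes': {'no': 0, 'yes': 2}}
print(row_proportions(table))
```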

6
Q

When should histograms be used?

A

When we want to see the shape of a numerical variable's distribution.

A histogram shows whether the data are left- or right-skewed and whether the distribution is unimodal, bimodal, multimodal or uniform.
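
A histogram is a count of values per equal-width bin. A minimal Python sketch with hypothetical helper names; the bins here are left-closed, [a, b), whereas R's `hist()` uses right-closed bins by default, so values that sit exactly on a bin edge can land in a different bin:

```python
import math

def histogram_counts(data, start, bin_width):
    """Count values in equal-width bins [start + k*w, start + (k+1)*w).
    Assumes start <= min(data)."""
    n_bins = math.floor((max(data) - start) / bin_width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // bin_width)] += 1
    return counts

def text_histogram(data, start, bin_width):
    """Render the counts as rows of '#' characters, one row per bin."""
    rows = []
    for k, count in enumerate(histogram_counts(data, start, bin_width)):
        left = start + k * bin_width
        rows.append(f"[{left}, {left + bin_width}) {'#' * count}")
    return "\n".join(rows)

data = [1, 2, 2, 3, 3, 3, 4, 4, 9]   # made-up data
print(text_histogram(data, start=0, bin_width=2))
```

For this made-up sample the bin counts are 1, 5, 2, 0, 1: a single peak with a long right tail, i.e. a unimodal, right-skewed shape.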

7
Q

A distribution's mode, median and mean are in increasing order: mode < median < mean.

What is the shape of its skewness?

A

This is a positively skewed distribution, also known as a right-skewed distribution (long right tail): a few large values pull the mean above the median.

8
Q

A distribution's mean, median and mode are in increasing order: mean < median < mode.

What kind of skew does the distribution have?

A

This is a negatively skewed distribution, also known as a left-skewed distribution (long left tail): a few small values pull the mean below the median.
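
The mean/median/mode ordering is a rule of thumb and can fail, particularly for discrete data. A more direct check is the sign of the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. A Python sketch on made-up data (the helper name `describe_skew` is hypothetical):

```python
from statistics import mean, median, mode

def describe_skew(xs):
    """Return mean, median, mode and the moment skewness g1 = m3 / m2**1.5."""
    xbar = mean(xs)
    n = len(xs)
    m2 = sum((x - xbar) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - xbar) ** 3 for x in xs) / n   # third central moment
    return {"mean": xbar, "median": median(xs), "mode": mode(xs), "g1": m3 / m2 ** 1.5}

right = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]   # made-up data with a long right tail
left = [11 - x for x in right]             # mirror image: long left tail

r, l = describe_skew(right), describe_skew(left)
print(r)  # mode 2 < median 3 < mean 4, g1 > 0: positive (right) skew
print(l)  # mean 7 < median 8 < mode 9, g1 < 0: negative (left) skew
```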

9
Q

Take the time to draw out the components of a boxplot

A
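From top to bottom:

  • Outliers above the upper whisker, plotted as individual points
  • Upper whisker: extends to the largest value within 1.5 × IQR above Q3
  • Upper quartile (Q3): top edge of the box
  • Median (Q2): line inside the box
  • Lower quartile (Q1): bottom edge of the box
  • Lower whisker: extends to the smallest value within 1.5 × IQR below Q1
  • Outliers below the lower whisker, plotted as individual points

The height of the box is the IQR. In R (e.g. ggplot2), the box edges are called the hinges.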
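
A Python sketch of the quantities behind a boxplot, using the common 1.5 × IQR rule for whiskers and outliers (the helper name `boxplot_stats` is hypothetical). Quartiles here follow R's default `quantile()` interpolation; base R's `boxplot()` uses Tukey's hinges from `fivenum()`, which can differ slightly for small samples.

```python
from statistics import quantiles

def boxplot_stats(xs, coef=1.5):
    """Box edges, median, whisker ends and outliers under the coef x IQR rule."""
    q1, med, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "lower_whisker": min(inside),   # smallest value inside the fences
        "q1": q1,
        "median": med,
        "q3": q3,
        "upper_whisker": max(inside),   # largest value inside the fences
        "outliers": sorted(x for x in xs if x < lower_fence or x > upper_fence),
    }

print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # made-up data; 30 is flagged
```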

10
Q

Where does 50% of the data fall, in terms of the quartiles?

A

By definition, the middle 50% of the data lies between Q1 and Q3 (inside the box of a boxplot); 25% lies below Q1 and 25% above Q3.

11
Q

A widely used plotting style:

  • Plots two numerical variables against each other
  • Can reveal linear or non-linear relationships
  • Shows correlation between the variables
  • Shows the presence of extreme outliers

What kind of plot is it?

A

Scatter plot (shows every individual data point, one per observation)

12
Q

How is the covariance of two random variables X and Y calculated?

A

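$$\operatorname{cov}(X, Y) = E[XY] - E[X]\,E[Y]$$

For a sample of n pairs, the sample covariance (used by R's cov()) is:

$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Positive covariance means X and Y tend to increase together; negative covariance means one tends to decrease as the other increases.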
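
Both forms in Python, as a sketch on made-up numbers (helper names are hypothetical). The first treats the data as the entire population, so it divides by n; the second is the n − 1 sample version. Setting Y = X in the sample version gives the sample variance.

```python
from statistics import mean

def population_covariance(xs, ys):
    """cov(X, Y) = E[XY] - E[X]E[Y], treating the data as the whole population."""
    return mean(x * y for x, y in zip(xs, ys)) - mean(xs) * mean(ys)

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(population_covariance(x, y))  # 13.2 - 3 * 4 = 1.2, up to floating-point rounding
print(sample_covariance(x, y))      # 6 / 4 = 1.5
```

The deviation form is usually preferred in practice: subtracting E[X]E[Y] from E[XY] can lose precision when the values are large relative to their spread.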

13
Q

If two variables have a correlation that is close to 0, what might we assume?

A

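The two variables have little or no linear relationship. Pearson's correlation only measures linear association, so a strong non-linear relationship (for example y = x² over values symmetric about 0) can still give a correlation near 0; check a scatter plot before concluding the variables are unrelated.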
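
Pearson's correlation is the covariance rescaled by both standard deviations, r = s_XY / (s_X s_Y), so it always lies between −1 and 1. A Python sketch (hypothetical helper name, made-up data) in which y depends exactly on x and yet r = 0:

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation; the 1/(n - 1) factors of covariance and variances cancel."""
    xbar, ybar = mean(xs), mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [2 * v + 1 for v in x]))  # 1.0: perfect linear relationship
print(pearson_r(x, [v ** 2 for v in x]))     # 0.0: exact but non-linear relationship
```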

14
Q

If the points on a normal QQ plot do not follow the straight reference line, what does this indicate?

A

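The data are unlikely to come from a normal distribution. For example, an S-shaped pattern suggests tails heavier or lighter than a normal distribution's, and a bowed (curved) pattern suggests skew.

It may be more informative to compare the data against a different theoretical distribution (e.g. a uniform distribution) with a QQ plot for that distribution.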
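
A normal QQ plot pairs the sorted data with standard normal quantiles at matching probabilities. A Python sketch using `statistics.NormalDist().inv_cdf`; the plotting positions (i − 0.5)/n are an assumption (R's `qqnorm()` uses `ppoints()`, which differs slightly for n ≤ 10), and summarising straightness by the correlation of the pairs is one simple heuristic, not a formal test. Helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean

def normal_qq_points(data):
    """(theoretical, sample) quantile pairs for a normal QQ plot."""
    xs = sorted(data)
    n = len(xs)
    z = NormalDist()  # standard normal: mean 0, sd 1
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]

def qq_straightness(data):
    """Pearson correlation of the QQ pairs: close to 1 means close to a straight line."""
    pts = normal_qq_points(data)
    ts, xs = [t for t, _ in pts], [x for _, x in pts]
    tbar, xbar = mean(ts), mean(xs)
    stx = sum((t - tbar) * (x - xbar) for t, x in pts)
    stt = sum((t - tbar) ** 2 for t in ts)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return stx / math.sqrt(stt * sxx)

# Data placed exactly on normal quantiles (mean 10, sd 2) give a straight line
ideal = [10 + 2 * t for t, _ in normal_qq_points(range(20))]
print(qq_straightness(ideal))   # 1.0 up to floating-point rounding

skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15, 25, 40]  # made-up data, long right tail
print(qq_straightness(skewed))  # noticeably below 1
```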
