Exploratory Data Analysis Week 4 Flashcards

1
Q

What are the 6 main goals of exploratory data analysis?

A
  • checking for outliers
  • checking assumptions
  • checking for data entry errors
  • finding patterns not otherwise obvious
  • gaining a thorough descriptive analysis of the data
  • analysing and dealing with missing data
2
Q

In a perfectly symmetrical (e.g. normal) distribution, the mean and median would…

A

be the same
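
A minimal sketch of this card, using Python's standard `statistics` module. The scores are made up for illustration; they are not from the course material.

```python
from statistics import mean, median

# Hypothetical, perfectly symmetrical sample (invented for illustration).
symmetric_scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

sample_mean = mean(symmetric_scores)
sample_median = median(symmetric_scores)

# In a symmetrical sample the two measures coincide.
print(sample_mean, sample_median)
```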

3
Q

What would a positively skewed distribution look like?

A

Tail pointing towards the higher numbers

4
Q

What would a negatively skewed distribution look like?

A

Tail pointing towards the lower numbers
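
The direction of the tail can be checked numerically. The sketch below uses the moment-based skewness coefficient (third central moment divided by the second central moment raised to the power 1.5), which is one common definition and may differ slightly from what a given statistics package reports. The helper `skewness` and both samples are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Invented sample with a long tail towards the higher numbers.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 12]
# Mirror image of the same sample: the tail now points to the lower numbers.
left_tailed = [13 - x for x in right_tailed]

print(skewness(right_tailed))  # positive value: positively skewed
print(skewness(left_tailed))   # negative value: negatively skewed
```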

5
Q

What kind of information comes from the explore command?

A
  • central tendency
  • variability
  • quantitative measures of shape
  • confidence intervals
  • percentiles
  • stem and leaf
  • box and whisker
  • histograms
  • normality
  • homogeneity of variance
  • skewness and kurtosis
6
Q

If the distribution is positively skewed, will the mode be higher or lower than the mean?

A

Lower, because the mode sits towards the lower end of the scale, while the tail points towards the higher end and pulls the mean up

7
Q

If the distribution is negatively skewed, will the mode be higher or lower than the mean?

A

Higher, because the tail points towards the lower numbers and pulls the mean down, while the peak sits towards the higher end
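
The ordering of mode and mean under positive and negative skew can be seen in a small made-up example. The sketch assumes two invented samples, one the mirror image of the other, and uses only Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Invented sample with a tail towards the higher numbers (positive skew).
positively_skewed = [1, 2, 2, 2, 3, 3, 4, 6, 13]
# Mirror image of it: tail towards the lower numbers (negative skew).
negatively_skewed = [14 - x for x in positively_skewed]

for label, scores in [("positive skew", positively_skewed),
                      ("negative skew", negatively_skewed)]:
    print(label, "mode:", mode(scores),
          "median:", median(scores), "mean:", mean(scores))
```

In the first sample the mode falls below the mean; in the mirrored sample it falls above it, with the median in between in both cases.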

8
Q

When might mode be a better estimate of central tendency than the mean?

A

In cases of extreme skewness

9
Q

What are three underlying concepts of hypothesis testing?

A
  • finding a statistically significant difference
  • rejecting or failing to reject the null hypothesis
  • generalising the sample result to the population
10
Q

What is a sample?

A

A subset of the population, selected to represent it

11
Q

What are two options for dealing with missing data or data entry errors?

A
  • remove the data
  • make an educated guess about what was intended
To find the errors in the first place:
  • frequencies for categorical/nominal variables
  • outliers for continuous/scale variables
12
Q

What are the two command options for finding data entry errors (in both continuous/scale and categorical/nominal variables)?

A
  • frequencies (categorical/nominal)
  • outliers (continuous/scale)
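
The card refers to commands in the course's statistics package. As a rough Python analogue, the sketch below counts category frequencies to expose a misspelled value and flags extreme values in a continuous variable. The 1.5 × IQR cut-off is an assumption (a common boxplot convention), not something stated in the cards, and all data are invented.

```python
from collections import Counter
from statistics import quantiles

# Categorical/nominal variable: a frequency table exposes odd categories.
gender = ["male", "female", "female", "mael", "male", "female"]
frequencies = Counter(gender)
# Values that occur only once are candidates for data entry errors.
rare = [value for value, count in frequencies.items() if count == 1]

# Continuous/scale variable: flag values far outside the middle 50%.
ages = [21, 22, 22, 23, 24, 24, 25, 26, 230]
q1, _, q3 = quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed cut-off rule
outliers = [age for age in ages if age < low or age > high]

print(frequencies)
print(rare)
print(outliers)
```

Here the frequency table shows "mael" as a category of its own, and the age of 230 falls outside the assumed cut-off, so both would be worth checking against the original records.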

13
Q

What is normality?

A

The assumption that your data comes from a population that is normally distributed

14
Q

What is homogeneity of variance?

A

The assumption that if your data were divided into groups, the level of variability in the groups would be approximately equal.
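
As an informal illustration of this assumption, the sketch below computes the sample variance of each group and the ratio of the largest to the smallest. The group names and scores are invented, and the ratio is only a descriptive check; a formal test (Levene's test is a common choice) would normally be used to assess the assumption.

```python
from statistics import variance

# Hypothetical scores for three groups (invented for illustration).
groups = {
    "group_a": [4, 5, 6, 5, 4, 6],
    "group_b": [3, 5, 7, 5, 4, 6],
    "group_c": [1, 9, 2, 10, 0, 8],
}

# Sample variance (n - 1 denominator) per group.
variances = {name: variance(scores) for name, scores in groups.items()}
ratio = max(variances.values()) / min(variances.values())

print(variances)
print(ratio)
```

All three groups share the same mean, but group_c is far more spread out than the others, so the variances are clearly not approximately equal.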

15
Q

What is a leptokurtic distribution?

A

The tall, skinny one: more peaked than a normal distribution, with heavier tails (positive excess kurtosis)

16
Q

What is a platykurtic distribution?

A

The flat one: flatter than a normal distribution, with lighter tails (negative excess kurtosis)
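
The two shapes can be told apart numerically with excess kurtosis. The sketch assumes the moment-based definition (fourth central moment divided by the squared second central moment, minus 3), under which a normal distribution scores 0; statistics packages may apply a small-sample correction that gives slightly different values. The helper `excess_kurtosis` and both samples are hypothetical.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (population moments)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((x - centre) ** 2 for x in values) / n
    m4 = sum((x - centre) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

# Invented samples: one piled up in the middle with a few extreme scores,
# one spread evenly across the range.
peaked = [1, 5, 5, 5, 5, 5, 5, 5, 9]
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(excess_kurtosis(peaked))  # positive: leptokurtic
print(excess_kurtosis(flat))    # negative: platykurtic
```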

17
Q

If a distribution is negatively skewed where will the mode be?

A

Higher than the mean

18
Q

If a distribution is generally extremely skewed, which measure of central tendency is best?

A

The mode

19
Q

If a distribution is positively skewed, where will the mode be?

A

Lower than the mean

20
Q

How does the presence of outliers affect the median and mean?

A

The mean is pulled towards the outliers (up for high outliers, down for low ones), but the median only shifts a little
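
A small made-up example of this effect, using Python's standard `statistics` module. The scores and the single high outlier are invented.

```python
from statistics import mean, median

scores = [10, 12, 13, 14, 16]
with_outlier = scores + [95]  # one extreme high score added

mean_shift = mean(with_outlier) - mean(scores)
median_shift = median(with_outlier) - median(scores)

print("mean:", mean(scores), "->", mean(with_outlier))
print("median:", median(scores), "->", median(with_outlier))
```

One extreme score roughly doubles the mean here, while the median moves by half a point.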

21
Q

Who developed most of exploratory data analysis?

A

John Tukey

22
Q

What are percentiles good for?

A

When you want to see how an individual score relates to a group.
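
As a sketch of the idea, the hypothetical helper below computes a percentile rank as the percentage of group scores at or below an individual's score. This is one of several definitions in use, so a statistics package may report a slightly different figure. The group scores are invented.

```python
def percentile_rank(score, group):
    """Percentage of scores in the group that are at or below `score`."""
    at_or_below = sum(1 for s in group if s <= score)
    return 100 * at_or_below / len(group)

# Invented exam scores for a group of ten people.
group_scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]

rank = percentile_rank(75, group_scores)
print(rank)  # 70.0: a score of 75 is at or above 70% of the group
```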