Exploratory Data Analysis Week 4 Flashcards

Question 1

Q

What are the 6 main goals of exploratory data analysis?

Answer

A

checking for outliers
checking assumptions
checking for data entry errors
finding patterns that are not otherwise obvious
obtaining a thorough descriptive analysis of the data
analysing and dealing with missing data

Question 2

Q

In a perfectly symmetrical distribution (such as the normal distribution), the mean and median would…

Answer

A

be the same
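A quick check with Python's standard `statistics` module, using a small made-up set of scores that is perfectly symmetrical around 6 (illustrative data, not from the course). In a symmetrical, single-peaked set like this one, the mean, median and mode all coincide.

```python
from statistics import mean, median, mode

# Hypothetical scores arranged symmetrically around 6
scores = [2, 4, 5, 5, 6, 6, 6, 7, 7, 8, 10]

print(mean(scores), median(scores), mode(scores))  # all three equal 6
```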

Question 3

Q

What would a positively skewed distribution look like?

Answer

A

The tail points towards the higher values (to the right)

Question 4

Q

What would a negatively skewed distribution look like?

Answer

A

The tail points towards the lower values (to the left); the values themselves do not have to be negative
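Skew can be checked numerically as well as by eye. This sketch assumes the moment-based sample skewness g1 = m3 / m2^1.5; statistics packages such as SPSS report a bias-adjusted version, but the sign is read the same way. The second made-up sample is the mirror image of the first and contains only positive numbers, yet its skew is negative, because the direction refers to where the tail points, not to the sign of the values.

```python
from statistics import mean

def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5.

    Positive: tail towards higher values; negative: tail towards lower values.
    """
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]     # made-up, long tail to the right
left_tail = [11 - x for x in right_tail]   # mirror image, still all positive numbers

print(round(skewness(right_tail), 2), round(skewness(left_tail), 2))
```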

Question 5

Q

What kind of information comes from the Explore command?

Answer

A

central tendency
variability
quantitative measures of shape
confidence intervals
percentiles
stem-and-leaf plots
box-and-whisker plots
histograms
normality
homogeneity of variance
skewness and kurtosis
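Assuming the Explore command here is SPSS's Explore procedure, the function below is a rough stand-in for the descriptive part of its output, written with Python's `statistics` module. `describe` is a hypothetical helper, not an SPSS or library function, the scores are made up, and quartiles use the default "exclusive" method of `statistics.quantiles`, so values may differ slightly from other software.

```python
from statistics import mean, median, quantiles, stdev, variance

def describe(xs):
    """Rough Explore-style summary of one variable (hypothetical helper, not SPSS output)."""
    q1, _, q3 = quantiles(xs, n=4)  # quartile cut points
    return {
        "n": len(xs),
        "mean": mean(xs),
        "median": median(xs),
        "variance": variance(xs),   # sample variance (n - 1 denominator)
        "sd": stdev(xs),
        "min": min(xs),
        "max": max(xs),
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,             # interquartile range
    }

summary = describe([12, 15, 15, 18, 20, 22, 25, 31, 40])
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

For these nine scores the summary gives a mean of 22, a median of 20, a range of 28 and an interquartile range of 13.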

Question 6

Q

If the distribution is positively skewed, will the mode be higher or lower than the mean?

Answer

A

Lower. The mode sits at the peak, towards the lower end, while the long tail towards the higher values pulls the mean upwards

Question 7

Q

If the distribution is negatively skewed, will the mode be higher or lower than the mean?

Answer

A

Higher. The long tail towards the lower values pulls the mean downwards, while the peak (the mode) sits towards the higher end
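Both answers can be checked with made-up data: a positively skewed sample and its mirror image. In each case the mode stays at the peak while the mean is pulled towards the tail.

```python
from statistics import mean, mode

# Made-up positively skewed scores: long tail towards the higher values
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
# Mirror image around 6.5: long tail towards the lower values
negative = [13 - x for x in positive]

for label, data in (("positive", positive), ("negative", negative)):
    print(f"{label}: mode = {mode(data)}, mean = {mean(data)}")
```

This prints a mode of 2 against a mean of 4.3 for the positively skewed sample, and a mode of 11 against a mean of 8.7 for the negatively skewed one.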

Question 8

Q

When might mode be a better estimate of central tendency than the mean?

Answer

A

In cases of extreme skewness

Question 9

Q

What are three underlying concepts of hypothesis testing?

Answer

A

finding a statistically significant difference
rejecting or failing to reject the null hypothesis
generalising sample results to the population

Question 10

Q

What is a sample?

Answer

A

A smaller group drawn from the population

Question 11

Q

What are two options for dealing with missing data or data entry errors?

Answer

A

remove the data
make an educated guess about what was intended

Question 12

Q

What are the two command options for checking for data entry errors in continuous/scale and categorical/nominal variables?

Answer

A

frequencies (categorical/nominal)
outliers (continuous/scale)
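A minimal sketch of both checks on hypothetical data. A frequency count exposes impossible codes in a categorical variable. For a continuous variable, values more than 1.5 × IQR beyond the quartiles (Tukey's fences, the rule commonly used to mark outliers on box plots) are flagged for inspection. The variable names, codes and the 1.5 multiplier are assumptions for illustration, and a flagged value still needs checking against the raw data before it is corrected or removed.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical categorical item coded 1 = yes, 2 = no, with one entry error (22)
smoker = [1, 2, 2, 1, 2, 22, 1, 2]
# Hypothetical ages with one likely typo (222)
ages = [23, 31, 27, 45, 38, 222, 29, 34, 41, 26]

# Frequencies: any code outside the valid set stands out immediately
freq = Counter(smoker)
invalid_codes = sorted(code for code in freq if code not in {1, 2})

# Outliers: flag values beyond Tukey's fences (1.5 x IQR outside the quartiles)
q1, _, q3 = quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < low or a > high]

print(invalid_codes)  # [22]
print(outliers)       # [222]
```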

Question 13

Q

What is normality?

Answer

A

The assumption that your data comes from a population that is normally distributed

Question 14

Q

What is homogeneity of variance?

Answer

A

The assumption that if your data were divided into groups, the variability within each group would be approximately equal.
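Levene's test is a common way to check this assumption. It is a one-way ANOVA F statistic computed on each score's absolute deviation from its own group mean. The sketch below computes only the statistic; `levene_w` is a hypothetical helper name and the groups are made up. To get a p-value, compare the statistic with an F distribution on (k - 1, N - k) degrees of freedom, or use statistical software. A large statistic (small p-value) suggests the group variances differ.

```python
from statistics import mean

def levene_w(*groups):
    """Mean-centred Levene statistic for k groups (hypothetical helper).

    Equivalent to a one-way ANOVA F computed on |score - group mean|.
    """
    # Absolute deviation of each score from its own group's mean
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(y - m) for y in g])
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z_means = [mean(zi) for zi in z]
    grand_mean = mean([v for zi in z for v in zi])
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - grand_mean) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return ((n_total - k) / (k - 1)) * between / within

similar_a = [1, 2, 3, 4, 5]
similar_b = [11, 12, 13, 14, 15]   # same spread, different centre
wide = [0, 5, 10, 15, 20]          # much larger spread

print(levene_w(similar_a, similar_b))  # 0.0: identical spreads
print(levene_w(similar_a, wide))       # clearly above 0
```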

Question 15

Q

What is a leptokurtic distribution?

Answer

A

A tall, narrow, sharply peaked distribution (positive kurtosis)

Question 16

Q

What is a platykurtic distribution?

Answer

A

A flat, broad distribution with a low peak (negative kurtosis)
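Kurtosis can also be estimated numerically. This sketch assumes the moment-based excess kurtosis g2 = m4 / m2^2 - 3, which is 0 for a normal curve; statistics packages such as SPSS report a bias-adjusted version with the same convention (0 for normal, positive for peaked, negative for flat). The two samples are made up: one piles most scores on the centre with a few far out (leptokurtic), the other spreads them evenly (platykurtic).

```python
from statistics import mean

def excess_kurtosis(xs):
    """Moment-based excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal curve)."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

peaked = [5] * 8 + [4, 6, 1, 9]   # most scores on the centre, a few far out
flat = list(range(1, 10))         # scores spread evenly from 1 to 9

print(round(excess_kurtosis(peaked), 2), round(excess_kurtosis(flat), 2))
```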

Question 17

Q

If a distribution is negatively skewed where will the mode be?

Answer

A

Higher than the mean

Question 18

Q

If a distribution is extremely skewed, which measure of central tendency is best?

Answer

A

The mode

Question 19

Q

If a distribution is positively skewed, where will the mode be?

Answer

A

Lower than the mean

Question 20

Q

How does the presence of outliers affect the median and mean?

Answer

A

The mean is pulled towards the outliers (up for extreme high scores, down for extreme low ones), while the median shifts only a little
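Made-up scores illustrate the point: adding one extreme high or one extreme low score moves the mean noticeably in that direction, while the median barely changes.

```python
from statistics import mean, median

scores = [10, 12, 13, 14, 15]   # made-up scores, no outliers
high = scores + [90]            # one extreme high score
low = scores + [0]              # one extreme low score

for label, data in (("original", scores), ("high outlier", high), ("low outlier", low)):
    print(f"{label:>12}: mean = {mean(data):.2f}, median = {median(data):.2f}")
```

The high outlier lifts the mean from 12.80 to 25.67 but the median only from 13.00 to 13.50; the low outlier drops the mean to 10.67 and the median to 12.50.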

Question 21

Q

Who developed most of the exploratory data analysis techniques?

Answer

A

John Tukey

Question 22

Q

What are percentiles good for?

Answer

A

Showing how an individual score compares with the rest of a group (e.g. a score at the 90th percentile is higher than about 90% of the scores).
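A percentile rank expresses where one score sits within a group. The sketch uses one common definition (the percentage of scores below the value, plus half of any ties); `percentile_rank` and the class scores are hypothetical, and other software may use a slightly different convention.

```python
def percentile_rank(scores, x):
    """Percentage of scores below x, counting ties with x as half (hypothetical helper)."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return 100 * (below + 0.5 * equal) / len(scores)

class_scores = [45, 52, 58, 61, 64, 67, 70, 73, 78, 85]  # made-up group of scores
print(percentile_rank(class_scores, 73))  # 75.0
```

Here seven scores fall below 73 and one equals it, so a score of 73 has a percentile rank of 75.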