Exploratory Data Analysis Week 4 Flashcards
What are the 6 main goals of exploratory data analysis?
- checking for outliers
- checking assumptions
- checking for data entry errors
- patterns not otherwise obvious
- to gain a thorough descriptive analysis of the data
- analysing and dealing with missing data
In a perfectly symmetrical distribution, the mean and median would...
be the same
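A minimal sketch of this card, using Python's standard `statistics` module and a made-up set of scores that is symmetric around 5 (the data are an assumption for illustration only):

```python
from statistics import mean, median

# Hypothetical scores, symmetric around 5
scores = [2, 4, 5, 6, 8]

sample_mean = mean(scores)      # 25 / 5 = 5
sample_median = median(scores)  # middle value of the sorted scores = 5

print(sample_mean, sample_median)
```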
What would a positively skewed distribution look like?
Tail pointing towards the higher numbers
What would a negatively skewed distribution look like?
Tail pointing towards the lower numbers
What kind of information comes from the explore command?
- central tendency
- variability
- quantitative measures of shape
- confidence intervals
- percentiles
- stem and leaf
- box and whisker
- histograms
- normality
- homogeneity of variance
- skewness and kurtosis
If the distribution is positively skewed, will the mode be higher or lower than the mean?
Lower, because the mode sits towards the lower end of the scale while the tail pulls the mean towards the higher end
If the distribution is negatively skewed, will the mode be higher or lower than the mean?
Higher, because the tail points towards the lower numbers and pulls the mean down, while the mode sits near the higher end
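The two skewness cards above can be checked with a small sketch. The scores are invented for illustration, and the ordering shown (mode, then median, then mean) is the typical pattern for skewed data rather than a guarantee for every data set:

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: long tail towards the higher numbers
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]

# Mirror image of the same scores: long tail towards the lower numbers
negative_skew = [16 - x for x in positive_skew]

pos_mode, pos_median, pos_mean = mode(positive_skew), median(positive_skew), mean(positive_skew)
neg_mode, neg_median, neg_mean = mode(negative_skew), median(negative_skew), mean(negative_skew)

print("positive skew:", pos_mode, pos_median, pos_mean)  # mode is the lowest of the three
print("negative skew:", neg_mode, neg_median, neg_mean)  # mode is the highest of the three
```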
When might mode be a better estimate of central tendency than the mean?
In cases of extreme skewness
What are three underlying concepts of hypothesis testing?
- finding a statistically significant difference
- rejecting or failing to reject the null hypothesis
- generalising sample result to population
What is a sample?
A small section of the population
What are two options for dealing with missing data or data entry errors?
- remove the data
- make educated guess about what was intended
What are the two command options for checking for data entry errors (both continuous/scale and categorical/nominal)?
- frequencies (categorical/nominal)
- outliers (continuous/scale)
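The two checks above can be sketched in Python rather than with the explore command. The variable names, the coding scheme (1 and 2 as the only valid codes), and the 1.5 x IQR outlier rule are all assumptions for illustration; the flashcards do not say which outlier rule is used:

```python
from collections import Counter
from statistics import quantiles

# Hypothetical categorical variable: only codes 1 and 2 are valid
gender = [1, 2, 2, 1, 2, 1, 22, 2, 1]
frequencies = Counter(gender)
valid_codes = {1, 2}
# Any code outside the valid set is a likely data entry error
invalid = {code: n for code, n in frequencies.items() if code not in valid_codes}

# Hypothetical continuous variable: ages, with one likely typing error
ages = [21, 23, 22, 25, 24, 22, 23, 230, 21, 24]
q1, _, q3 = quantiles(ages, n=4)  # quartiles
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed rule: 1.5 x IQR fences
outliers = [a for a in ages if a < low or a > high]

print("suspicious codes:", invalid)
print("suspicious ages:", outliers)
```

A frequency table makes an impossible category code stand out at once, while an outlier check does the same job for a scale variable.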
What is normality?
The assumption that your data comes from a population that is normally distributed
What is homogeneity of variance?
The assumption that if your data were divided into groups, the level of variability in the groups would be approximately equal.
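As a rough illustration, the spread of two groups can be compared by hand. The groups below are invented, and comparing a simple variance ratio is only an informal check, not a formal test such as Levene's test:

```python
from statistics import variance

# Hypothetical scores for two groups with the same mean (5)
group_a = [4, 5, 6, 5, 5]
group_b = [3, 5, 7, 5, 5]

var_a = variance(group_a)  # sample variance of group A
var_b = variance(group_b)  # sample variance of group B

# A ratio near 1 suggests similar variability; a large ratio suggests it differs
ratio = max(var_a, var_b) / min(var_a, var_b)

print(var_a, var_b, ratio)
```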
What is a leptokurtic distribution?
The really tall, skinny one
What is a platykurtic distribution?
The really flat one
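The two kurtosis cards above can be illustrated with a hand-rolled calculation. This sketch uses the simple moment-based formula for excess kurtosis (fourth moment divided by squared variance, minus 3), which is an assumption here; statistics packages often apply a sample-size correction, so their values can differ. The function name and data are hypothetical:

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: positive = leptokurtic, negative = platykurtic."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Hypothetical data: most scores piled on one value, with a few far from it
peaked = [1, 3, 3, 3, 3, 3, 3, 3, 5]
# Hypothetical data: scores spread evenly across the range
flat = [1, 2, 3, 4, 5]

peaked_kurtosis = excess_kurtosis(peaked)  # positive: tall and skinny
flat_kurtosis = excess_kurtosis(flat)      # negative: flat

print(peaked_kurtosis, flat_kurtosis)
```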
If a distribution is negatively skewed where will the mode be?
Higher than the mean
If a distribution is generally extremely skewed, which measure of central tendency is best?
The mode
If a distribution is positively skewed, where will the mode be?
Lower than the mean
How does the presence of outliers affect the median and mean?
The mean is pulled towards the outliers (up, if they are high scores), but the median only shifts a little
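A small sketch of this effect, using invented scores and one invented high outlier:

```python
from statistics import mean, median

# Hypothetical scores without and with a single high outlier
scores = [10, 12, 13, 14, 15]
scores_with_outlier = scores + [100]

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(scores_with_outlier), median(scores_with_outlier)

print("mean:  ", mean_before, "->", round(mean_after, 2))  # 12.8 -> 27.33
print("median:", median_before, "->", median_after)        # 13 -> 13.5
```

One extreme score more than doubles the mean here, while the median moves by half a point.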
Who developed most of the techniques of exploratory data analysis?
John Tukey
What are percentiles good for?
When you want to see how an individual score relates to a group.
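As an illustration, a percentile rank can be computed by counting how many scores in the group fall below an individual's score. The function name, the data, and the "strictly below" definition are assumptions; other conventions (for example, counting ties) give slightly different values:

```python
def percentile_rank(score, group):
    """Percentage of scores in the group that fall strictly below the given score."""
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

# Hypothetical test scores for a group of ten people
group_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank = percentile_rank(85, group_scores)
print(rank)  # 60.0: a score of 85 is higher than 60% of the group
```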