Week 3: Descriptive Analysis Of Numerical Data Flashcards
Descriptive analysis of numerical data
1) Frequency distributions
2) Study of the shape of data
3) Measures of central and non-central tendency
4) Measures of variability
Frequency distributions are shown as
1) Tables 2) graphs (histograms or dotplots)
Histograms
Graphs that display the frequency distribution of continuous data
With small sample sizes we use
Dotplots
Frequency distributions present different shapes
1) skewed to the left
2) symmetric
3) skewed to the right
If a histogram shows frequencies that are always decreasing or always increasing, the distribution is said to be
J shaped
If frequencies are decreasing on the left side of the histogram and increasing on the right side, the distribution is said to be
U shaped
Sometimes there are data that do not fall near any other values. These extremely high or low values are called
Outliers
How do we calculate percentages?
Dividing counts by the sample size and multiplying by 100
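Worked example: a minimal Python sketch of the calculation above, building a small frequency table with counts and percentages. The function name `percentage_table` and the sample values are hypothetical, chosen only for illustration.

```python
from collections import Counter


def percentage_table(values):
    """Return {value: (count, percentage)} for each distinct value."""
    n = len(values)                      # sample size
    counts = Counter(values)             # frequency of each distinct value
    # percentage = count divided by the sample size, multiplied by 100
    return {v: (c, c / n * 100) for v, c in sorted(counts.items())}


# Hypothetical sample: number of siblings reported by 10 students
siblings = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1]
table = percentage_table(siblings)

for value, (count, pct) in table.items():
    print(f"{value}: count={count}, percent={pct:.1f}%")
```

The percentages in such a table should add up to 100, which is a quick check that no observation was dropped.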
For which kind of variables do bar charts display frequency distribution?
For discrete variables
For which kind of variables do histograms display frequency distributions?
For continuous variables
Most common measures of central tendency
Mean and median
What is the median?
Middle value in a sorted list of data
When is the median a better indicator of the central tendency of data?
In skewed data or data with extreme values
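Worked example: a short Python sketch, using the standard `statistics` module, of how a single extreme value moves the mean much more than the median. The data values are made up for illustration.

```python
import statistics

# Hypothetical sample without extreme values
values = [20, 22, 23, 25, 26]
# Same sample with one extremely high value added
with_extreme = values + [500]

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_extreme)
median_after = statistics.median(with_extreme)

print(f"without extreme value: mean={mean_before:.1f}, median={median_before}")
print(f"with extreme value:    mean={mean_after:.1f}, median={median_after}")
```

In this sample the median moves from 23 to 24, while the mean rises from about 23 to above 100.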
When do we have negative skewness?
When the median is bigger than the mean
When do we have symmetry?
When the mean is equal to the median
When do we have positive skewness?
When the median is smaller than the mean
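Worked example: the mean versus median comparison on these cards can be sketched in Python as below. This only applies the rule of thumb stated here; it does not compute a formal skewness coefficient. The function name `skew_direction` and the tolerance `tol` are assumptions made for the illustration.

```python
import statistics


def skew_direction(data, tol=1e-9):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) <= tol:
        return "symmetric"          # mean equal to median
    if mean > median:
        return "positive"           # median smaller than mean
    return "negative"               # median bigger than mean


# Hypothetical samples
print(skew_direction([1, 2, 3, 4, 5]))       # mean 3, median 3
print(skew_direction([1, 2, 3, 4, 20]))      # mean 6, median 3
print(skew_direction([1, 17, 18, 19, 20]))   # mean 15, median 18
```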
Measures of non-central tendency for numerical data
Quantiles
First quartile
25% of the data are less than Q1 and 75% are greater than Q1
Second quartile
50% of the data are less than Q2 and 50% are greater than Q2 (Q2 is the median)
Third quartile
75% of the data are less than Q3 and 25% are greater than Q3
Quantiles are
Values at specific positions in the sorted list of data
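Worked example: a minimal Python sketch of the three quartiles using `statistics.quantiles` from the standard library (Python 3.8+). Textbooks and software use slightly different interpolation rules for quantiles; the `"inclusive"` method used here is one such choice, not necessarily the one used in the course. The helper names `quartiles` and `share_below` and the sample data are hypothetical.

```python
import statistics


def quartiles(data):
    """Return (Q1, Q2, Q3): the cut points that split the sorted data in four."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q1, q2, q3


def share_below(data, cutoff):
    """Fraction of observations strictly less than cutoff."""
    return sum(1 for x in data if x < cutoff) / len(data)


# Hypothetical sample: the integers 1 to 100
scores = list(range(1, 101))
q1, q2, q3 = quartiles(scores)

for name, q in (("Q1", q1), ("Q2", q2), ("Q3", q3)):
    print(f"{name} = {q}: {share_below(scores, q):.0%} of the data are below it")
```

With small samples the shares below each quartile are only approximately 25%, 50% and 75%.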