Descriptive Statistics Flashcards
What are the 3 measures of central tendency?
-Mean
-Median
-Mode
What are the 3 measures of dispersion?
-Interquartile Range (IQR)
-Variance
-Standard Deviation
What are the measures of association?
-Chi-Squared
-Correlation
Define Central tendency
A single number that aims to represent the ‘typical’ value of a variable (the average), somewhere between the highest and lowest value of the observations.
What is Central tendency useful for?
-Useful for comparisons between datasets or groups within a dataset
-Can be tracked over time to monitor increases/decreases in key metrics
-Used in many statistical tests
What is the Mean?
Calculated by summing all values of a variable and dividing by the number of observations
Features of the Mean
-For ordinal and scale data
-Statistically powerful (uses all data points)
-Not robust (can be distorted by outliers)
Define the Median (M)
The middle value when values of a variable are arranged in order of magnitude
Features of the Median
-For ordinal and scale data
-Robust to outliers, so more appropriate than mean when dealing with extreme values
-Lacks statistical power
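The contrast between the mean and the median is easiest to see with a small example. The sketch below uses Python's standard `statistics` module; the salary figures are invented purely for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in thousands (illustrative values, not real data).
salaries = [28, 30, 31, 33, 35]
with_outlier = salaries + [400]  # add one extreme value

mean_before = mean(salaries)
median_before = median(salaries)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

# The mean is pulled far towards the outlier; the median barely moves.
print(f"mean:   {mean_before:.1f} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```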
Define the Mode
The most commonly occurring value (there may be more than one mode for a single variable)
Features of the Mode
-The only measure of central tendency suitable for nominal data
-Can be used with ordinal and scale data, but other options are generally preferable
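As a minimal sketch of finding the mode of a nominal variable, the example below uses `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency. The colour data are hypothetical.

```python
from statistics import multimode

# Hypothetical nominal variable: favourite colour of five respondents.
colours = ["red", "blue", "blue", "green", "red"]

# multimode returns a list, so a variable with two modes yields two values.
modes = multimode(colours)
print(modes)
```

Here "red" and "blue" each occur twice, so the variable has two modes.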
How is the Mode useful?
- Categorical data: The only measure of central tendency suitable for nominal variables
- Visualisation and reporting: Grouping numerical data can simplify communication, but involves a trade-off between detail and user-friendliness
- Aggregated or transformed data
Define dispersion
Dispersion measures how far, on average, each observation lies from the central tendency. Represents the variation in values within a variable.
Interquartile Range
IQR is the range of values within the middle 50% of data points, calculated as Q3 - Q1. With the n observations sorted in ascending order, Q1 is located at position (n+1)/4 and Q3 at position 3(n+1)/4.
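The position rule for Q1 and Q3 can be sketched as follows. The helper `value_at_position` is a hypothetical name, and interpolating linearly when a position is not a whole number is an assumption (the card does not say how to handle fractional positions). The sketch does not handle very small samples where a position falls outside 1 to n.

```python
from statistics import quantiles

def value_at_position(sorted_values, position):
    """Return the value at a 1-based position, interpolating linearly
    between neighbours when the position is not a whole number."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_values[lower - 1]
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

# Hypothetical sample of n = 11 observations, sorted in ascending order.
data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 3
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 9
iqr = q3 - q1

# Cross-check against the standard library's default "exclusive" method,
# which uses the same (n + 1)-based positions.
lib_q1, _, lib_q3 = quantiles(data, n=4)
print(q1, q3, iqr, lib_q1, lib_q3)
```

Other software may use a different quartile convention and give slightly different values for the same data.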
What is comparing the range and IQR useful for?
-Useful for understanding the dispersion of a variable and identifying outliers
-Box plots are the easiest way to do this
Define Variance
The mean of the squared differences between each value and the mean.
Define Standard Deviation
The square root of the variance, representing how far, on average, we can expect an individual observation to deviate above or below the mean
How to calculate Standard Deviation
- Find the mean
- Find the difference between each value and the mean
- Square each difference
- Find the sum of the squared differences
- Find the variance: the mean of the squared differences
- Find the SD: the square root of the variance
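The steps above can be sketched directly in Python. The values are hypothetical, and the calculation divides by n (the population form, matching "the mean of the squared differences"); the sample form divides by n - 1 instead.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical observations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the mean.
mean_value = sum(values) / len(values)

# Step 2: find the difference between each value and the mean.
differences = [v - mean_value for v in values]

# Step 3: square each difference.
squared = [d ** 2 for d in differences]

# Step 4: sum the squared differences.
sum_of_squares = sum(squared)

# Step 5: variance = mean of the squared differences (population form).
variance = sum_of_squares / len(values)

# Step 6: standard deviation = square root of the variance.
sd = sqrt(variance)

# Cross-check against the standard library's population SD.
print(variance, sd, pstdev(values))
```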
What is Kurtosis?
The ‘flatness’ of the distribution of values
What does a Large SD/ flat distribution mean?
Data are fairly dispersed around the mean, with more values in the tails of the distribution
What does a Small SD/ narrow distribution mean?
Has a ‘peak’ in values clustered around the mean.
What do we use the coefficient of variation (CV) to do?
To measure the relative variability. This is typically expressed as a percentage.
What is the Coefficient of Variation useful for comparing?
-Different variables
-The same variable
-International comparisons
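A minimal sketch of using the CV to compare variables measured in different units, assuming the usual definition CV = (standard deviation / mean) x 100. The function name and the height and weight figures are hypothetical, and the population SD is used here as an assumption.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Relative variability as a percentage: SD divided by the mean, times 100."""
    return pstdev(values) / mean(values) * 100

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 62, 70, 78, 85]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# The SDs are in different units and cannot be compared directly,
# but the CVs are unit-free percentages.
print(f"heights: {cv_heights:.1f}%  weights: {cv_weights:.1f}%")
```

The CV is only meaningful for variables with a true zero and a mean that is not close to zero.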
What do measures of association consider?
The relationship between two variables
Chi-Squared
Tests for association based on the frequency of two variables’ co-occurrence, comparing the frequency expected if there were no association with the frequency observed in the sample data.
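As a sketch of the comparison between observed and expected frequencies, the example below computes the chi-squared statistic for a hypothetical 2x2 table. Under no association, each expected count is (row total x column total) / grand total, and the statistic sums (observed - expected)^2 / expected over all cells.

```python
# Hypothetical observed frequencies: rows are groups, columns are outcomes.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell if the two variables were not associated.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-squared statistic: sum of (O - E)^2 / E over every cell.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected, round(chi_squared, 3))
```

The statistic is then compared with a chi-squared distribution with (rows - 1) x (columns - 1) degrees of freedom; that step is omitted here.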
Correlation
Represents the strength and direction of the association between two variables. The correlation coefficient r ranges from -1 to 1.
What is a Contingency table?
Contingency tables list the possible values of x and y, and the frequency of each combination. This is known as cross-tabulation.
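Cross-tabulation can be sketched with `collections.Counter`. The paired observations below are hypothetical, with x as a group label and y as a yes/no outcome.

```python
from collections import Counter

# Hypothetical paired observations (x, y).
pairs = [
    ("A", "yes"), ("A", "no"), ("B", "yes"),
    ("A", "yes"), ("B", "no"), ("B", "no"),
]

# Count how often each (x, y) combination occurs.
counts = Counter(pairs)

x_values = sorted({x for x, _ in pairs})
y_values = sorted({y for _, y in pairs})

# Contingency table: one row per x value, one column per y value.
# Combinations that never occur get a frequency of 0.
table = {x: {y: counts[(x, y)] for y in y_values} for x in x_values}

for x in x_values:
    print(x, table[x])
```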
What does a Positive covariance indicate?
Indicates variables that tend to ‘move together’ away from their means: if we observe a high value of x, we also expect to see a high value of y
What does a Negative Covariance indicate?
Indicates variables that move in opposite directions: if we observe a high value of x, we expect to see a value of y below its mean.
What is the correlation coefficient?
It transforms the covariance to a scaled, interpretable representation of the strength and direction of this relationship. It ranges from -1 to 1.
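A minimal sketch of that scaling, using hypothetical data: the covariance is the mean product of the deviations of x and y from their means, and dividing it by the product of the two standard deviations gives r. Dividing by n rather than n - 1 is an assumption here; r is the same either way as long as the choice is consistent.

```python
from math import sqrt

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: mean product of deviations from the two means.
covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

# Standard deviations of x and y (same divisor as the covariance).
sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)

# Correlation coefficient: covariance scaled to lie between -1 and 1.
r = covariance / (sd_x * sd_y)

print(round(covariance, 3), round(r, 3))
```

The covariance depends on the units of x and y, whereas r does not, which is what makes r comparable across pairs of variables.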
What are Scatter Plots?
They show if there is a linear relationship between two variables.
Skewness
Skewness refers to an imbalance or asymmetry in the distribution of a variable's values around its central tendency.
What is a skewed distribution?
Skewed distributions have a relatively higher proportion of their values at the low (positive skew) or high (negative skew) end of the range
What is Normal distribution?
A symmetric, bell-shaped distribution: the mean and median are approximately the same (PCS close to 0), so the data are symmetrically distributed around the central tendency.
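The card does not define PCS. The sketch below assumes it stands for Pearson's coefficient of skewness in its median-based form, 3 x (mean - median) / SD; the function name, the use of the population SD, and the data are all assumptions made for illustration.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Assumed definition of PCS: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]  # one large value in the upper tail

pcs_symmetric = pearson_skewness(symmetric)
pcs_skewed = pearson_skewness(right_skewed)

# Mean equals median for the symmetric data, so its PCS is zero;
# the large value pulls the mean above the median, giving a positive PCS.
print(pcs_symmetric, round(pcs_skewed, 3))
```

A PCS near 0 shows only that the mean and median agree; it does not by itself establish that the data are normally distributed.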
What are normal distributions always?
- Symmetric
- Asymptotic