Descriptive Statistics Flashcards

Question 1

Q

What are the 3 measures of central tendency?

Answer

A

-Mean
-Median
-Mode

Question 2

Q

What are the 3 measures of dispersion?

Answer

A

-Interquartile Range (IQR)
-Variance
-Standard Deviation

Question 3

Q

What are the measures of association?

Answer

A

-Chi-Squared
-Correlation

Question 4

Q

Define Central tendency

Answer

A

A single number that aims to represent the ‘typical’ value of a variable (the average), somewhere between the highest and lowest value of the observations.

Question 5

Q

What is Central tendency useful for?

Answer

A

-Useful for comparisons between datasets or groups within a dataset
-Can be tracked over time to monitor increases/decreases in key metrics
-Used in many statistical tests

Question 6

Q

What is the Mean?

Answer

A

Calculated by summing all values of a variable and dividing by the number of observations
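
As a minimal sketch of this calculation (the function name and the data are made up for illustration):

```python
def mean(values):
    """Sum all values and divide by the number of observations."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


scores = [2, 4, 4, 5, 10]  # hypothetical observations
print(mean(scores))  # 5.0
```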

Question 7

Q

Features of the Mean

Answer

A

-For ordinal and scale data
-Statistically powerful (uses all data points)
-Not robust (can be distorted by outliers)

Question 8

Q

Define the Median (M)

Answer

A

The middle value when values of a variable are arranged in order of magnitude

Question 9

Q

Features of the Median

Answer

A

-For ordinal and scale data
-Robust to outliers, so more appropriate than mean when dealing with extreme values
-Lacks statistical power
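
The robustness point above can be illustrated with a short sketch. It assumes the common convention of averaging the two middle values when the number of observations is even, which the card does not specify; the data are made up.

```python
def median(values):
    """Middle value of the sorted data; averages the two middle
    values when the count is even (an assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


typical = [2, 4, 4, 5, 10]
with_outlier = [2, 4, 4, 5, 100]  # one extreme value replaces the 10

print(median(typical), median(with_outlier))  # 4 4
print(sum(typical) / 5, sum(with_outlier) / 5)  # 5.0 23.0
```

The median stays at 4 in both cases, while the mean jumps from 5.0 to 23.0.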

Question 10

Q

Define the Mode

Answer

A

The most commonly occurring value (there may be more than one mode for a single variable)
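
A small sketch of finding every mode, including ties, using the standard library's Counter. The function name and the data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


colours = ["red", "blue", "red", "green", "blue"]  # nominal data
print(sorted(modes(colours)))  # ['blue', 'red']
```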

Question 11

Q

Features of the Mode

Answer

A

-The only measure suitable for nominal data
-Can be used with ordinal and scale data but other options are generally preferable

Question 12

Q

How is the Mode useful?

Answer

A

Categorical data: The only measure of central tendency suitable for nominal variables
Visualisation and reporting: Grouping numerical data can simplify communication, but involves a trade-off between detail and user-friendliness
Aggregated or transformed data

Question 13

Q

Define dispersion

Answer

A

Dispersion measures how far, on average, each observation lies from the central tendency. Represents the variation in values within a variable.

Question 14

Q

Interquartile Range

Answer

A

IQR is the range of values within the middle 50% of data points, calculated as Q3 minus Q1, with Q1 located at position (n+1)/4 and Q3 at position 3(n+1)/4 in the ordered data
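
A sketch of the position rule from this card. Handling fractional positions by linear interpolation between neighbouring values is an assumption; the card does not say how to treat them, and other quartile conventions exist.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in sorted data; fractional
    positions are linearly interpolated (an assumed convention)."""
    n = len(ordered)
    position = max(1, min(position, n))  # keep the position inside the data
    lower = int(position)
    fraction = position - lower
    upper = min(lower + 1, n)
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])


def iqr(values):
    """Q3 minus Q1, with Q1 at (n+1)/4 and Q3 at 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q3 - q1


data = [1, 3, 5, 7, 9, 11, 13]  # n = 7, so Q1 is the 2nd value and Q3 the 6th
print(iqr(data))  # 8.0
```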

Question 15

Q

What is comparing the range and IQR useful for?

Answer

A

-Useful for understanding the dispersion of a variable and identifying outliers
-Box plots are the easiest way to do this

Question 16

Q

Define Variance

Answer

A

The mean of the squared differences between each value and the mean.

Question 17

Q

Define Standard Deviation

Answer

A

The square root of the variance, representing how far, on average, we can expect an individual observation to deviate above or below the mean

Question 18

Q

How to calculate Standard Deviation

Answer

A

Find the mean
Find the difference between each value and the mean
Square each difference
Find the sum of the squared differences
Find the variance: the mean of the squared differences
Find the SD: the square root of the variance
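
The steps above can be sketched as follows. This follows the card's definition and divides by the number of observations (the population form); the sample form divides by n - 1 instead. The data are made up.

```python
import math


def standard_deviation(values):
    """Population standard deviation, following the card's steps."""
    n = len(values)
    mean = sum(values) / n                        # find the mean
    differences = [v - mean for v in values]      # difference from the mean
    squared = [d ** 2 for d in differences]       # square each difference
    total = sum(squared)                          # sum of squared differences
    variance = total / n                          # mean of squared differences
    return math.sqrt(variance)                    # square root of the variance


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared differences sum to 32
print(standard_deviation(data))  # 2.0
```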

Question 19

Q

What is Kurtosis?

Answer

A

The ‘flatness’ of the distribution of values

Question 20

Q

What does a Large SD/ flat distribution mean?

Answer

A

Data are fairly dispersed around the mean, with more values in the tails of the distribution

Question 21

Q

What does a Small SD/ narrow distribution mean?

Answer

A

Has a ‘peak’ in values clustered around the mean.

Question 22

Q

What do we use the coefficient of variation (CV) to do?

Answer

A

To measure the relative variability. This is typically expressed as a percentage.
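
A sketch of the usual calculation, CV = (standard deviation / mean) x 100. The card does not state the formula, so treat this as the standard definition rather than the course's wording; the population standard deviation is used here, and the two variables are made up.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


heights_cm = [160, 170, 180]        # hypothetical data in centimetres
salaries = [20000, 30000, 40000]    # hypothetical data in currency units

print(round(coefficient_of_variation(heights_cm), 1))  # 4.8
print(round(coefficient_of_variation(salaries), 1))    # 27.2
```

Because the result is unit-free, the two variables can be compared directly even though they are measured on very different scales.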

Question 23

Q

What is the Coefficient of Variation useful for comparing?

Answer

A

-Different variables
-The same variable
-International comparisons

Question 24

Q

What do measures of association consider?

Answer

A

The relationship between two variables

Question 25

Q

Chi-Squared

Answer

A

Tests for association based on the frequency of two variables’ co-occurrence, comparing the frequency expected if there were no association with the frequency observed in the sample data.
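
A sketch of the test statistic, using the standard formulas: expected frequency = row total x column total / grand total, and the statistic is the sum of (observed - expected)^2 / expected over all cells. The function name and the tables are made up, and turning the statistic into a p-value is left out.

```python
def chi_squared(observed):
    """Chi-squared statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Frequency expected if the two variables were not associated
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


no_association = [[10, 10], [20, 20]]    # observed equals expected in every cell
some_association = [[10, 20], [20, 10]]  # every expected frequency is 15

print(chi_squared(no_association))  # 0.0
print(chi_squared(some_association) > 0)  # True
```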

Question 26

Q

Correlation

Answer

A

Represents the strength and direction of the association between two variables. The correlation coefficient r ranges from -1 to 1.

Question 27

Q

What is a Contingency table?

Answer

A

Contingency tables list the possible values of x and y, and the frequency of each combination. This is known as cross-tabulation.
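
A minimal cross-tabulation sketch that counts how often each (x, y) combination occurs. The variable names and the responses are hypothetical.

```python
from collections import Counter


def cross_tabulate(xs, ys):
    """Count the frequency of each (x, y) combination."""
    return Counter(zip(xs, ys))


smoker = ["yes", "yes", "no", "no", "yes"]
group = ["a", "b", "a", "a", "a"]

table = cross_tabulate(smoker, group)
print(table[("yes", "a")], table[("yes", "b")], table[("no", "a")])  # 2 1 2
```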

Question 28

Q

What does a Positive covariance indicate?

Answer

A

Indicates variables that tend to ‘move together’ away from their means: if we observe a high value of x, we also expect to see a high value of y

Question 29

Q

What does a Negative Covariance indicate?

Answer

A

Indicates variables that move in opposite directions: if we observe a high value of x, we expect to see a value of y below its mean.

Question 30

Q

What is the correlation coefficient?

Answer

A

It transforms the covariance to a scaled, interpretable representation of the strength and direction of this relationship. It ranges from -1 to 1.
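
A sketch of that scaling, using the standard formula r = covariance(x, y) / (SD of x x SD of y). The population form (dividing by n) is assumed here; the choice cancels out in r. The data are made up.

```python
import math


def covariance(xs, ys):
    """Mean product of each pair's deviations from their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # moves with x
falling = [10, 8, 6, 4, 2]  # moves against x

print(covariance(x, rising), covariance(x, falling))  # 4.0 -4.0
print(round(correlation(x, rising), 6), round(correlation(x, falling), 6))  # 1.0 -1.0
```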

Question 31

Q

What are Scatter Plots?

Answer

A

They show if there is a linear relationship between two variables.

Question 32

Q

Skewness

Answer

A

Skewness refers to an imbalance or asymmetry in the distribution of a variable's values, for example organizational, economic, or behavioral factors.

Question 33

Q

What is a skewed distribution?

Answer

A

Skewed distributions have a relatively higher proportion of their values at the low (positive skew) or high (negative skew) end of the range

Question 34

Q

What is Normal distribution?

Answer

A

When the mean and median are approximately the same (Pearson's coefficient of skewness, PCS, close to 0), the data are symmetrically distributed around the central tendency.
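
A sketch of this check, assuming PCS stands for Pearson's coefficient of skewness in its median form, 3 x (mean - median) / standard deviation. The card does not define the abbreviation, so the formula is an assumption; the data are made up.

```python
import statistics


def pearson_skewness(values):
    """Assumed definition: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return 3 * (mean - median) / sd


symmetric = [1, 2, 3, 4, 5]     # mean equals median
right_tail = [1, 2, 2, 3, 12]   # mean 4 is pulled above the median 2

print(pearson_skewness(symmetric))       # 0.0
print(pearson_skewness(right_tail) > 0)  # True (positive skew)
```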

Question 35

Q

What are normal distributions always?

Answer

A

Symmetric
Asymptotic