Descriptive Statistics Flashcards

1
Q

What are the 3 measures of central tendency?

A

-Mean
-Median
-Mode

2
Q

What are the 3 measures of dispersion?

A

-Interquartile Range (IQR)
-Variance
-Standard Deviation

3
Q

What are the measures of association?

A

-Chi-Squared
-Correlation

4
Q

Define Central tendency

A

A single number that aims to represent the ‘typical’ value of a variable (the average), somewhere between the highest and lowest value of the observations.

5
Q

What is Central tendency useful for?

A

-Useful for comparisons between datasets or groups within a dataset
-Can be tracked over time to monitor increases/decreases in key metrics
-Used in many statistical tests

6
Q

What is the Mean?

A

Calculated by summing all values of a variable and dividing by the number of observations
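
A minimal sketch of the calculation described above, assuming a small made-up sample (the `values` list is hypothetical and chosen only for illustration):

```python
# Hypothetical sample data, chosen only for illustration.
values = [4, 8, 15, 16, 23, 42]


def mean(xs):
    """Sum all values and divide by the number of observations."""
    return sum(xs) / len(xs)


print(mean(values))  # 108 / 6 = 18.0
```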

7
Q

Features of the Mean

A

-For scale (interval/ratio) data; use with ordinal data is common but debated
-Statistically powerful (uses all data points)
-Not robust (can be distorted by outliers)

8
Q

Define the Median (M)

A

The middle value when values of a variable are arranged in order of magnitude

9
Q

Features of the Median

A

-For ordinal and scale data
-Robust to outliers, so more appropriate than mean when dealing with extreme values
-Lacks statistical power
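
To see the robustness point in practice, the sketch below adds one extreme value to a hypothetical sample and compares how far the mean and the median move. The data are invented; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); the extra value is an extreme outlier.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

mean_before = mean(typical)            # 35
median_before = median(typical)        # 35
mean_after = mean(with_outlier)        # roughly 95.8, pulled up by the outlier
median_after = median(with_outlier)    # 36.5, barely moves

print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```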

10
Q

Define the Mode

A

The most commonly occurring value (there may be more than one mode for a single variable)
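
A short sketch of finding the mode(s) of a nominal variable, assuming hypothetical colour data. `statistics.multimode` (Python 3.8+) returns every value tied for most frequent, which covers the "more than one mode" case.

```python
from statistics import multimode

# Hypothetical nominal data: "red" and "blue" both occur twice.
colours = ["red", "blue", "red", "green", "blue"]

modes = multimode(colours)
print(modes)  # ['red', 'blue'] (in order of first appearance)
```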

11
Q

Features of the Mode

A

-The only measure of central tendency suitable for nominal data
-Can be used with ordinal and scale data, but other options are generally preferable

12
Q

How is the Mode useful?

A
  • Categorical data: The only measure of central tendency suitable for nominal variables
  • Visualisation and reporting: Grouping numerical data can simplify communication, but involves a trade-off between detail and user-friendliness
  • Aggregated or transformed data
13
Q

Define dispersion

A

Dispersion measures how far, on average, each observation lies from the central tendency. It represents the variation in values within a variable.

14
Q

Interquartile Range

A

IQR is the range of values within the middle 50% of data points, calculated as Q3 minus Q1, with Q1 located at position (n+1)/4 and Q3 at position 3(n+1)/4 in the ordered data
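
The position rule above can be sketched as follows. The card does not say what to do when a position is fractional, so linear interpolation between the two neighbouring values is assumed here; the function names and the sample are hypothetical.

```python
def value_at_position(ordered, position):
    """Return the value at a 1-based position in an ordered list.

    Fractional positions are resolved by linear interpolation between
    the two neighbouring observations (an assumption, not from the card).
    """
    lower = int(position)        # whole part of the position
    fraction = position - lower  # fractional part, 0 <= fraction < 1
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def interquartile_range(values):
    """Return (Q1, Q3, IQR) using positions (n+1)/4 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


data = [7, 1, 3, 9, 5, 11, 13]  # hypothetical sample, n = 7
q1, q3, iqr = interquartile_range(data)
print(q1, q3, iqr)  # 3 11 8
```

With n = 7 the positions are 2 and 6, so Q1 and Q3 are simply the 2nd and 6th ordered values.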

15
Q

What is comparing the range and IQR useful for?

A

-Useful for understanding the dispersion of a variable and identifying outliers
-Box plots are the easiest way to do this
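
As a numerical counterpart to a box plot, the sketch below flags outliers using the conventional box-plot fences of 1.5 × IQR beyond each quartile. That threshold is a common convention and does not come from the card; the data are hypothetical.

```python
from statistics import quantiles

data = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample

# method="exclusive" places the quartiles at positions based on (n + 1).
q1, _median, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Conventional box-plot fences: 1.5 * IQR beyond each quartile.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)  # 4.25 8.75 4.5
print(outliers)     # [30]
```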

16
Q

Define Variance

A

The mean of the squared differences between each value and the mean.

17
Q

Define Standard Deviation

A

The square root of the variance, representing how far, on average, we can expect an individual observation to deviate above or below the mean

18
Q

How to calculate Standard Deviation

A
  1. Find the mean
  2. Find the difference between each value and the mean
  3. Square each difference
  4. Find the sum of the squared differences
  5. Find the variance: the mean of the squared differences
  6. Find the SD: the square root of the variance
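
The six steps above can be sketched in code. The sample is hypothetical, and the variance divides by n (the population form, matching "the mean of the squared differences"); a sample SD would divide by n - 1 instead.

```python
from math import sqrt

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = sum(values) / len(values)            # step 1: mean = 5.0
differences = [v - mean for v in values]    # step 2: value minus mean
squared = [d ** 2 for d in differences]     # step 3: square each difference
sum_of_squares = sum(squared)               # step 4: sum = 32.0
variance = sum_of_squares / len(values)     # step 5: variance = 4.0
sd = sqrt(variance)                         # step 6: SD = 2.0

print(mean, variance, sd)  # 5.0 4.0 2.0
```

The same result is available from `statistics.pstdev(values)` in the standard library.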
19
Q

What is Kurtosis?

A

The ‘flatness’ or ‘peakedness’ of the distribution of values, reflecting how much of the data lies in the tails

20
Q

What does a Large SD/ flat distribution mean?

A

Data are fairly dispersed around the mean, with more values in the tails of the distribution

21
Q

What does a Small SD/ narrow distribution mean?

A

Values are clustered tightly around the mean, giving the distribution a pronounced ‘peak’.

22
Q

What do we use the coefficient of variation (CV) to do?

A

To measure relative variability: the standard deviation divided by the mean. This is typically expressed as a percentage.
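
A minimal sketch, taking the CV as the standard deviation divided by the mean, times 100. Using the population SD (`pstdev`) is an assumption, and the two samples are hypothetical; they are in different units, which is the situation the CV is meant for.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100


heights_cm = [160, 170, 180]  # hypothetical
weights_kg = [50, 70, 90]     # hypothetical

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 1), round(cv_weights, 1))  # 4.8 23.3
```

Here the weights vary far more, relative to their mean, than the heights do, even though the raw SDs are not directly comparable.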

23
Q

What is the Coefficient of Variation useful for comparing?

A

-Different variables (with different units or scales)
-The same variable across different groups
-International comparisons

24
Q

What do measures of association consider?

A

The relationship between two variables

25
Q

Chi-Squared

A

Tests for association based on the frequency of two variables’ co-occurrence, comparing the frequency expected if there were no association with the frequency observed in the sample data.
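
The comparison of observed and expected frequencies can be sketched as below, assuming a hypothetical 2x2 table of counts. Expected counts use the standard row total × column total / grand total rule; turning the statistic into a p-value is left out.

```python
# Hypothetical observed counts: rows are categories of x, columns of y.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Frequency expected in this cell if x and y were not associated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected) ** 2 / expected

print(round(chi_squared, 3))  # 16.667
```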

26
Q

Correlation

A

Represents the strength and direction of the association between two variables. The correlation coefficient r ranges from -1 to 1.

27
Q

What is a Contingency table?

A

Contingency tables list the possible values of x and y, and the frequency of each combination. This is known as cross-tabulation.
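
A small sketch of cross-tabulation using only the standard library. The two categorical variables (`group`, `outcome`) and their values are hypothetical.

```python
from collections import Counter

# Hypothetical paired observations of two categorical variables.
group = ["a", "a", "b", "b", "a", "b"]
outcome = ["yes", "no", "yes", "yes", "yes", "no"]

# Count how often each (x, y) combination occurs.
table = Counter(zip(group, outcome))

x_values = sorted(set(group))
y_values = sorted(set(outcome))

# Print the table: one row per value of x, one column per value of y.
print("x\\y", *y_values, sep="\t")
for x in x_values:
    print(x, *(table[(x, y)] for y in y_values), sep="\t")
```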

28
Q

What does a Positive covariance indicate?

A

Indicates variables that tend to ‘move together’ away from their means: if we observe a high value of x, we also expect to see a high value of y

29
Q

What does a Negative Covariance indicate?

A

Indicates variables that move in opposite directions: if we observe a high value of x, we expect to see a value of y below its mean.
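
The sign of the covariance can be illustrated with a short sketch. The cards do not give a formula, so the population form (average product of deviations from the means) is assumed, and the data are hypothetical.

```python
def covariance(xs, ys):
    """Average product of each pair's deviations from the two means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / len(xs)


x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]   # high x goes with high y
y_falling = [10, 8, 6, 4, 2]  # high x goes with low y

print(covariance(x, y_rising))   # 4.0
print(covariance(x, y_falling))  # -4.0
```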

30
Q

What is the correlation coefficient?

A

It transforms the covariance to a scaled, interpretable representation of the strength and direction of this relationship. It ranges from -1 to 1.
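
One common way to do this scaling, sketched under the assumption that r is the Pearson coefficient: divide the covariance by the product of the two standard deviations. The data are hypothetical and population formulas are used throughout.

```python
from statistics import mean, pstdev


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 3))  # 0.775
```

Unlike the covariance, r does not depend on the units of x and y, which is what makes it comparable across pairs of variables.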

31
Q

What are Scatter Plots?

A

They plot paired values of two variables as points, showing whether there is a relationship (for example, a linear one) between them.

32
Q

Skewness

A

Skewness refers to asymmetry in the distribution of a variable's values: one tail is longer than the other.

33
Q

What is a skewed distribution?

A

Skewed distributions have a relatively higher proportion of their values at the low (positive skew) or high (negative skew) end of the range

34
Q

What is a Normal distribution?

A

A symmetric, bell-shaped distribution. When the mean and median are approximately the same (Pearson's coefficient of skewness, PCS, close to 0), the data are symmetrically distributed around the central tendency.
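
A sketch of the skewness check, assuming "PCS" refers to Pearson's (second) coefficient of skewness, 3 × (mean - median) / SD. The card does not state the formula, so treat it as an assumption; both samples are hypothetical.

```python
from statistics import mean, median, pstdev


def pearson_skewness(values):
    """Assumed formula: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / pstdev(values)


symmetric = [1, 2, 3, 4, 5]      # mean equals median
right_skewed = [1, 2, 2, 3, 12]  # one large value drags the mean upward

print(pearson_skewness(symmetric))               # 0.0
print(round(pearson_skewness(right_skewed), 2))  # 1.48
```

A value near 0 only indicates symmetry; it does not by itself show that the data follow a Normal distribution.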

35
Q

What are normal distributions always?

A
  1. Symmetric
  2. Asymptotic