Measures of centrality, spread/dispersion, correlation Flashcards

Question 1

Q

What are the measurement levels?

Answer

A

Nominal / ordinal (categorical)

Interval / ratio (continuous, numerical)

Question 2

Q

What is a measure of centre?

Answer

A

A single value around which most of the data are concentrated (a typical or central value of the data).

Question 3

Q

What are the ways to measure centrality?

Answer

A

Mode
–> nominal (the only measure of centre for nominal data)

Mean
–> interval/ratio (continuous) –> without outliers

Median
–> interval/ratio (continuous) –> with outliers
–> ordinal

Question 4

Q

How do you find the mode?

Answer

A

Count how many times each value appears in the data set and choose the value that occurs most often.

Example (counts per flavour):
Strawberry: 15 –> the mode
Chocolate: 10
Pistachio: 3
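The counting method above can be sketched in Python with the standard-library `collections.Counter`. The list `flavours` is rebuilt from the counts in the example, and `mode` is a hypothetical helper name. When two values tie for most frequent, this sketch simply returns whichever tied value appeared first.

```python
from collections import Counter


def mode(values):
    """Return the most frequent value; on a tie, the one first seen in `values`."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)                  # value -> number of occurrences
    value, _count = counts.most_common(1)[0]  # the (value, count) pair with the highest count
    return value


# Rebuild the example data from the counts on the flashcard.
flavours = ["Strawberry"] * 15 + ["Chocolate"] * 10 + ["Pistachio"] * 3
print(mode(flavours))  # Strawberry
```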

Question 5

Q

How do you find the mean?

What is the formula?

Answer

A

• Add up all the values and divide the sum by the number of values.

$$\bar{x} \text{ (or } \mu\text{)} = \frac{\sum x}{n}$$

x̄ (x-bar) or μ (mu) = the mean
∑ = sum up
x = a value
∑x = the sum of all the values
n = the number of values

Conclusion:
Mean = sum of values / number of values
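The formula ∑x / n translates almost word for word into Python. This is a minimal sketch; `mean` is a hypothetical helper name, and the list of values is made-up example data.

```python
def mean(values):
    """Arithmetic mean: the sum of the values (∑x) divided by their count (n)."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


print(mean([3, 5, 7, 9, 11]))  # 35 / 5 = 7.0
```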

Question 6

Q

How do you find the median?

FACT: the median is the middle value of the ordered data.

Answer

A

Odd number of values (e.g. 9)

1. Order all values from low to high.
2. Count the number of values.
3. Divide the number of values by 2 and round up (9 / 2 = 4.5 –> 5).
4. Count from the beginning to the value at that position (here the 5th value); this is the middle value.

Even number of values (e.g. 8)
1. Order all values from low to high.
2. The median is the mean (average) of the middle two values (for 8 values: the 4th and 5th).
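Both cases can be sketched as one Python function that follows the steps above: sort, count, then either take the single middle value (odd count) or average the two middle values (even count). `median` is a hypothetical helper name; the standard library also offers `statistics.median`, which uses the same rule. Note that Python lists are indexed from 0, so the 1-based position from step 3 is shifted by one.

```python
import math


def median(values):
    """Middle value of the ordered data (mean of the middle two for an even count)."""
    ordered = sorted(values)          # step 1: order from low to high
    n = len(ordered)                  # step 2: count the values
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        position = math.ceil(n / 2)   # step 3: halve and round up (1-based position)
        return ordered[position - 1]  # step 4: convert to a 0-based index
    # Even count: mean of the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([7, 1, 3, 9, 5, 2, 8, 4, 6]))  # 9 values -> 5th value = 5
print(median([7, 1, 3, 8, 5, 2, 4, 6]))     # 8 values -> (4 + 5) / 2 = 4.5
```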

Question 7

Q

FACT: with an even number of values –>

The median is the mean of the middle two values.

Question 8

Q

What is spread (dispersion)?

Answer

A

Spread describes how much the data values differ from each other and from the measure of centre.

Short:
How much values differ:
• from each other
• from the measure of centre (mode, mean, median)

Question 9

Q

What are the 4 measures of spread?

Answer

A

Range (and the interquartile range, IQR)
Mean absolute deviation (MAD)
Variance
Standard deviation

Question 10

Q

What is range?

FACT: range is highly influenced by outliers

Answer

A

Range is the difference between the highest value and the lowest value.

Range = maximum - minimum
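A minimal Python sketch of the range, with made-up data. The second call illustrates the FACT above: a single outlier changes the range drastically. The helper is named `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range = maximum - minimum."""
    if not values:
        raise ValueError("range of empty data")
    return max(values) - min(values)


print(data_range([3, 5, 7, 9]))   # 9 - 3 = 6
print(data_range([3, 5, 7, 90]))  # one outlier: 90 - 3 = 87
```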

Question 11

Q

What are quartiles?

What is the Interquartile range?

FACT: the IQR is a good choice when there are outliers.

Answer

A

Quartiles =
The three values (Q1, Q2 = the median, Q3) that split the ordered data into four equal parts (25% of the data per part).

If a quartile falls between two values (e.g. an even number of values):
Take the mean of those two values.

Interquartile range =
The range of the middle 50% of the data.

IQR = Q3 - Q1 (the middle 50%)
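Textbooks and software use several slightly different conventions for quartiles, so exact values can vary by method. This Python sketch assumes the "median of each half" convention (when the count is odd, the median itself is left out of both halves) and uses the standard-library `statistics.median`; `quartiles` and `iqr` are hypothetical helper names, and the data are made up. The second call shows that replacing the largest value with an outlier leaves the IQR unchanged, while the range would jump from 14 to 99.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]             # values below the median
    upper = ordered[half + (n % 2):]   # values above the median (skip the middle value if n is odd)
    return median(lower), median(ordered), median(upper)


def iqr(values):
    """Interquartile range = Q3 - Q1 (spread of the middle 50%)."""
    q1, _q2, q3 = quartiles(values)
    return q3 - q1


print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))  # (4.0, 8.0, 12.0)
print(iqr([1, 3, 5, 7, 9, 11, 13, 100]))       # still 12.0 - 4.0 = 8.0
```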

Question 12

Q

What is the Mean Absolute Deviation?

What are the disadvantages of the MAD?

Answer

A

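The mean of the absolute deviations from the mean.
(deviation = difference between a value and the mean; taking the absolute value stops positive and negative deviations from cancelling each other out)

Disadvantages =

Mathematically difficult to optimise (the absolute value is not differentiable at zero)
Gives less emphasis to extreme values than squared deviations do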
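A minimal Python sketch of the MAD as defined on this card: average the absolute deviations from the mean. Be aware that "MAD" is also widely used for the *median* absolute deviation, which is a different statistic; `mean_absolute_deviation` is a hypothetical helper name and the data are made up.

```python
def mean_absolute_deviation(values):
    """Mean of |x - mean| over all values."""
    if not values:
        raise ValueError("MAD of empty data")
    n = len(values)
    m = sum(values) / n                                 # the mean
    return sum(abs(x - m) for x in values) / n          # average absolute deviation


print(mean_absolute_deviation([2, 4, 6, 8]))  # mean 5; |devs| 3, 1, 1, 3 -> 8 / 4 = 2.0
```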

Question 13

Q

What is variance?

What are the disadvantages of variance?

What do you use for the calculation?

Answer

A

The measure of spread that looks at squared deviations.
The variance is the mean of the squared deviations of the values from the mean.

–> When the variance is high, the spread is also high
(because each squared deviation measures how far a value lies from the mean)

Disadvantages =
• The variance is expressed in squared units of the variable (e.g. cm² for lengths measured in cm), which makes it hard to interpret.

Calculation:
For the variance we use the mean as the measure of centre, because the mean includes every value of the data.

1. Calculate the mean
2. Subtract the mean from each value (value - mean)
3. Square the differences
4. Add up the squared differences –> SS (sum of squares)
5. Divide SS by the number of values/observations (n) –> variance
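The five calculation steps above can be sketched in Python, one line per step. Like the card, this divides SS by n, which gives the population variance; Python's `statistics.pvariance` uses the same divisor, whereas `statistics.variance` divides by n - 1 (the sample variance). `variance` here is a hypothetical helper name and the data are made up.

```python
def variance(values):
    """Population variance: SS / n, following the steps on the card."""
    if not values:
        raise ValueError("variance of empty data")
    n = len(values)
    m = sum(values) / n                      # step 1: the mean
    deviations = [x - m for x in values]     # step 2: value - mean
    squared = [d ** 2 for d in deviations]   # step 3: square the differences
    ss = sum(squared)                        # step 4: SS (sum of squares)
    return ss / n                            # step 5: SS / n


print(variance([2, 4, 6, 8]))  # devs -3, -1, 1, 3 -> SS = 20 -> 20 / 4 = 5.0
```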

Question 14

Q

FACT: standard deviation is the STANDARD measure of spread.

Why?

Answer

A

Squared differences are mathematically convenient to work with in optimisation (unlike absolute values, they are differentiable everywhere)
Squared differences give more emphasis to extreme values
Easy to interpret because the unit of the standard deviation is the same as the unit of the original variable (taking the square root undoes the squaring)
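The claim that squaring emphasises extreme values can be checked with a small Python sketch. For a made-up data set with one large value, it computes what share of the total deviation comes from that value, once with absolute deviations (as in the MAD) and once with squared deviations (as in the variance and standard deviation). The data and the helper name `share_of_extreme` are hypothetical.

```python
# Made-up data: mean = 35 / 5 = 7, and the value 15 lies far from it.
values = [4, 5, 6, 5, 15]
m = sum(values) / len(values)
deviations = [x - m for x in values]   # -3, -2, -1, -2, 8


def share_of_extreme(devs, transform):
    """Fraction of the total (transformed) deviation contributed by the largest one."""
    totals = [transform(d) for d in devs]
    return max(totals) / sum(totals)


abs_share = share_of_extreme(deviations, abs)                  # 8 / 16
squared_share = share_of_extreme(deviations, lambda d: d * d)  # 64 / 82
print(f"absolute: {abs_share:.2f}, squared: {squared_share:.2f}")
```

With absolute deviations the extreme value accounts for half of the total; after squaring it accounts for roughly three quarters.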

Question 15

Q

How do you calculate the Standard Deviation?

Answer

A

All steps from variance:

1. Calculate the mean
2. Subtract the mean from each value (value - mean)
3. Square the differences
4. Add up the squared differences –> SS (sum of squares)
5. Divide SS by the number of values/observations –> variance

+
6. Standard deviation = the square root of the variance
–> √(SS / n)
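A Python sketch of the full recipe: the variance steps followed by the square root, i.e. √(SS / n). It uses the same n divisor as the card (population standard deviation, matching `statistics.pstdev`); `standard_deviation` is a hypothetical helper name and the data are made up.

```python
import math


def standard_deviation(values):
    """Population standard deviation: square root of SS / n."""
    if not values:
        raise ValueError("standard deviation of empty data")
    n = len(values)
    m = sum(values) / n                       # step 1: the mean
    ss = sum((x - m) ** 2 for x in values)    # steps 2-4: SS (sum of squared deviations)
    var = ss / n                              # step 5: variance
    return math.sqrt(var)                     # step 6: square root of the variance


print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, SS 32, variance 4 -> 2.0
```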

Question 16

Q

What is correlation?

Answer

A

Correlation is a standardised measure of the strength of the linear relationship between two variables.

A change in one variable GOES TOGETHER WITH a change in the other variable.

–> has no order (the correlation of X with Y equals the correlation of Y with X)
–> does not mean that one variable causes the other

Question 17

Q

What is causation?

Answer

A

Causation means that one variable CAUSES the other variable to change.

(one directly changes the other)

For causation, a change in one variable must directly produce a change in the other variable.

Question 18

Q

Correlation coefficient = Pearson's r

Answer

A

Pearson's r lies between -1 and +1: [-1, 1]

Correlation is a standardised measure of the strength of the linear relationship between two variables.

The closer to zero, the weaker the correlation.

A high positive correlation means that when one variable increases, the other also increases.

A high negative correlation means that when one variable increases, the other one decreases.

A correlation of 0 means that there is no linear relationship: when one variable increases, the other does not tend to increase or decrease along a straight line.

–> a correlation of 0 does not mean that there is no relationship between the 2 variables; the relationship could be non-linear.
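A minimal Python sketch of Pearson's r, assuming the usual definition r = ∑(x - x̄)(y - ȳ) / √(∑(x - x̄)² · ∑(y - ȳ)²). `pearson_r` is a hypothetical helper name and the data are made up; Python 3.10+ also provides `statistics.correlation` for the same calculation. The third example is a perfect but U-shaped (non-linear) relationship that still gives r = 0, as the last point above warns.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)                       # standardised to [-1, 1]


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0: perfect positive
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0: perfect negative
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0: y = x², non-linear
```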