Measures of centrality, spread/dispersion, correlation Flashcards

1
Q

What are the measurement levels?

A
  • Nominal/ordinal (categorical)
  • Interval/ratio (numerical)

2
Q

What is a measure of centre?

A

The point around which most of the data is concentrated.

3
Q

What are the ways to measure centrality?

A

Mode
–> nominal

Mean
–> interval/ratio –> no outliers

Median
–> interval/ratio –> with outliers
–> ordinal
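
The reason for preferring the median when there are outliers can be seen in a small sketch. The salary figures below are made up for illustration, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries (in thousands); not taken from the flashcards.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]   # add one extreme value

print(mean(salaries), median(salaries))          # both are 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 95.8, median is 36.5
```

A single extreme value pulls the mean far away from the bulk of the data, while the median barely moves.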
4
Q

How do you find the mode?

A

Count how many times each value appears in the data set and choose the one that occurs the most.

Strawberry: 15 = the mode
Chocolate: 10
Pistachio: 3
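
The counting procedure on this card can be sketched in Python. The list of flavours is a hypothetical raw data set built to match the counts above; `collections.Counter` is part of the standard library.

```python
from collections import Counter

# Hypothetical raw data reproducing the counts on this card.
flavours = ["Strawberry"] * 15 + ["Chocolate"] * 10 + ["Pistachio"] * 3

counts = Counter(flavours)                   # count how often each value appears
mode, frequency = counts.most_common(1)[0]   # pick the value that occurs the most
print(mode, frequency)                       # Strawberry 15
```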

5
Q

How do you find the mean?

What is the formula?

A

• Add up all the values and divide them by the number of values.

x̄ or μ = ∑x / n

x̄ (x with a bar on top) or μ = the mean
∑ = sum up 
x = a value
∑x = sum of all the values
n = the number of values

Conclusion:
Mean = sum of values/ number of values
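
As a minimal sketch of the formula on this card, with made-up scores and a helper name (`mean`) chosen only for illustration:

```python
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

scores = [4, 8, 6, 5, 7]   # hypothetical data
print(mean(scores))        # 30 / 5 = 6.0
```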

6
Q

How do you find the median?

FACT: the median is the middle value

A

Odd number of values (e.g. 9)

  1. order all values from low to high
  2. count the number of values
  3. divide the number of values by 2 and round up
  4. count from the beginning to the position from step 3 to find the middle value

Even number of values (e.g. 8)
  1. The median is the mean (average) of the middle two values.
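
The two cases on this card can be sketched as one function. The data sets are made up, and the function name `median` is just an illustrative choice (Python's standard `statistics.median` does the same job).

```python
import math

def median(values):
    ordered = sorted(values)            # order all values from low to high
    n = len(ordered)                    # count the number of values
    if n % 2 == 1:                      # odd number of values
        position = math.ceil(n / 2)     # divide by 2 and round up
        return ordered[position - 1]    # count from the beginning (1-based position)
    upper = n // 2                      # even number: take the two middle values
    return (ordered[upper - 1] + ordered[upper]) / 2

odd = [7, 1, 9, 3, 5, 8, 2, 6, 4]   # 9 hypothetical values
even = [7, 1, 3, 5, 8, 2, 6, 4]     # 8 hypothetical values
print(median(odd))    # 5
print(median(even))   # (4 + 5) / 2 = 4.5
```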

7
Q

FACT: with an even number of values –>

The median is the mean of the middle two values

A

-

8
Q

What is spread (dispersion)?

A

Spread designates how much data values differ from each other and from the measure of centre.

Short:
How much values differ:
• from each other
• from the measure of centre (mode, mean, median)

9
Q

What are the 4 measures of spread?

A
  1. Range (and interquartile range)
  2. Mean absolute deviation (MAD)
  3. Variance
  4. Standard deviation
10
Q

What is range?

FACT: range is highly influenced by outliers

A

Range is the difference between the highest value and the lowest value.

Range = maximum - minimum
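
A small sketch of the range, and of how strongly one outlier affects it. The numbers and the helper name `value_range` are made up for illustration.

```python
def value_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)

data = [2, 3, 3, 4, 5]        # hypothetical data
with_outlier = data + [40]    # same data plus one extreme value

print(value_range(data))          # 5 - 2 = 3
print(value_range(with_outlier))  # 40 - 2 = 38
```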

11
Q

What are quartiles?

What is the Interquartile range?

FACT: the IQR is good to use when there are outliers.

A

Quartiles =
The three numbers (Q1, Q2, Q3) that split your ordered data into four equal parts (25% of the data per part)

If a quartile falls between two values:
Calculate the mean of those two values

Interquartile range =
The width of the middle 50% of the data.

IQR = Q3 - Q1 (the middle 50%)
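
One way to compute quartiles is sketched below. It assumes the "median of each half" convention (Q1 is the median of the lower half, Q3 the median of the upper half, leaving out the middle value when the count is odd). Other conventions exist and can give slightly different numbers; the data and the function name `quartiles` are hypothetical.

```python
from statistics import median

def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # lower half of the data
    upper = ordered[half + (n % 2):]      # upper half (skips the middle value if n is odd)
    return median(lower), median(ordered), median(upper)

data = [1, 3, 4, 6, 7, 9, 10, 12]   # hypothetical data, 8 values
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3)   # 3.5 6.5 9.5
print(iqr)          # 6.0
```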

12
Q

What is the Mean Absolute Deviation?

What are the disadvantages of the MAD?

A

The average of the absolute deviations from the mean.
(deviation = difference between the value and the mean)

Disadvantages =

  1. Mathematically difficult to optimise
  2. Not enough emphasis on extreme values
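
A minimal sketch of the mean absolute deviation, using made-up data and an illustrative function name:

```python
def mean_absolute_deviation(values):
    n = len(values)
    mean = sum(values) / n
    # Average of the absolute differences between each value and the mean.
    return sum(abs(x - mean) for x in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical data, mean = 5
print(mean_absolute_deviation(data))    # (3+1+1+1+0+0+2+4) / 8 = 1.5
```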
13
Q

What is variance?

What are the disadvantages of variance?

What do you use for the calculation?

A

The measure of spread that looks at squared deviation.
The variance is the mean squared deviation of the values from the mean.

–> When the variance is high, the spread is also high
(because in the formula you subtract the mean from the value)

Disadvantages =
• The unit of the variance is the squared unit of the variable, which makes it harder to interpret

Calculation:
For the variance we use the mean as the measure of centre, because the mean includes every value of the data.

  1. Mean
  2. Value - mean
  3. square the differences
  4. add the squared differences –> SS (sum of squares)
  5. SS / n (number of values/observations) –> variance
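
The five steps can be sketched as follows, with made-up data. The sketch divides by n, as on this card; that matches `statistics.pvariance` in Python's standard library, whereas `statistics.variance` divides by n - 1 (the sample variance).

```python
def variance(values):
    n = len(values)
    mean = sum(values) / n                     # 1. mean
    deviations = [x - mean for x in values]    # 2. value - mean
    squared = [d ** 2 for d in deviations]     # 3. square the differences
    ss = sum(squared)                          # 4. sum of squares (SS)
    return ss / n                              # 5. SS / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
print(variance(data))             # 32 / 8 = 4.0
```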
14
Q

FACT: standard deviation is the STANDARD measure of spread.

Why?

A
  • It is mathematically nice to work with squared differences in optimisation
  • Squared differences give more emphasis to extreme values
  • Easy to interpret because the unit of the standard deviation is the same as the unit of the original variable
15
Q

How do you calculate the Standard Deviation?

A

All steps from Variance

  1. Mean
  2. Value - mean
  3. square the differences
  4. add the squared differences –> SS (sum of squares)
  5. SS / n (number of values/observations) –> variance

+
Standard deviation = the square root of the variance
Standard deviation = √(SS / n)
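
A short sketch of the full calculation, using made-up data and dividing by n as on this card (which corresponds to `statistics.pstdev` in the standard library):

```python
import math

def standard_deviation(values):
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared differences (SS)
    return math.sqrt(ss / n)                    # square root of the variance

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical data, variance = 4.0
print(standard_deviation(data))    # sqrt(4.0) = 2.0
```

The result is in the same unit as the data, unlike the variance.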

16
Q

What is correlation?

A

Correlation is a standardised measure of the strength of the linear relationship between two variables.

When a change in one variable GOES TOGETHER WITH a change in another variable.

  • -> doesn't have an order (it does not matter which variable comes first)
  • -> does not mean that one causes the other
17
Q

What is causation?

A

Means that one variable CAUSES the other variable to change.

(one directly changes the other)

For there to be causation, a change in one variable has to directly change the other variable.

18
Q

Correlation coefficient = Pearson's r

A

Range: [-1, 1]

Correlation is a standardised measure of the strength of the linear relationship between two variables.

The closer to zero, the weaker the correlation.

A high positive correlation means that when one variable increases, the other also tends to increase.

A high negative correlation means that when one variable increases, the other one tends to decrease.

A correlation of 0 means that when one variable increases, the other shows no linear tendency to increase or decrease.

–> a correlation of 0 does not mean that there is no relationship between the 2 variables, it could be a non-linear relationship.
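
The formula for r is not given on this card. The sketch below assumes the standard definition of Pearson's r (the sum of the products of the deviations, divided by the square root of the product of the two sums of squares), and the data are made up to show the three cases described above.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of the deviations from each mean.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)   # sum of squares of x
    ss_y = sum((y - mean_y) ** 2 for y in ys)   # sum of squares of y
    return sp / math.sqrt(ss_x * ss_y)

x = [-2, -1, 0, 1, 2]                    # hypothetical data
r_positive = pearson_r(x, [2 * v + 1 for v in x])   # y rises in a straight line with x
r_negative = pearson_r(x, [-3 * v for v in x])      # y falls in a straight line with x
r_curved = pearson_r(x, [v ** 2 for v in x])        # y = x squared: related, but not linearly

print(r_positive, r_negative, r_curved)
```

In the last case y is completely determined by x, yet r comes out as 0 because the relationship is not linear.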