Probability, Correlation And Hypothesis Testing Flashcards

1
Q

Comparative pie charts formula

A
2
Q

Outliers formula

A
3
Q

Comparative pie charts

A

The ratio of the sample size is the same as the ratio of the areas
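A minimal sketch of the area rule: area is proportional to r squared, so the radii are in the ratio of the square roots of the sample sizes. The function name and the figures are made up for illustration.

```python
import math

def second_radius(r1, n1, n2):
    """Radius of the second pie chart so that the area ratio equals the sample-size ratio.

    Areas are proportional to r**2, so r2 / r1 = sqrt(n2 / n1).
    """
    return r1 * math.sqrt(n2 / n1)

# Hypothetical figures: 100 people in sample 1 (radius 5 cm), 400 in sample 2.
r2 = second_radius(5, 100, 400)
print(r2)  # 10.0
```

Four times the sample size gives four times the area, but only twice the radius.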

4
Q

Population mean

A
5
Q

Sample mean

A
6
Q

‘Sum of’

A
7
Q

The sample mean when xi occurs with a frequency fi

A
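A standard way to write this, assuming the usual notation (x̄ for the sample mean, and sums taken over all distinct values xi):

```latex
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
```

For example, the values 1, 2, 3 with frequencies 2, 3, 5 give x̄ = (2 + 6 + 15) / 10 = 2.3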
8
Q

What is discrete data?

A

Data that can only take certain separate values, which are often integers but do not have to be; for example, shoe size (which includes half sizes)

9
Q

What is continuous data?

A

Data that can take any numerical value within a range, such as height

10
Q

What is the range?

A

Highest value - lowest value

11
Q

What is IQR?

A

Q3 - Q1
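A small sketch of the range and the IQR on made-up data. Quartile conventions differ between textbooks and software; the "inclusive" method used here is only one choice and is an assumption, not necessarily the method this deck expects.

```python
import statistics

data = sorted([13, 3, 40, 8, 5, 15, 9, 7, 11])  # made-up data, 9 values

# One of several quartile conventions; results can differ slightly under others.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)  # highest value - lowest value
iqr = q3 - q1                       # Q3 - Q1

print(data_range, q1, q3, iqr)  # 37 7.0 13.0 6.0
```

The single large value 40 stretches the range to 37 but leaves the IQR at 6, which is why the IQR is the more robust measure of spread.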

12
Q

Standard deviation formulas

A
13
Q

Variance formulas

A
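A sketch of two common, equivalent forms of the variance, assuming the divide-by-n (population) convention; a sample version that divides by n - 1 also exists, and which one is intended here is not stated. The standard deviation is the square root of the variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(data)
mean = sum(data) / n

# Form 1: mean of the squared deviations from the mean.
var_deviations = sum((x - mean) ** 2 for x in data) / n

# Form 2: mean of the squares minus the square of the mean.
var_squares = sum(x * x for x in data) / n - mean ** 2

sd = math.sqrt(var_deviations)

print(var_deviations, var_squares, sd)  # 4.0 4.0 2.0
print(statistics.pvariance(data))       # library cross-check of the divide-by-n variance
```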
14
Q

What is probability?

A
15
Q

What is a set?

A

A collection of distinct elements (often numbers); no element is repeated

16
Q

What is a subset?

A

A is a subset of S if all the elements in ‘A’ are also in ‘S’

17
Q

What is an empty set?

A

The set with no elements, written ∅

18
Q

What is a sample space?

A

All the possible outcomes of a random experiment

19
Q

Complement of A

A

A’ (not A): every outcome in the sample space that is not in A, so P(A’) = 1 - P(A)

20
Q

B is a subset of A

A

If B occurs so does A

21
Q

Mutually exclusive

A

Events that cannot happen at the same time: if one occurs, none of the others can
If A and B are mutually exclusive, the probability of A or B occurring is the sum of their probabilities: P(A ∪ B) = P(A) + P(B)

For three mutually exclusive events: P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
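A quick check of the rule on one roll of a fair die. The events A, B and C are made up for the example and share no outcomes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & sample_space), len(sample_space))

A, B, C = {1}, {2, 3}, {6}  # no outcome is shared, so mutually exclusive

assert A & B == set() and A & C == set() and B & C == set()
print(prob(A | B | C))              # 2/3
print(prob(A) + prob(B) + prob(C))  # 2/3
```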

22
Q

Independent events

A

The probability of event A occurring is unaffected by whether or not B occurs
If A and B are independent, then P(A ∩ B) = P(A) × P(B)
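A sketch that tests the product rule by listing every outcome of two fair dice. The events chosen (first die even, total of 7) are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Probability that an outcome (d1, d2) satisfies the condition."""
    favourable = [o for o in outcomes if condition(o)]
    return Fraction(len(favourable), len(outcomes))

p_a = prob(lambda o: o[0] % 2 == 0)                        # A: first die is even
p_b = prob(lambda o: sum(o) == 7)                          # B: total is 7
p_a_and_b = prob(lambda o: o[0] % 2 == 0 and sum(o) == 7)  # A and B together

print(p_a, p_b, p_a_and_b)     # 1/2 1/6 1/12
print(p_a_and_b == p_a * p_b)  # True, so A and B are independent
```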

23
Q

The addition law of probability

A
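The general addition law in its usual textbook form, assumed here to be what this card intends:

```latex
P(A \cup B) = P(A) + P(B) - P(A \cap B)
```

When A and B are mutually exclusive, P(A ∩ B) = 0 and this reduces to P(A ∪ B) = P(A) + P(B)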
24
Q

Multiplication law

A
25
Q

What is Pearson’s Product Moment Correlation Coefficient

A

The PMCC, denoted by r, measures the strength and direction of the linear correlation between two variables. It is named after Karl Pearson, an applied mathematician who worked on the application of statistics to genetics and evolution

26
Q

PMCC formulas

A
27
Q

Interpreting PMCC values

A

r = 1: perfect positive linear correlation
r = -1: perfect negative linear correlation
r = 0: no linear correlation
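A minimal sketch of the three cases, using the common form r = Sxy / sqrt(Sxx × Syy). That formula and the data are assumptions for illustration, since the formula card itself is blank.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
print(pmcc(xs, [3, 5, 7, 9, 11]))  # points on a rising line: r is 1
print(pmcc(xs, [10, 8, 6, 4, 2]))  # points on a falling line: r is -1
print(pmcc(xs, [4, 1, 0, 1, 4]))   # symmetric curve: r is 0, yet y clearly depends on x
```

The last case shows why r = 0 only rules out a linear relationship, not every relationship.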

28
Q

What does a measure of correlation indicate?

A

An association between the two variables; however, it does not by itself indicate a causal relationship

29
Q

Spearman’s correlation coefficient formula

A
30
Q

Spearman’s

A

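Spearman’s rank correlation coefficient is calculated from the ranks of the data, so it makes no assumptions about the distribution of the original data, and the relationship does not need to be linear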
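A sketch using the usual no-ties formula 1 - 6Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair. The formula is assumed from standard textbooks because the formula card is blank; the helper names and data are made up.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1)), no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

xs = [1, 2, 3, 4, 5]
print(spearman(xs, [1, 8, 27, 64, 125]))   # 1.0: curved, but always increasing
print(spearman(xs, [20, 10, 40, 30, 50]))  # 0.8: mostly increasing
```

The first data set is not linear, yet the coefficient is 1 because only the order of the values matters.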

31
Q

PMCC

A

A hypothesis test on the PMCC is only valid if the two variables are jointly normally distributed

32
Q

H0 and H1

A

H0: null hypothesis (no correlation in the population)
H1: alternative hypothesis (there is correlation in the population)

33
Q

Hypothesis testing

A
34
Q

What is a regression line?

A

A straight line of best fit for bivariate data; it passes through the mean point (x̄, ȳ)
The equation of the linear regression line is given as:
y = ax + b
Where a is the gradient and b is the y-intercept
x is the independent variable and y is the dependent variable
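A minimal least-squares sketch for the y on x line, assuming the standard results a = Sxy / Sxx and b = ȳ - a x̄. The function name and data are made up.

```python
def regression_line(xs, ys):
    """Least-squares y-on-x line y = a*x + b; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx        # gradient
    b = y_bar - a * x_bar  # y-intercept, which forces the line through (x_bar, y_bar)
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(a, b)  # approximately 0.6 and 2.2

# The fitted line passes through the mean point (3, 4).
print(a * 3 + b)  # approximately 4
```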

35
Q

Things to consider when analysing the regression model

A

How do we interpret the model?
How can we interpret, in context, the coefficient of x?
How can we interpret, in context, the constant term?

36
Q

What is a residual?

A

The error the model makes when predicting a data point
It is the vertical distance between the data point and the regression line
For y on x regression it is only sensible to consider predictions for y

37
Q

How to calculate a residual?

A
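A sketch assuming the usual definition, residual = observed y - predicted y, which matches the sign convention on the following cards (positive means underprediction). The data and the fitted line y = 0.6x + 2.2 are hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus the y predicted by y = a*x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

# Made-up data and a hypothetical fitted line y = 0.6x + 2.2.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

for x, y, r in zip(xs, ys, residuals(xs, ys, 0.6, 2.2)):
    label = "underprediction" if r > 0 else "overprediction"
    print(x, y, round(r, 2), label)
```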
38
Q

What does a positive residual indicate?

A

An underprediction: the observed value is higher than the value the model predicts

39
Q

What does a negative residual indicate?

A

An overprediction: the observed value is lower than the value the model predicts

40
Q

What should we see when we plot predicted vs actual?

A

Strong positive correlation

41
Q

What should we see when we plot predicted vs residual?

A

Points scattered randomly and evenly around zero, with no pattern

42
Q

Anscombe’s quartet

A

Four data sets with almost identical summary statistics that look clearly different when plotted

43
Q

Unstructured statistics

A

Each data set has the same summary statistics but they are visually different

44
Q

The normal distribution diagram

A
45
Q

The normal distribution formula

A
46
Q

What is the z-value?

A

The number of standard deviations a value is above/below the mean
Because the normal distribution is symmetrical, we can use the probability for a positive z-value to find the probability for the matching negative one: P(Z < -z) = 1 - P(Z < z)
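A short sketch of both points using Python's standard library in place of a printed table. The mean of 100 and standard deviation of 15 are made-up figures.

```python
from statistics import NormalDist

def z_value(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z = z_value(130, 100, 15)  # hypothetical: mean 100, standard deviation 15
print(z)                   # 2.0

standard_normal = NormalDist()  # mean 0, standard deviation 1
left_of_minus_z = standard_normal.cdf(-z)
right_of_plus_z = 1 - standard_normal.cdf(z)
print(left_of_minus_z, right_of_plus_z)  # equal by symmetry, about 0.0228
```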

47
Q

We can only use the z-table when…

A

The z-value is positive (on the right of the graph)
We’re finding the probability to the left of this z-value

48
Q

Changing the direction of the inequality

A

Changing the sign of the z-value, or changing the direction of the inequality, turns the probability p into 1 - p
If we do both, they cancel out: P(Z > -z) = P(Z < z)

49
Q

Standardising formula

A
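The usual standardising formula, assuming X is normally distributed with mean μ and standard deviation σ:

```latex
Z = \frac{X - \mu}{\sigma}, \qquad X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z \sim N(0, 1)
```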
50
Q

To find the z score?

A
51
Q

To find the z value for a probability?

A

Use the z-table in reverse
Find the probability in the body of the table and read off the z-value that gives it
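The same reverse lookup can be sketched with Python's standard library, where `inv_cdf` plays the role of reading the table backwards. The probability 0.975 is just an example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Given P(Z < z) = 0.975, recover z (the reverse of a table lookup).
z = standard_normal.inv_cdf(0.975)
print(round(z, 2))  # 1.96

# Check by going forwards again.
print(round(standard_normal.cdf(z), 3))  # 0.975
```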

52
Q

Central limit theorem

A

If we repeatedly take samples of the same size and record their sample means, the sample means will themselves be approximately normally distributed around the population mean, and the approximation improves as the sample size increases
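A small simulation sketch of this idea. The population (uniform on 0 to 1), the sample size and the number of samples are arbitrary choices for illustration; the standard result assumed is that the sample mean has mean μ and variance σ²/n.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
n = 30              # size of each sample
num_samples = 2000  # how many samples we take

# Population: uniform on [0, 1], which is clearly not normal.
# Its mean is 0.5 and its variance is 1/12.
sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.fmean(sample_means))  # close to the population mean 0.5
print(statistics.stdev(sample_means))  # close to sqrt((1/12) / n), about 0.053
```

A histogram of `sample_means` would look bell-shaped even though the population itself is flat.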

53
Q

How is the sample mean normally distributed

A
54
Q

Standard deviation

A
55
Q

Continuity corrections

A

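A continuity correction lets us approximate a discrete distribution with a continuous one by treating each whole number as the interval half a unit either side of it; for example, P(X ≤ 5) becomes P(Y < 5.5)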

56
Q

Approximating

A

To approximate a binomial distribution X ~ B(n, p) as a normal, we copy over the mean np and the variance np(1 - p) of the binomial: Y ~ N(np, np(1 - p))
We must change the letter (for example from X to Y) as it is a different distribution
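A sketch comparing an exact binomial probability with its normal approximation, including a continuity correction. The values n = 100, p = 0.5 and the cut-off 55 are made up for the example.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # hypothetical X ~ B(100, 0.5)

# Exact binomial probability P(X <= 55).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(56))

# Normal approximation Y ~ N(np, np(1 - p)), with a continuity correction:
# P(X <= 55) is approximated by P(Y < 55.5).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(55.5)

print(round(exact, 3), round(approx, 3))
```

The two values should agree closely here because n is large and p is not near 0 or 1; the approximation is weaker for small n or extreme p.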