Stats Flashcards

1
Q

Bivariate Definition

A
  • It means data with two variables, e.g. a graph of life expectancy plotted against birth rate for a country.
2
Q

Dependent vs Independent Variables (with the example of weight of crop yielded vs amount of rainfall)

A
  • The value of the dependent variable relies on the value of the independent variable.
  • The weight of crop yielded depends on the amount of rainfall, and not the other way around.
  • Therefore, the weight of crop yielded is the dependent variable and the amount of rainfall is the independent variable.
3
Q

Random, Non-Random, and Control Variables

A
  • Random variables cannot be predicted; their values do not depend on anything you set.
  • Control variables are non-random: you choose their values, often changing them at regular intervals (e.g. time).
4
Q

Correlation and Linear

A
  • Correlation measures how close the data points are to lying on a straight line.
  • Linear just means in a straight line.
5
Q

Correlation Vs Association

A
  • Correlation (linear association) is about how close data is to lying on a straight line (strictly linear).
  • Association is about how closely two variables are related, whether or not the relationship is linear.
    - Therefore, correlation is a type of association: it describes how closely two variables are related in the specific sense of being linear.
6
Q

PMCC

A
  • The product moment correlation coefficient, denoted r, describes the strength of linear correlation (from -1 to 1).

No Correlation: -0.1 to 0.1
Perfect Negative/Positive Correlation: -1 / 1

Weak Negative Correlation: -0.1 to -0.5
Moderate Negative Correlation: -0.5 to -0.8
Strong Negative Correlation: -0.8 to -1

Weak Positive Correlation: 0.1 to 0.5
Moderate Positive Correlation: 0.5 to 0.8
Strong Positive Correlation: 0.8 to 1
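
As a rough illustration of how r is calculated and then matched to the bands above, here is a minimal sketch. The data values and the function names `pmcc` and `describe` are made up for this example, and the choice of which band a boundary value (such as exactly 0.5) falls into is an assumption, since the bands on the card overlap at their edges.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def describe(r):
    """Label r using the bands on the card (boundary handling is an assumption)."""
    size = abs(r)
    if size < 0.1:
        return "no correlation"
    strength = "weak" if size < 0.5 else "moderate" if size < 0.8 else "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Hypothetical paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)
label = describe(r)
```

For this made-up data r is about 0.77, which the bands describe as a moderate positive correlation.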

7
Q

How do you know if bivariate data has a (bivariate) normal distribution?

A
  • The points form a roughly elliptical cloud on a scatter graph.
8
Q

Random Variable Definition

A
  • A variable whose value is a numerical outcome of a random phenomenon.
  • Denoted with a capital letter, e.g. X.
  • The probability distribution of X tells us the possible values of X and the probability of each.
  • Can be discrete or continuous.
9
Q

Cohen’s guidelines for interpreting effect size (PMCC)

A

Small Effect Size (r ≈ 0.1):
The relationship between two variables is weak.
Medium Effect Size (r ≈ 0.3):
The relationship between two variables is moderate.
Large Effect Size (r ≈ 0.5):
The relationship between two variables is strong.

10
Q

Scenarios where we would use Spearman’s Rank over PMCC

A
  • The data is subjective (e.g. judges' rankings).
  • The data is non-numerical but can be ranked (e.g. grades A, B, C).
  • The association is non-linear; PMCC only measures linear correlation.
11
Q

Interpreting Spearman’s Rank Correlation Coefficient

A
  • It is interpreted in the same way as PMCC, on a scale from -1 to 1.
  • 0.8 for example, would represent a strong positive association (not correlation).
12
Q

How to double check Spearman’s rank

A
  • After making the two rows of ranks, calculate the PMCC of the ranks. As long as the ranks are correct, it should give the same coefficient.
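
The check above can be sketched in code. This is only an illustration: the data is made up, the helper names are hypothetical, and the rank-difference formula 1 - 6Σd² / (n(n² - 1)) is the standard Spearman formula for untied data, which this card does not state explicitly.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_differences(xs, ys):
    """Spearman's rank coefficient via the rank-difference formula (no ties)."""
    rank_x, rank_y = ranks(xs), ranks(ys)
    n = len(xs)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))

# Hypothetical data, for illustration only.
xs = [10, 20, 30, 40, 50]
ys = [3, 1, 4, 9, 7]
by_formula = spearman_from_differences(xs, ys)
by_pmcc_of_ranks = pmcc(ranks(xs), ranks(ys))
```

Both routes give 0.8 for this data, which is the agreement the card describes.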
13
Q

What do you do when multiple pieces of data have the same rank?

A
  • Average the ranks that they would take if they were different values.
  • If you have 7, 7, 7 at the start of the rankings, they would occupy the 1st, 2nd, and 3rd ranks if they were different values.
  • Add the ranks and average them: (1 + 2 + 3) / 3 = 2, so they are all given the rank 2.
  • The next piece of data is given the rank it would have had if all the values were different, so here 4th.
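
The tie rule above can be sketched as a small function. The name `average_ranks` is hypothetical, and ranking from the smallest value upward is an assumption (the same averaging applies if you rank from the largest).

```python
def average_ranks(values):
    """Give tied values the average of the ranks they would otherwise occupy."""
    ordered = sorted(values)
    rank_of = {}
    position = 0  # number of values already ranked
    while position < len(ordered):
        value = ordered[position]
        count = ordered.count(value)
        # Tied values would take ranks position+1 ... position+count;
        # the average of those ranks is (first + last) / 2.
        first, last = position + 1, position + count
        rank_of[value] = (first + last) / 2
        position += count
    return [rank_of[v] for v in values]

# The card's example: three 7s at the start, followed by two other values.
example = average_ranks([7, 7, 7, 9, 12])
```

Here the three 7s each get rank 2.0 and the next value gets rank 4.0, matching the card.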
14
Q

Issue with PMCC hypothesis testing

A
  • With large samples, the critical value is so low that even a very small PMCC counts as significant evidence of correlation, when in reality the correlation is too weak to matter.
15
Q

Example of two tailed PMCC hypothesis test

A
  • Let ρ (rho) be the population correlation coefficient between exam scores and heights.
    H0: ρ = 0
    H1: ρ ≠ 0
  • Find the critical value in the table for the sample size and significance level.
  • Compare the PMCC with the critical value: to reject H0 it must be greater than the positive critical value, or less than the negative critical value.
    Example: r = -0.35 with critical value ±0.7, so -0.35 > -0.7.
    The result is not significant, so we fail to reject H0: there is insufficient evidence to suggest correlation between exam scores and heights. (Had the result been significant, we would reject H0: there is evidence to suggest correlation.)
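
The comparison step can be written as a one-line rule. This is a minimal sketch: the function name is hypothetical, and the critical value still has to be looked up in a table for the sample size and significance level.

```python
def two_tailed_pmcc_test(r, critical_value):
    """Return True if H0 (rho = 0) is rejected in a two-tailed test.

    critical_value is the positive table value; r is significant when it
    lies beyond it in either direction.
    """
    return abs(r) > critical_value

# The card's example: r = -0.35 against a critical value of 0.7.
reject = two_tailed_pmcc_test(-0.35, 0.7)
conclusion = "reject H0" if reject else "fail to reject H0"
```

With these numbers the result is not significant, so the conclusion is to fail to reject H0.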
16
Q

Example of Spearman’s rank hypothesis test

A

H0: There is no association between X and Y.
H1 (choose the one that matches the test):
  One-tailed: There is some positive association between X and Y.
  One-tailed: There is some negative association between X and Y.
  Two-tailed: There is some association between X and Y.

Obtain the critical value from the table and compare it with the coefficient.
There is evidence to suggest ___.

17
Q

How to get the chi squared statistic

A
  • Using the initial table of observed frequencies, add a final row and column for the totals.
  • Create a new table of expected frequencies, calculating each cell as:
    (Row Total x Column Total) / Grand Total
  • Make a final ‘contributions’ table, calculating each cell's contribution as:
    (observed frequency - expected frequency)^2 / expected frequency
  • Add all the contributions together to get the chi-squared statistic.
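
The steps above can be sketched as follows. The observed table is invented for illustration and the function name `chi_squared` is hypothetical.

```python
def chi_squared(observed):
    """Return (expected table, contributions table, chi-squared statistic)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequency = row total x column total / grand total.
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

    # Contribution of each cell = (observed - expected)^2 / expected.
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    statistic = sum(sum(row) for row in contributions)
    return expected, contributions, statistic

# Hypothetical 2 x 2 table of observed frequencies.
observed = [[10, 20],
            [30, 40]]
expected, contributions, statistic = chi_squared(observed)
```

For this made-up table the expected frequencies are 12, 18, 28 and 42, and the statistic comes out at roughly 0.79.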
18
Q

Example of chi squared contingency hypothesis test

A

H0: There is no association between X and Y.
H1: There is some association between X and Y.
- Calculate degrees of freedom and find critical value.
- Compare the chi-squared statistic (X^2) against the critical value.
- If X^2 > c.v., the result is significant, so reject H0: there is evidence to suggest there is an association between X and Y.
- Otherwise the result is not significant, so fail to reject H0: there is insufficient evidence to suggest an association between X and Y.

19
Q

How to calculate the degrees of freedom

A

(no. of rows - 1) x (no. of columns - 1)
e.g. a 3 x 3 table has (3 - 1) x (3 - 1) = 2 x 2 = 4 degrees of freedom

20
Q

How to calculate expected value from discrete random variable table

A
  • Multiply each value r by its probability P(X = r), then add the results together.
    r = 2, 3, 4
    P(X = r) = 0.2, 0.4, 0.4

(2 x 0.2) + (3 x 0.4) + (4 x 0.4) = 3.2

If you spun this spinner many, many times, you would expect the average of all the values to be 3.2.
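
The same calculation as a short sketch, using the spinner values from this card. The dictionary layout and the function name are choices made for this example.

```python
def expected_value(distribution):
    """E(X): sum of r x P(X = r) over every value r."""
    return sum(r * p for r, p in distribution.items())

# Spinner from the card: value -> probability.
spinner = {2: 0.2, 3: 0.4, 4: 0.4}
mean = expected_value(spinner)
```

The result is 3.2, up to floating-point rounding.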

21
Q

How to calculate variance from discrete random variable table

A

Var(X) = E(X^2) - [E(X)]^2

22
Q

What is E(X^2)

A
  • This is the expected value calculated using r^2 instead of r.
  • Add together the sum of P(X = r) x r^2.
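
A minimal sketch of E(X^2) and Var(X) = E(X^2) - [E(X)]^2, assuming the same kind of table as a spinner with values 2, 3, 4 and probabilities 0.2, 0.4, 0.4. The function names are hypothetical.

```python
def expected_value(distribution):
    """E(X): sum of r x P(X = r)."""
    return sum(r * p for r, p in distribution.items())

def expected_square(distribution):
    """E(X^2): sum of r^2 x P(X = r)."""
    return sum(r ** 2 * p for r, p in distribution.items())

def variance(distribution):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return expected_square(distribution) - expected_value(distribution) ** 2

# Assumed table: value -> probability.
spinner = {2: 0.2, 3: 0.4, 4: 0.4}
e_x_squared = expected_square(spinner)
var_x = variance(spinner)
```

Here E(X^2) = 10.8 and Var(X) = 10.8 - 3.2^2 = 0.56, up to floating-point rounding.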
23
Q

E(X) & Var(X): Discrete Uniform Distribution

A

For X taking each of the values 1, 2, ..., n with equal probability:
E(X) = (n + 1) / 2
Var(X) = (n^2 - 1) / 12
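
As a sanity check, the sketch below works out the mean and variance directly from the table for a fair six-sided die and compares them with (n + 1) / 2 and (n^2 - 1) / 12. It assumes X takes the values 1 to n with equal probability; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def uniform_mean_and_variance(n):
    """Mean and variance of X uniform on 1..n, computed from the definition."""
    p = Fraction(1, n)
    mean = sum(r * p for r in range(1, n + 1))
    mean_of_squares = sum(r * r * p for r in range(1, n + 1))
    return mean, mean_of_squares - mean ** 2

n = 6  # a fair six-sided die
mean, var = uniform_mean_and_variance(n)
formula_mean = Fraction(n + 1, 2)
formula_var = Fraction(n * n - 1, 12)
```

For n = 6 both routes give E(X) = 7/2 and Var(X) = 35/12.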

24
Q

The Binomial Distribution

A
  • A fixed number of trials, n.
  • The outcome of each trial is independent of the others.
  • A constant probability of success, p.
  • Two outcomes only: success or failure.

If these conditions are met, a random variable X that counts the number of successes is binomially distributed (n is the number of trials, p is the probability of success):
X ~ B(n, p)

25
Q

E(X) & Var(X): Binomial Distribution

A

If X ~ B(n, p)
E(X) = np
Var(X) = np(1-p)
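
To see where np and np(1 - p) come from, this sketch builds the full probability table for one binomial distribution and computes the mean and variance from it. The values n = 10 and p = 0.3 are arbitrary, and the probability formula used is the standard binomial one, which this deck does not write out.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary example values
table = {k: binomial_pmf(n, p, k) for k in range(n + 1)}

mean = sum(k * prob for k, prob in table.items())
mean_of_squares = sum(k ** 2 * prob for k, prob in table.items())
var = mean_of_squares - mean ** 2
```

The table gives a mean of 3 (= np) and a variance of 2.1 (= np(1 - p)), up to rounding.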

26
Q

Poisson Distribution

A
  • Events must occur randomly and independently.
  • Events occur singly (one at a time).
  • Events happen on average at a constant rate, λ.
  • Can be used to model real-life scenarios, e.g. the number of cars passing in a minute, or the number of misprints on a page of a book.
  • X ~ Po(λ)
  • λ represents the average number of events in a fixed interval of time or space.
27
Q

E(X) & Var(X): Poisson Distribution

A

E(X) = λ
Var(X) = λ
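
A numerical check that the mean and variance both equal λ. The value λ = 2.5 is arbitrary, the probability formula is the standard Poisson one (not written out in this deck), and the infinite sum is cut off at 60 terms, which is an approximation that is negligible for a λ this small.

```python
import math

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.5        # arbitrary example rate
ks = range(60)   # truncation of the infinite sum (assumption: tail is negligible)

mean = sum(k * poisson_pmf(lam, k) for k in ks)
mean_of_squares = sum(k ** 2 * poisson_pmf(lam, k) for k in ks)
var = mean_of_squares - mean ** 2
```

Both `mean` and `var` come out at 2.5 to within rounding.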

28
Q

Formula for conditional probability P(A | B)

A

P(A | B) = P(A ∩ B) / P(B)
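
A small worked example of the formula, using a fair six-sided die. The events are made up for illustration: A is "the roll is even" and B is "the roll is greater than 3".

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}          # fair die, all outcomes equally likely
A = {r for r in outcomes if r % 2 == 0}  # even roll
B = {r for r in outcomes if r > 3}       # roll greater than 3

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

p_a_and_b = probability(A & B)          # P(A n B): rolls 4 and 6
p_b = probability(B)                    # P(B): rolls 4, 5 and 6
p_a_given_b = p_a_and_b / p_b           # P(A | B) = P(A n B) / P(B)
```

Here P(A ∩ B) = 1/3 and P(B) = 1/2, so P(A | B) = 2/3.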

29
Q

How to approximate a binomial distribution with a Poisson distribution

A

If X ~ B(n, p), then approximately X ~ Po(λ) with λ = np.
The larger n and the smaller p, the smaller the % error and the better the approximation.
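
The effect of n and p on the quality of the approximation can be seen numerically. This sketch compares two binomial distributions that share λ = np = 2; the specific values of n and p are arbitrary, and it measures the largest absolute difference in probability rather than a percentage error.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_abs_error(n, p, up_to=10):
    """Largest gap between B(n, p) and Po(np) over k = 0..up_to."""
    lam = n * p
    return max(
        abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k))
        for k in range(up_to + 1)
    )

small_n_error = max_abs_error(20, 0.1)      # n small, p fairly large
large_n_error = max_abs_error(1000, 0.002)  # n large, p small
```

The gap for n = 1000, p = 0.002 is far smaller than for n = 20, p = 0.1, in line with the rule on the card.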

30
Q

Geometric Distribution

A

X ~ Geo(p)
- Similar to the binomial: we need a fixed probability of success, the trials must be independent, and there must be only two possible outcomes.
- X counts the number of trials up to and including the first success. Remember that after winning we STOP; there is no need to continue adding probabilities after you have won.
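
A minimal sketch of the "stop at the first win" idea, assuming X counts the number of trials up to and including the first success, so that P(X = k) = (1 - p)^(k - 1) x p. The example of rolling a six on a fair die is made up for illustration.

```python
from fractions import Fraction

def geometric_pmf(p, k):
    """P(X = k): k - 1 failures followed by the first success on trial k."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(p, k):
    """P(X <= k): the first success happens within the first k trials."""
    return sum(geometric_pmf(p, j) for j in range(1, k + 1))

p = Fraction(1, 6)                      # chance of rolling a six
first_six_on_third = geometric_pmf(p, 3)
six_within_three = geometric_cdf(p, 3)
```

Here P(X = 3) = 25/216 and P(X ≤ 3) = 91/216, which matches 1 - (5/6)^3.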

31
Q

Linear Regression

A
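  • The regression line of y on x minimises the sum of the squared vertical distances (in y) from the points to the line.
  • The regression line of x on y minimises the sum of the squared horizontal distances (in x).
  • Use y on x when you have an x value and want to predict y, or when x is non-random (controlled).
  • Use x on y for the opposite: predicting x from a y value.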
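
The two lines generally differ, which the following sketch shows on a small made-up data set. It uses the standard least-squares results: for y on x the gradient is S_xy / S_xx, and for x on y it is S_xy / S_yy. The function name is hypothetical.

```python
def regression_lines(xs, ys):
    """Return ((a, b), (c, d)) where y = a + b x (y on x) and x = c + d y (x on y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    b = s_xy / s_xx           # y on x: minimises squared vertical distances
    a = mean_y - b * mean_x
    d = s_xy / s_yy           # x on y: minimises squared horizontal distances
    c = mean_x - d * mean_y
    return (a, b), (c, d)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
(a, b), (c, d) = regression_lines(xs, ys)
predicted_y = a + b * 6   # y on x: predict y from x = 6
predicted_x = c + d * 3   # x on y: predict x from y = 3
```

For this data, y on x is y = 2.2 + 0.6x and x on y is x = -1 + y. Both lines pass through the mean point (3, 4).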
32
Q

Goodness of fit test hypothesis

A

H0: the uniform model is a good fit

H1: the uniform model is not a good fit
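
A minimal sketch of the test statistic for these hypotheses, assuming it is calculated as the sum of (observed - expected)^2 / expected, with every category expected equally often under the uniform model. The observed counts (60 rolls of a die) are invented, and taking the degrees of freedom as the number of categories minus 1 assumes no parameters were estimated from the data.

```python
def goodness_of_fit_uniform(observed):
    """Return (chi-squared statistic, degrees of freedom) for a uniform model."""
    total = sum(observed)
    expected = total / len(observed)   # same expected frequency in every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom

# Hypothetical counts for faces 1 to 6 over 60 rolls.
observed = [8, 12, 9, 11, 10, 10]
statistic, degrees_of_freedom = goodness_of_fit_uniform(observed)
```

The statistic is then compared with the table's critical value for these degrees of freedom: if it is larger, reject H0; otherwise fail to reject H0.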