Stats Flashcards

1
Q

Bivariate Definition

A
  • It means data with two variables, e.g. a graph of life expectancy plotted against birth rate for a country.
2
Q

Dependent Vs Independent Variables (with the example of weight of crop yielded vs amount of rainfall)

A
  • The dependent variable’s outcome relies on the independent variable.
  • The weight of crop yielded clearly depends on the amount of rainfall, and not the other way around.
  • Therefore, the weight of crop yielded is the dependent variable, and the amount of rainfall is the independent variable.
3
Q

Random, Non-Random, and Control Variables

A
  • Random variables cannot be predicted; they are independent of anything else.
  • Control variables are non-random: they are changed at regular intervals of your choice (e.g. time).
4
Q

Correlation and Linear

A
  • Correlation is essentially how close the data points are to lying on a straight line.
  • Linear just means in a straight line.
5
Q

Correlation Vs Association

A
  • Correlation (linear association) is about how close the data is to lying on a straight line (strictly linear).
  • Association is about how closely related two variables are.
    - Therefore, correlation is a type of association: it describes how close the relationship between two variables is to being linear.
6
Q

PMCC

A
  • Denoted r; describes the strength of linear correlation (between -1 and 1).

No Correlation: Between -0.1 to 0.1
Perfect Negative/Positive Correlation: -1 / 1

Weak Negative Correlation: -0.1 to -0.5
Moderate Negative Correlation: -0.5 to -0.8
Strong Negative Correlation: -0.8 to -1

Weak Positive Correlation: 0.1 to 0.5
Moderate Positive Correlation: 0.5 to 0.8
Strong Positive Correlation: 0.8 to 1
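
As a quick sanity check on a computed r, here is a short Python sketch (with made-up bivariate data) using numpy's correlation matrix:

```python
import numpy as np

# Hypothetical bivariate data, e.g. rainfall vs crop yield
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; r is the off-diagonal entry
r = np.corrcoef(x, y)[0, 1]
print(r)  # close to 1, i.e. strong positive correlation
```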

7
Q

How do you know if data has a normal distribution?

A
  • The points form a roughly elliptical shape on a scatter graph.
8
Q

Random Variable Definition

A
  • A variable whose value is a numerical outcome of a random phenomenon.
  • Denoted with a capital letter, e.g. X.
  • The probability distribution of X tells us the possible values of X and how probable each one is.
  • Can be discrete or continuous.
9
Q

Cohen’s guidelines for interpreting effect size (PMCC)

A

Small Effect Size (r ≈ 0.1):
The relationship between two variables is weak.
Medium Effect Size (r ≈ 0.3):
The relationship between two variables is moderate.
Large Effect Size (r ≈ 0.5):
The relationship between two variables is strong.

10
Q

Scenarios where we would use Spearman’s Rank over PMCC

A
  • Deals with subjective data.
  • Deals with non-numerical data (e.g. A, B, C)
  • Deals with non-linear (monotonic) association, unlike PMCC.
11
Q

Interpreting Spearman’s Rank Correlation Coefficient

A
  • It is interpreted in the same way as PMCC, from -1 to 1.
  • 0.8 for example, would represent a strong positive association (not correlation).
12
Q

How to double-check Spearman’s rank

A
  • After making the two rows for ranks, so long as they are correct, inputting those values and finding the PMCC should give you the same coefficient.
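
A minimal Python sketch of this double check, assuming some made-up paired data:

```python
from scipy.stats import pearsonr, rankdata, spearmanr

# Hypothetical paired data
x = [12, 7, 30, 15, 22]
y = [3.1, 1.8, 6.0, 3.5, 5.2]

# Spearman's rank correlation computed directly
rs, _ = spearmanr(x, y)

# PMCC computed on the two rows of ranks should give the same coefficient
r_check, _ = pearsonr(rankdata(x), rankdata(y))

print(rs, r_check)  # the two values agree
```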
13
Q

What do you do if multiple pieces of data have the same rank?

A
  • Average the ranks that they would take if they were different values.
  • If you have 7, 7, 7 at the start of the rankings, they would occupy the 1st, 2nd, and 3rd ranks if they were different values.
  • Add those ranks and average them: (1 + 2 + 3) / 3 = 2, so they are all given the rank 2.
  • The next piece of data takes the rank it would have if all the values were different, so here 4th.
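
scipy's rankdata applies exactly this averaging rule for ties, so it can be used to check the 7, 7, 7 example:

```python
from scipy.stats import rankdata

data = [7, 7, 7, 9, 12]
# Default method='average': tied values share the mean of the ranks they would occupy
print(rankdata(data))  # [2. 2. 2. 4. 5.] - the three 7s share rank 2, the next value takes rank 4
```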
14
Q

Issue with PMCC hypothesis testing

A
  • When using large data samples, the critical value is so low that even an incredibly small PMCC appears significant, when in reality the correlation is too weak to be of practical interest.
15
Q

Example of a two-tailed PMCC hypothesis test

A
  • Let ρ (rho) be the population correlation coefficient between exam scores and heights.
    H0: ρ = 0
    H1: ρ ≠ 0
  • Find the critical value in the table.
  • Compare the PMCC against the critical value (to reject H0, r must be greater than the critical value if positive, or less than the negative critical value if negative).
    Example: r = -0.35 against a critical value of -0.7. Since -0.35 > -0.7, the result is not significant, so we fail to reject H0; there is insufficient evidence to suggest correlation between X and Y. (Had r been below -0.7, the result would be significant, we would reject H0, and there would be evidence of correlation.)
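
A sketch of the comparison step in Python; the critical value here is a made-up illustration, not taken from a real table:

```python
r = -0.35        # sample PMCC
critical = 0.7   # hypothetical two-tailed critical value from a table

# Two-tailed test: significant if |r| exceeds the critical value
if abs(r) > critical:
    print("Significant: reject H0, evidence of correlation between X and Y")
else:
    print("Not significant: fail to reject H0, insufficient evidence of correlation")
```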
16
Q

Example of a Spearman’s rank hypothesis test

A

H0: There is no association between X and Y.
H1 (one of the following, depending on the test):
  There is some positive association between X and Y.
  There is some negative association between X and Y.
  There is some association between X and Y.

Obtain the critical value from the table and compare the coefficient against it.
Conclude: there is / is not evidence to suggest ___.

17
Q

How to get the chi squared statistic

A
  • Using the initial table, add a final row and column for the totals.
  • Create a new table of expected frequencies, where each cell is calculated as:
    (Row Total x Column Total) / Sample Size
  • Make a final ‘contributions’ table, where you calculate the contribution of each cell as:
    (Observed Frequency - Expected Frequency)^2 / Expected Frequency
  • Add them all together and that is the chi-squared statistic.
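
The same steps in Python, using a small made-up 2x2 contingency table:

```python
import numpy as np

# Hypothetical observed frequencies
obs = np.array([[30, 10],
                [20, 40]])

row_totals = obs.sum(axis=1, keepdims=True)
col_totals = obs.sum(axis=0, keepdims=True)
n = obs.sum()

expected = row_totals @ col_totals / n            # row total x column total / sample size
contributions = (obs - expected) ** 2 / expected  # (O - E)^2 / E for each cell
chi_squared = contributions.sum()

dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)     # (rows - 1) x (columns - 1)
print(chi_squared, dof)
```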
18
Q

Example of chi squared contingency hypothesis test

A

H0: There is no association between X and Y.
H1: There is some association between X and Y.
- Calculate the degrees of freedom and find the critical value.
- Compare the chi-squared statistic against the critical value; if the statistic exceeds the critical value, reject H0.
- The result is / is not significant (X^2 > critical value is significant), so reject / fail to reject H0.
- There is / is not evidence to suggest there is an association between X and Y.

19
Q

How to calculate the degrees of freedom

A

(no. of rows - 1) x (no. of columns - 1)
A 3x3 table has 2 x 2 = 4 degrees of freedom.

20
Q

How to calculate expected value from discrete random variable table

A
  • E(X) is the sum of r x P(X = r) over all values of r.
    r = 2, 3, 4
    P(X = r) = 0.2, 0.4, 0.4

(2 x 0.2) + (3 x 0.4) + (4 x 0.4) = 3.2

If these were the outcomes of a spinner and you spun it many, many times, you would expect the average of all the values to be about 3.2.
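
The same sum written out in Python for the values above:

```python
r = [2, 3, 4]
p = [0.2, 0.4, 0.4]

expectation = sum(ri * pi for ri, pi in zip(r, p))
print(expectation)  # 3.2
```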

21
Q

How to calculate variance from a discrete random variable table

A

Var(X) = E(X^2) - [E(X)]^2

22
Q

What is E(X^2)

A
  • This is when you calculate the expected value using r^2 instead of r.
  • It is the sum of P(X = r) x r^2 over all values of r.
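
Continuing the spinner values from the E(X) card, a short check of E(X^2) and Var(X):

```python
r = [2, 3, 4]
p = [0.2, 0.4, 0.4]

ex = sum(ri * pi for ri, pi in zip(r, p))       # E(X) = 3.2
ex2 = sum(ri**2 * pi for ri, pi in zip(r, p))   # E(X^2) = 10.8
variance = ex2 - ex**2                          # Var(X) = 0.56
print(ex2, variance)
```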
23
Q

E(X) & Var(X): Discrete Uniform Distribution

A

E(X) = (n + 1) / 2
Var(X) = (n^2 - 1) / 12
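
A quick check of both formulas against a fair six-sided die (n = 6):

```python
n = 6
values = range(1, n + 1)

ex = sum(values) / n                          # 3.5, matches (n + 1) / 2
var = sum(v**2 for v in values) / n - ex**2   # 2.916..., matches (n^2 - 1) / 12
print(ex, var, (n + 1) / 2, (n**2 - 1) / 12)
```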

24
Q

The Binomial Distribution

A
  • A fixed number of trials, n.
  • The outcome of each trial is independent.
  • A constant probability of success, p.
  • Two outcomes only, success/failure.

If these conditions are met, a random variable X whose outcome is the number of successes is binomially distributed (n is the no. of trials, p is the probability of success):
X ~ B(n, p)

25
Q

E(X) & Var(X): Binomial Distribution

A

If X ~ B(n, p):
E(X) = np
Var(X) = np(1-p)
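
scipy can confirm these formulas for any hypothetical n and p:

```python
from scipy.stats import binom

n, p = 20, 0.3
print(binom.mean(n, p), n * p)           # both 6.0
print(binom.var(n, p), n * p * (1 - p))  # both 4.2
```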

26
Q

Poisson Distribution

A
  • Events must occur randomly and independently.
  • Events occur singly (one at a time).
  • Events happen, on average, at a constant rate λ.
  • Can be used to describe real-life scenarios, e.g. the number of cars passing in a minute, or the number of misprints on a page of a book.
  • X ~ Po(λ)
  • λ represents the average rate over a fixed interval of time or space.
27
Q

E(X) & Var(X): Poisson Distribution

A

E(X) = λ
Var(X) = λ
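
Likewise for the Poisson distribution, with a hypothetical rate λ = 4:

```python
from scipy.stats import poisson

lam = 4
print(poisson.mean(lam), poisson.var(lam))  # both equal lambda: 4.0 4.0
```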

28
Q

Formula For Probability P (A | B)

A

P(A | B) = P(A ∩ B) / P(B)
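
A tiny worked example of the formula, using one roll of a fair die (A = roll is even, B = roll is greater than 3):

```python
from fractions import Fraction

outcomes = range(1, 7)
A = {x for x in outcomes if x % 2 == 0}  # {2, 4, 6}
B = {x for x in outcomes if x > 3}       # {4, 5, 6}

p_B = Fraction(len(B), 6)                # P(B) = 1/2
p_A_and_B = Fraction(len(A & B), 6)      # P(A n B) = 1/3
print(p_A_and_B / p_B)                   # P(A | B) = 2/3
```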

29
Q

How to approximate a binomial distribution with a Poisson distribution

A

X ~ Po(λ = np)
The larger n and the smaller p, the smaller the percentage error and the better the approximation.
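
A sketch comparing the two pmfs for a hypothetical large-n, small-p case:

```python
from scipy.stats import binom, poisson

n, p = 1000, 0.003  # large n, small p
lam = n * p         # lambda = 3

for k in range(6):
    # The binomial and Poisson probabilities are very close for each k
    print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
```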

30
Q

Geometric Distribution

A

X ~ Geo(p)
- Similar to the binomial: we need a fixed probability of success, trials must be independent, and there must be only two possible outcomes.
- Remember that after the first success we STOP; there is no need to continue adding probabilities after you have won.
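
The pmf is P(X = k) = (1 - p)^(k - 1) x p, which scipy's geom also uses (hypothetical p below):

```python
from scipy.stats import geom

p = 0.25  # hypothetical probability of success
k = 4     # first success on the 4th trial

print((1 - p) ** (k - 1) * p, geom.pmf(k, p))  # both give the same probability
```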

31
Q

Linear Regression

A
  • Y on X minimises the vertical distances (residuals in Y).
  • X on Y minimises the horizontal distances (residuals in X).
  • Use Y on X when you have an X value and want to predict a Y value, with X as the non-random variable.
  • Use X on Y for the opposite.
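
A minimal Y-on-X fit in Python (least squares on the vertical distances), with hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # non-random explanatory variable
y = np.array([2.2, 4.1, 5.9, 8.3, 9.8])  # observed response

# Degree-1 polyfit minimises the squared vertical distances (Y on X)
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * 6.0 + intercept         # predict Y for a new X value
print(slope, intercept, y_pred)
```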
32
Q

Goodness of fit test hypotheses

A

H0: the uniform model is a good fit

H1: the uniform model is not a good fit