Statistics Flashcards by Himanshu JOSHI

It is normal practice to plot the dependent variable
on the … axis and the independent variable on the …
axis.

It is normal practice to plot the
dependent variable on the vertical axis
and the independent variable on the
horizontal axis.

In order for there to be correlation, the data in a
scatter graph should be approximately…

elliptical

What is the difference between association and
correlation?

Association refers to any relationship
between two variables, whereas
correlation often just refers to a linear
relationship between two variables.

It is appropriate to use the product moment
correlation coefficient when…

The data is random on random, and
the underlying parent population
follows a bivariate normal
distribution.

The regression line will always pass through the
point…

(x̅, y̅)

For data that is random on non-random, the
regression line we must use is…

y on x

For data that is random on random, the regression
lines we can use are…

y on x or x on y

If a value of y is to be estimated from a value of x,
then the regression line of … must be used.

y on x

If a value of x is to be estimated from a value of y,
then the regression line of … must be used.

x on y

If y = a + bx is the regression line for y on x, then
the residual εi for (xi, yi) is…

εi = yi − (a + bxi)
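
As a rough illustration of the residual formula, the sketch below fits the least-squares line of y on x to a small made-up data set and computes each residual. The function names and the data are hypothetical, chosen only for this example.

```python
def fit_y_on_x(xs, ys):
    """Least-squares regression line of y on x; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # forces the line through (x_bar, y_bar)
    return a, b


def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus the y predicted by the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_y_on_x(xs, ys)
res = residuals(xs, ys, a, b)
print("a =", a, "b =", b)
print("residuals:", res)
print("sum of residuals:", sum(res))  # zero, up to floating-point rounding
```

The same run also shows that the fitted line passes through (x̅, y̅), since a is defined as y̅ − b x̅.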

Sum of the residuals ε1 + ε2 + ⋯ + εn =

0

The Coefficient of Determination is r^2,
where 0 ≤ r^2 ≤ 1. It tells us…

the proportion of the variation in y
that is explained by the variation in x.
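
One way to see this interpretation is to compute r^2 two ways on the same data: by squaring the PMCC, and as 1 minus (residual variation) / (total variation in y) for the y on x line. The sketch below does this with made-up data; the function names are hypothetical.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def explained_proportion(xs, ys):
    """1 - (residual variation) / (total variation in y) for the y on x line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]

r = pmcc(xs, ys)
r_squared = r ** 2
explained = explained_proportion(xs, ys)
print("r =", r)
print("r^2 =", r_squared)
print("explained proportion =", explained)  # agrees with r^2 up to rounding
```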

What is the first statement in a PMCC hypothesis test?

Let ρ be the population correlation
coefficient between x and y (in context)

What is the null hypothesis of a PMCC hypothesis test?

H0: ρ = 0

What could be the alternative hypothesis of a PMCC hypothesis test?

H1: ρ < 0
H1: ρ > 0
H1: ρ ≠ 0

Why might effect sizes be used instead of conducting a hypothesis test?

For a large set of random on random
bivariate data, a small non-zero value
of the PMCC is likely to lead to a
rejection of the null hypothesis of no
correlation in the population. This is
uninformative.
In some contexts it is more important
to consider the size of the correlation
rather than test whether the
population correlation is non-zero. The
effect size can be used to describe the
PMCC.

In Cohen’s guidelines for interpreting effect size, what values of the
correlation coefficient represent a small, medium and large effect size?

0.1, 0.3, 0.5
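
A minimal sketch of how these thresholds might be applied. It assumes each value is a lower bound on |r| for its band, and it uses the label "negligible" for anything below 0.1; both are assumptions of this example rather than part of the card.

```python
def cohen_effect_size(r):
    """Label a correlation coefficient using the 0.1 / 0.3 / 0.5 thresholds.

    Assumption: each threshold is treated as the lower bound of its band,
    and the sign of r is ignored.
    """
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # hypothetical label for values below 0.1


for r in (0.05, 0.12, -0.35, 0.7):
    print(r, cohen_effect_size(r))
```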

Why might we use Spearman’s rank correlation
coefficient, rather than the PMCC?

When the data is…
* Non-linear
* Not random on random
* Subjective
* Not roughly elliptical
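
To illustrate the non-linear case, the sketch below computes Spearman's rank correlation coefficient as the PMCC of the ranks (tied values share the mean of their ranks) and compares it with the PMCC of the raw values. The data are made up and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank from 1 (smallest); tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pmcc(xs, ys):
    """Product moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman(xs, ys):
    """Spearman's rank correlation: the PMCC applied to the ranks."""
    return pmcc(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y increases with x, but not linearly.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

r_raw = pmcc(xs, ys)
r_s = spearman(xs, ys)
print("PMCC on raw data:", r_raw)
print("Spearman's rank:", r_s)
```

Because the ranks of x and y agree exactly here, the rank coefficient picks up the perfect increasing association, while the PMCC of the raw values falls short of 1.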

Do we need to introduce ρ for Spearman’s rank hypothesis test?

No as there is not necessarily an
underlying parent population.

What is the null hypothesis of a Spearman’s rank hypothesis test?

H0: There is no association in the
population between x and y (in context)

What could be the alternative hypothesis of a
Spearman’s rank hypothesis test?

H1: There is some positive association
in the population between x and y (in context)

H1: There is some negative association
in the population between x and y (in context)

H1: There is some association
in the population between x and y (in context)

Why should a random sample be taken for a hypothesis test?

A random sample enables proper
inference about the population to be
undertaken.

In the expected frequency table of a Chi-Squared
Contingency Table test, an expected value =

(Row total × Column total) / Sample size
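
A short sketch of this formula applied to every cell of a table. The observed counts are made up and the function name is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 3 table of observed frequencies.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```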

What is the null hypothesis of a Chi-Squared Contingency Table test?

H0: There is no association between x and y (in context)

What is the alternative hypothesis of a Chi-Squared Contingency Table test?

H1: There is an association between x and y (in context)

What is the number of degrees of freedom for a Chi-Squared Contingency Table test, if the number of rows is m and the number of columns is n?

ν = (m − 1) × (n − 1)
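
The sketch below computes ν for a made-up 2 × 3 table, together with the usual test statistic, the sum of (O − E)^2 / E over all cells. That statistic is not stated on the cards and is included here as an assumption about the standard method; the function names are hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """(rows - 1) * (columns - 1)."""
    m = len(observed)
    n = len(observed[0])
    return (m - 1) * (n - 1)


def chi_squared_statistic(observed):
    """Sum over all cells of (O - E)^2 / E (assumed standard statistic)."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table of observed frequencies.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

nu = degrees_of_freedom(observed)
stat = chi_squared_statistic(observed)
print("degrees of freedom:", nu)
print("test statistic:", stat)
```

The statistic would then be compared with the critical value of the chi-squared distribution with ν degrees of freedom at the chosen significance level.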

When would you need to combine categories in a contingency table before performing a hypothesis test?

If any of the expected frequencies are less than 5

For a discrete random variable X, where a and b are constants... E(aX + b) =

aE(X) + b

Var(aX + b) =

a^2Var(X)
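
Both results for aX + b can be checked exactly on a small made-up distribution. The sketch uses Python's fractions module to avoid rounding; the distribution and the constants a and b are hypothetical.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Var(X) = E((X - E(X))^2)."""
    m = mean(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())


# Hypothetical distribution of X, and constants a and b.
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, 5

# Distribution of aX + b: same probabilities, transformed values.
aX_plus_b = {a * x + b: p for x, p in X.items()}

print("E(aX + b) =", mean(aX_plus_b), " aE(X) + b =", a * mean(X) + b)
print("Var(aX + b) =", variance(aX_plus_b), " a^2 Var(X) =", a ** 2 * variance(X))
```

Adding b shifts every value by the same amount, so it changes the mean but not the spread.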

For any two discrete random variables X and Y... E(aX ± bY) =

aE(X) ± bE(Y)

For two discrete random variables X and Y... Var(aX ± bY) = … if X and Y are…

a^2Var(X) + b^2Var(Y), if X and Y are independent. (The variances add for both aX + bY and aX − bY.)
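
As a check that the variances add for both the sum and the difference, the sketch below builds the exact distribution of aX + bY and of aX − bY from two small independent distributions. The distributions, constants and function names are made up for illustration.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Var(X) = E((X - E(X))^2)."""
    m = mean(dist)
    return sum((value - m) ** 2 * prob for value, prob in dist.items())


def combine(dist_x, dist_y, a, b, sign):
    """Distribution of aX + sign*bY, assuming X and Y are independent."""
    result = {}
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            value = a * x + sign * b * y
            # Independence: joint probability is the product px * py.
            result[value] = result.get(value, 0) + px * py
    return result


# Hypothetical independent distributions and constants.
X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
Y = {1: Fraction(1, 3), 4: Fraction(2, 3)}
a, b = 2, 3

target = a ** 2 * variance(X) + b ** 2 * variance(Y)
var_sum = variance(combine(X, Y, a, b, +1))
var_diff = variance(combine(X, Y, a, b, -1))
print("a^2 Var(X) + b^2 Var(Y) =", target)
print("Var(aX + bY) =", var_sum)
print("Var(aX - bY) =", var_diff)
```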

For X~B(n, p) Var(X) =

np(1 − p)

State the conditions for a situation to be modelled using the binomial distribution.

* There are only two possible outcomes (success or failure)
* The probability of success, p, is constant
* The trials are independent of each other
* The number of trials, n, is fixed

State the conditions for a situation to be modelled using the geometric distribution.

* There are only two possible outcomes (success or failure)
* The probability of success, p, is constant
* The trials are independent of each other

For X~Geo(p) P(X > r) =

(1 − p)^r

For X~Geo(p) P(X ≤ r) =

1 − (1 − p)^r
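
A quick numerical check of this result, assuming X counts the number of trials up to and including the first success (so X = 1, 2, 3, …), which is the convention consistent with P(X > r) = (1 − p)^r. The function names and the values of p and r are hypothetical.

```python
def geo_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success (X = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p


def geo_cdf(r, p):
    """P(X <= r) = 1 - (1 - p)^r."""
    return 1 - (1 - p) ** r


# Hypothetical values, for illustration only.
p, r = 0.2, 6

summed = sum(geo_pmf(k, p) for k in range(1, r + 1))
closed_form = geo_cdf(r, p)
print("sum of P(X = k) for k = 1..r:", summed)
print("1 - (1 - p)^r:", closed_form)
```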

For X~Po(λ) E(X) =

λ

For X~Po(λ) Var(X) =

λ

An indicator of whether a Poisson distribution may be able to model a data set is if...

The sample mean and sample variance of the data set are close to one another.

State the conditions for a situation to be modelled using the Poisson distribution.

* The events occur randomly, and are all independent of each other.
* The events happen singly (i.e. “one at a time”).
* The events happen (on average) at a constant rate (λ).

If X and Y are two independent Poisson random variables with means λ and μ respectively, then X + Y~

Po(λ + μ)
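
This can be checked numerically by building P(X + Y = k) from the two separate Poisson distributions and comparing it with the Po(λ + μ) probability. The values of λ and μ below are made up, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sum_pmf(k, lam, mu):
    """P(X + Y = k) for independent X ~ Po(lam), Y ~ Po(mu), by convolution."""
    return sum(poisson_pmf(i, lam) * poisson_pmf(k - i, mu) for i in range(k + 1))


# Hypothetical means, for illustration only.
lam, mu = 1.5, 2.5

for k in range(6):
    # The two columns agree up to floating-point rounding.
    print(k, sum_pmf(k, lam, mu), poisson_pmf(k, lam + mu))
```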

A binomial distribution with parameters n and p can be approximated by a Poisson distribution, with parameter λ = np, if...

n is large and p is small (and so the event is rare).
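
To get a feel for how close the approximation is, the sketch below compares B(n, p) with Po(np) for one made-up case with large n and small p. The chosen n and p and the function names are hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical case: n large, p small.
n, p = 1000, 0.002
lam = n * p

for k in range(6):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))

max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))
print("largest difference for k = 0..10:", max_diff)
```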

For a goodness of fit test, the degrees of freedom ν =

ν = (final number of columns, after combining any columns with low expected frequencies) − (number of estimated parameters, e.g. 1 if λ is estimated for a Poisson model) − 1 (because the total frequency is one restriction).
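
The count can be written as a one-line function. The name and the example numbers below are hypothetical.

```python
def gof_degrees_of_freedom(final_columns, estimated_parameters):
    """Columns left after combining, minus estimated parameters, minus 1.

    The final 1 is the restriction that expected and observed totals agree.
    """
    return final_columns - estimated_parameters - 1


# Hypothetical example: 6 columns remain after combining, and lambda was
# estimated from the data for a Poisson model.
nu_estimated = gof_degrees_of_freedom(6, 1)

# Same table, but with lambda specified in advance (nothing estimated).
nu_given = gof_degrees_of_freedom(6, 0)

print(nu_estimated, nu_given)
```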

If your test statistic lies within the left-hand critical region...

The fit is suspiciously good, so consider the following possibilities:
* Perhaps the model was constructed to fit a set of data.
* Perhaps some of the data has been omitted in order to produce a better fit.
* Perhaps some of the data is not genuine.

Statistics Flashcards

(44 cards)