Statistics Flashcards
It is normal practice to plot the dependent variable
on the … axis and the independent variable on the …
axis.
It is normal practice to plot the
dependent variable on the vertical axis
and the independent variable on the
horizontal axis.
In order for there to be correlation, the data in a
scatter graph should be approximately…
elliptical
What is the difference between association and
correlation?
Association refers to any relationship
between two variables, whereas
correlation often just refers to a linear
relationship between two variables.
It is appropriate to use the product moment
correlation coefficient when…
The data is random on random, and
the underlying parent population
follows a bivariate normal
distribution.
The regression line will always pass through the
point…
(x̅, y̅)
For data that is random on non-random, the
regression line we must use is…
y on x
For data that is random on random, the regression
lines we can use are…
y on x or x on y
If a value of y is to be estimated from a value of x,
then the regression line of … must be used.
y on x
If a value of x is to be estimated from a value of y,
then the regression line of … must be used.
x on y
If y = a + bx is the regression line for y on x, then
the residual ε_i for (x_i, y_i) is…
ε_i = y_i − (a + bx_i)
Sum of the residuals ε_1 + ε_2 + ⋯ + ε_n =
0
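As a quick numerical check of the two regression facts on these cards (the y on x line passes through (x̅, y̅), and its residuals sum to zero), here is a minimal Python sketch. The data values and the helper name fit_y_on_x are made up for illustration, and the least-squares formulas b = Sxy/Sxx and a = y̅ − bx̅ are the standard ones rather than something stated on the cards.

```python
from statistics import mean

def fit_y_on_x(xs, ys):
    """Least-squares regression line of y on x: returns (a, b) for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Made-up illustrative data (an assumption, not from the flashcards).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_y_on_x(xs, ys)

# Residual for each point: observed y minus the value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

# Value of the line at the mean of x; this should equal the mean of y.
line_at_x_bar = a + b * mean(xs)

print(residual_sum)             # zero up to floating-point rounding
print(line_at_x_bar, mean(ys))  # the two values agree up to rounding
```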
The Coefficient of Determination is r^2,
where 0 ≤ r^2 ≤ 1. It tells us…
the proportion of the variation in y
that is explained by the variation in x.
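The interpretation of r^2 as "proportion of variation explained" can be checked numerically. The sketch below, using made-up data and a hypothetical helper summary_sums, computes r^2 directly from the PMCC and compares it with 1 − (residual sum of squares)/Syy for the y on x regression line. The two should agree up to rounding.

```python
from math import sqrt

def summary_sums(xs, ys):
    """Return x-bar, y-bar, Sxx, Syy and Sxy for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, s_xx, s_yy, s_xy

# Made-up illustrative data (an assumption, not from the flashcards).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
x_bar, y_bar, s_xx, s_yy, s_xy = summary_sums(xs, ys)

# Product moment correlation coefficient and its square.
r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2

# Proportion of the variation in y explained by the y on x line.
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_res / s_yy

print(r_squared, explained)  # equal up to floating-point rounding
```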
What is the first statement in a PMCC hypothesis test?
Let ρ be the population correlation
coefficient between x and y (in context)
What is the null hypothesis of a PMCC hypothesis test?
H0: ρ = 0
What could be the alternative hypothesis of a PMCC hypothesis test?
H1: ρ < 0
H1: ρ > 0
H1: ρ ≠ 0
Why might effect sizes be used instead of conducting a hypothesis test?
For a large set of random on random
bivariate data, a small non-zero value
of the PMCC is likely to lead to a
rejection of the null hypothesis of no
correlation in the population. This is
uninformative.
In some contexts it is more important
to consider the size of the correlation
rather than test whether the
population correlation is non-zero. The
effect size can be used to describe the
PMCC.
In Cohen’s interpretation of effect size, what correlation coefficients
represent a small, medium and large effect size?
0.1 (small), 0.3 (medium), 0.5 (large)
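One way to apply these benchmarks is sketched below. The function name cohen_effect_size is hypothetical, and treating 0.1, 0.3 and 0.5 as lower cut-offs for bands (with anything under 0.1 labelled "negligible") is an assumption of this sketch rather than something stated on the card.

```python
def cohen_effect_size(r):
    """Label a correlation coefficient using Cohen's benchmarks 0.1, 0.3, 0.5.

    Assumption: each benchmark is used as the lower cut-off of a band,
    and the sign of r is ignored (only its magnitude matters).
    """
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

for r in (0.05, 0.1, -0.35, 0.72):
    print(r, cohen_effect_size(r))
```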
Why might we use Spearman’s rank correlation
coefficient, rather than the PMCC?
When the data is…
* Non-linear
* Not random on random
* Subjective
* Not roughly elliptical
Do we need to introduce ρ for Spearman’s rank hypothesis test?
No, as there is not necessarily an
underlying parent population.
What is the null hypothesis of a Spearman’s rank hypothesis test?
H0: There is no association in the
population between x and y (in
context)
What could be the alternative hypothesis of a
Spearman’s rank hypothesis test?
H1: There is some positive association
in the population between x and y (in context)
H1: There is some negative association
in the population between x and y (in context)
H1: There is some association in the
population between x and y (in context)
Why should a random sample be taken for a hypothesis test?
A random sample enables proper
inference about the population to be
undertaken
In the expected frequency table of a Chi-Squared
Contingency Table test, an expected value =
(Row total × Column total) / Sample size
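The expected-frequency formula can be applied to a whole table at once. The following is a minimal sketch with a made-up 2 × 3 table of observed frequencies; the function name expected_frequencies is chosen for illustration.

```python
def expected_frequencies(observed):
    """Expected value for each cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in col_totals] for r in row_totals]

# Made-up observed frequencies (an assumption, not from the flashcards).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Here both row totals are 60, the column totals are 30, 40 and 50, and the sample size is 120, so each row of expected values is 15, 20, 25.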
What is the null hypothesis of a Chi-Squared Contingency Table test?
H0: There is no association between x and y (in context)
What is the alternative hypothesis of a Chi-Squared Contingency Table test?
H1: There is an association between x
and y (in context)
What is the number of degrees of freedom for a Chi-Squared
Contingency Table test, if the number of rows is m and the number of columns is n?
ν = (m − 1) × (n − 1)
When would you need to combine categories in a
contingency table before performing a hypothesis test?
If any of the expected frequencies are
less than 5
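The degrees-of-freedom formula and the "less than 5" check can be sketched together. The helper names and the expected-frequency table below are hypothetical. If categories are combined, the degrees of freedom should be recalculated from the dimensions of the combined table.

```python
def degrees_of_freedom(m, n):
    """Degrees of freedom for a contingency table with m rows and n columns."""
    return (m - 1) * (n - 1)

def cells_below_five(expected):
    """Return the (row, column) indices of expected frequencies below 5."""
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < 5
    ]

# Hypothetical expected frequencies for a 2 x 3 table.
expected = [
    [12.0, 8.0, 4.0],
    [18.0, 12.0, 6.0],
]

low_cells = cells_below_five(expected)
print(low_cells)                 # a non-empty list means categories should be combined
print(degrees_of_freedom(2, 3))  # before combining
print(degrees_of_freedom(2, 2))  # after merging the last two columns
```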
For a discrete random variable X, where a and b are constants…
E(aX + b) =
aE(X) + b
Var(aX + b) =
a^2Var(X)
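Both results can be verified exactly on a small example. The sketch below assumes X is the score on a fair six-sided die and uses arbitrary constants a = 3 and b = 2; the helper names expectation and variance are chosen for illustration. Fractions are used so the comparison is exact rather than approximate.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E((X - mu)^2) for a distribution given as {value: probability}."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Assumed example: X is the score on a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

a, b = 3, 2  # arbitrary constants for the illustration

# Distribution of aX + b: each value is transformed, probabilities are unchanged.
transformed = {a * x + b: p for x, p in die.items()}

print(expectation(transformed), a * expectation(die) + b)
print(variance(transformed), a ** 2 * variance(die))
```

Note that the constant b shifts every value by the same amount, so it changes the mean but not the spread.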
For any two discrete random variables X and Y…
E(aX ± bY) =
aE(X) ± bE(Y)
For two discrete random variables X and Y…
Var(aX ± bY) = … if X and Y are…
a^2Var(X) + b^2Var(Y), if X and Y are
independent (the variances add for both + and −).
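The fact that the variances add even when the variables are subtracted can be checked exactly. This sketch assumes X is a fair die score and Y is a fair coin coded 0 or 1, builds the distribution of 2X − 3Y under independence, and compares its variance with a^2Var(X) + b^2Var(Y). All names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def expectation(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) for a distribution given as {value: probability}."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def linear_combination(dist_x, dist_y, a, b):
    """Distribution of a*X + b*Y, assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = a * x + b * y
        # Independence: the joint probability is the product px * py.
        out[value] = out.get(value, 0) + px * py
    return out

# Assumed examples: a fair die and a fair coin coded 0/1.
die = {x: Fraction(1, 6) for x in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

combined = linear_combination(die, coin, 2, -3)  # the distribution of 2X - 3Y

lhs = variance(combined)
rhs = 2 ** 2 * variance(die) + 3 ** 2 * variance(coin)
print(lhs, rhs)
```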
For X~B(n, p)
Var(X) =
np(1 − p)
State the conditions for a situation to be modelled using the binomial distribution.
- There are only two possible outcomes (success or failure)
- The probability of success, p, is constant
- The trials are independent of each other
- The number of trials, n, is fixed
State the conditions for a situation to be modelled using the geometric distribution.
- There are only two possible outcomes (success or failure)
- The probability of success, p, is constant
- The trials are independent of each other
For X~Geo(p)
P(X > r) =
(1 − p)^r
For X~Geo(p)
P(X ≤ r) =
1 − (1 − p)^r
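Both geometric formulas follow from summing the individual probabilities. The sketch below assumes X counts the number of trials up to and including the first success, so P(X = k) = (1 − p)^(k−1) p for k = 1, 2, 3, …, which is the convention consistent with the formulas on these cards. The values p = 1/6 and r = 4 are arbitrary.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k): the first success occurs on trial k, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 6)  # arbitrary probability of success
r = 4               # arbitrary cut-off

# P(X <= r) by summing the pmf, and P(X > r) as its complement.
cdf = sum(geometric_pmf(k, p) for k in range(1, r + 1))
tail = 1 - cdf

print(cdf, 1 - (1 - p) ** r)
print(tail, (1 - p) ** r)
```

P(X > r) is simply the probability that the first r trials are all failures, which is why it equals (1 − p)^r.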
For X~Po(λ)
E(X) =
λ
For X~Po(λ)
Var(X) =
λ
An indicator of whether a Poisson distribution may be able to model a data set is if…
The sample mean and sample variance
of the data set are close to one another.
State the conditions for a situation to be modelled using the Poisson distribution.
- The events occur randomly, and are all independent of each other.
- The events happen singly (i.e. “one at a time”)
- The events happen (on average) at a constant rate (λ).
If X and Y are two independent Poisson random
variables with means λ and μ respectively, then X + Y~
Po(λ + μ)
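This additive property can be checked numerically by convolving the two probability functions. The sketch below uses arbitrary means λ = 1.5 and μ = 2.5 and compares P(X + Y = k), computed under independence, with the Po(λ + μ) probability for k = 0 to 9. The function names are chosen for illustration.

```python
from math import exp, factorial, isclose

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam, mu = 1.5, 2.5  # arbitrary means for the illustration

def sum_pmf(k):
    """P(X + Y = k) for independent X ~ Po(lam) and Y ~ Po(mu), by convolution."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, mu) for j in range(k + 1))

# Compare with the Po(lam + mu) probabilities for k = 0, 1, ..., 9.
checks = [isclose(sum_pmf(k), poisson_pmf(k, lam + mu)) for k in range(10)]
print(all(checks))
```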
A binomial distribution with parameters n and p can
be approximated by a Poisson distribution, with parameter λ = np, if…
n is large and p is small (and so the
event is rare).
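To see how close the approximation can be, the sketch below compares B(n, p) with Po(np) for the arbitrary choice n = 1000, p = 0.002 (so λ = 2), and reports the largest difference between the two probability functions over k = 0 to 10. The function names are chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

n, p = 1000, 0.002  # arbitrary "large n, small p" example
lam = n * p

# Largest pointwise difference between the two models over k = 0..10.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11)
)
print(max_gap)  # small compared with the probabilities themselves
```

Repeating the comparison with a larger p (say n = 10, p = 0.2, which has the same mean) gives a noticeably bigger gap, which illustrates why p needs to be small.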
For a goodness of fit test, the degrees of freedom ν =
Final number of columns (after combining any columns due to low
expected frequencies)
− 1 for each estimated parameter (e.g. λ for Poisson)
− 1, as the total frequency is one restriction.
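The counting rule above can be written as a one-line function. The name goodness_of_fit_df and the example numbers are hypothetical.

```python
def goodness_of_fit_df(num_columns, num_estimated_params):
    """Degrees of freedom for a goodness of fit test.

    num_columns: number of columns after any combining for low expected frequencies.
    num_estimated_params: parameters estimated from the data (e.g. 1 if lambda
    for a Poisson model was estimated from the sample mean, 0 if it was given).
    """
    # Subtract 1 per estimated parameter, and 1 more for the fixed total frequency.
    return num_columns - num_estimated_params - 1

# Hypothetical example: 6 columns remain after combining.
print(goodness_of_fit_df(6, 1))  # Poisson model with lambda estimated from the data
print(goodness_of_fit_df(6, 0))  # Poisson model with lambda specified in advance
```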
If your test statistic lies within the left-hand critical region (i.e. the fit is suspiciously good)…
- Perhaps the model was constructed to fit a set of data.
- Perhaps some of the data has been omitted in order to produce a better fit.
- Perhaps some of the data is not genuine.