Statistics Flashcards

1
Q

It is normal practice to plot the dependent variable
on the … axis and the independent variable on the …
axis.

A

It is normal practice to plot the
dependent variable on the vertical axis
and the independent variable on the
horizontal axis.

2
Q

In order for there to be correlation, the data in a
scatter graph should be approximately…

A

elliptical

3
Q

What is the difference between association and
correlation?

A

Association refers to any relationship
between two variables, whereas
correlation often just refers to a linear
relationship between two variables.

4
Q

It is appropriate to use the product moment
correlation coefficient when…

A

The data is random on random, and
the underlying parent population
follows a bivariate normal
distribution.

5
Q

The regression line will always pass through the
point…

A

(x̅, y̅)
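A minimal sketch of this property, assuming the usual least-squares formulas b = Sxy / Sxx and a = y̅ − b·x̅. The data values and the helper name `fit_y_on_x` are made up for illustration.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line of y on x; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept
    return a, b

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_y_on_x(xs, ys)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# The fitted value at the mean of x should equal the mean of y (up to rounding).
gap = abs((a + b * x_bar) - y_bar)
print(gap < 1e-9)
```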

6
Q

For data that is random on non-random, the
regression line we must use is…

A

y on x

7
Q

For data that is random on random, the regression
lines we can use are…

A

y on x or x on y

8
Q

If a value of y is to be estimated from a value of x,
then the regression line of … must be used.

A

y on x

9
Q

If a value of x is to be estimated from a value of y,
then the regression line of … must be used.

A

x on y

10
Q

If y = a + bx is the regression line for y on x, then
the residual εi for (xi, yi) is…

A

εi = yi − (a + bxi)

11
Q

Sum of the residuals ε1 + ε2 + ⋯ + εn =

A

0
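As a rough check, the sketch below fits a least-squares line of y on x to made-up data, computes each residual as yi − (a + bxi), and adds them up. The data are illustrative only, and the total is zero only up to floating-point rounding.

```python
# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares gradient and intercept for y on x.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual for each point: observed y minus fitted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
total = sum(residuals)

print(abs(total) < 1e-9)
```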

12
Q

The Coefficient of Determination is r^2,
where 0 ≤ r^2 ≤ 1. It tells us…

A

the proportion of the variation in y
that is explained by the variation in x.
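One way to see this numerically: for a least-squares line of y on x, the quantity 1 − (sum of squared residuals) / Syy should match r^2. The sketch below assumes that standard identity and uses made-up data.

```python
from math import sqrt

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Product moment correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares line of y on x and its unexplained variation.
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportion of the variation in y explained by the line.
explained = 1 - ss_res / s_yy

print(abs(explained - r ** 2) < 1e-9)
```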

13
Q

What is the first statement in a PMCC hypothesis test?

A

Let ρ be the population correlation
coefficient between x and y (in context)

14
Q

What is the null hypothesis of a PMCC hypothesis test?

A

H0: ρ = 0

15
Q

What could be the alternative hypothesis of a PMCC hypothesis test?

A

H1: ρ < 0
H1: ρ > 0
H1: ρ ≠ 0

16
Q

Why might effect sizes be used instead of conducting a hypothesis test?

A

For a large set of random on random
bivariate data, a small non-zero value
of the PMCC is likely to lead to a
rejection of the null hypothesis of no
correlation in the population. This is
uninformative.
In some contexts it is more important
to consider the size of the correlation
rather than test whether the
population correlation is non-zero. The
effect size can be used to describe the
PMCC.

17
Q

In Cohen’s guidelines for interpreting effect size,
which correlation coefficients represent a small, medium and large effect size?

A

0.1, 0.3, 0.5
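A small sketch of how these benchmarks might be applied. Treating them as lower thresholds on |r|, and the label "negligible" for values below 0.1, are assumptions made here for illustration; `cohen_label` is a hypothetical helper name.

```python
def cohen_label(r):
    """Label a correlation coefficient using 0.1 / 0.3 / 0.5 as thresholds on |r|."""
    size = abs(r)  # the sign gives direction, not strength
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label, not part of the benchmarks above

print(cohen_label(0.1), cohen_label(-0.35), cohen_label(0.62))
```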

18
Q

Why might we use Spearman’s rank correlation
coefficient, rather than the PMCC?

A

When the data is…
* Non-linear
* Not random on random
* Subjective
* Not roughly elliptical
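To illustrate the non-linear case, the sketch below computes Spearman's coefficient as the PMCC of the ranks, a standard approach when there are no tied values. The data (y = x cubed) are made up, and the helper names are hypothetical.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank values 1..n, smallest first. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: the PMCC applied to the ranks."""
    return pmcc(ranks(xs), ranks(ys))

# Made-up data: increasing but clearly non-linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pmcc(xs, ys)        # below 1, because the points do not lie on a line
r_s = spearman(xs, ys)  # 1, because the ranks agree exactly

print(r < 1, abs(r_s - 1) < 1e-12)
```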

19
Q

Do we need to introduce ρ for Spearman’s rank hypothesis test?

A

No as there is not necessarily an
underlying parent population.

20
Q

What is the null hypothesis of a Spearman’s rank hypothesis test?

A

H0: There is no association in the
population between x and y (in context)

21
Q

What could be the alternative hypothesis of a
Spearman’s rank hypothesis test?

A

H1: There is some positive association
in the population between x and y (in context)

H1: There is some negative association
in the population between x and y (in context)

H1: There is some association
in the population between x and y (in context)

22
Q

Why should a random sample be taken for a hypothesis test?

A

A random sample enables proper
inference about the population to be
undertaken.

23
Q

In the expected frequency table of a Chi-Squared
Contingency Table test, an expected value =

A

(Row total × Column total) / Sample size
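A minimal sketch of this formula applied to every cell of a table. The observed counts are made up, and `expected_frequencies` is a hypothetical helper name.

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in col_totals] for r in row_totals]

# Made-up observed counts: 2 rows by 3 columns, totals not included.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Here both row totals are 60, the column totals are 30, 40 and 50, and the sample size is 120, so each row of expected values is 15, 20, 25.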

24
Q

What is the null hypothesis of a Chi-Squared Contingency Table test?

A

H0: There is no association between x and y (in context)

25
Q

What is the alternative hypothesis of a Chi-Squared Contingency Table test?

A

H1: There is an association between x
and y (in context)

26
Q

What is the number of degrees of freedom for a Chi-Squared
Contingency Table test, if the number of rows is m and the number of columns is n?

A

ν = (m − 1) × (n − 1)
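A short sketch, assuming the table is stored as a list of rows with the totals row and totals column left out. The counts and the helper name `contingency_df` are made up.

```python
def contingency_df(table):
    """Degrees of freedom (m - 1) * (n - 1) for an m-by-n contingency table."""
    m = len(table)      # number of rows
    n = len(table[0])   # number of columns
    return (m - 1) * (n - 1)

# Made-up observed counts: 2 rows by 3 columns, totals not included.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

df = contingency_df(observed)
print(df)
```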

27
Q

When would you need to combine categories in a
contingency table before performing a hypothesis test?

A

If any of the expected frequencies are
less than 5

28
Q

For a discrete random variable X, where a and b are constants…
E(aX + b) =

A

aE(X) + b

29
Q

Var(aX + b) =

A

a^2Var(X)
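Both results can be checked directly on a small distribution. The sketch below uses a made-up probability table for X and arbitrary constants a = 3, b = 4; the helper names are hypothetical.

```python
def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def var(dist):
    """Var(X) = E((X - E(X))^2)."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

def transform(dist, a, b):
    """Distribution of aX + b."""
    out = {}
    for x, p in dist.items():
        out[a * x + b] = out.get(a * x + b, 0) + p
    return out

# Made-up distribution and constants.
X = {0: 0.2, 1: 0.5, 2: 0.3}
a, b = 3, 4
Y = transform(X, a, b)

print(abs(mean(Y) - (a * mean(X) + b)) < 1e-9)  # E(aX + b) = aE(X) + b
print(abs(var(Y) - a ** 2 * var(X)) < 1e-9)     # Var(aX + b) = a^2 Var(X)
```

Note that b shifts every value by the same amount, so it changes the mean but not the spread.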

30
Q

For any two discrete random variables X and Y…
E(aX ± bY) =

A

aE(X) ± bE(Y)

31
Q

For two discrete random variables X and Y…
Var(aX ± bY) =
if X and Y are…

A

If X and Y are independent, then
a^2Var(X) + b^2Var(Y)
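The sketch below checks this by building the distribution of aX + bY and of aX − bY from two independent, made-up distributions. Independence is modelled by multiplying the probabilities; the helper names are hypothetical. The variances add in both cases, including the difference.

```python
from itertools import product

def mean(dist):
    return sum(v * p for v, p in dist.items())

def var(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def combine(X, Y, a, b, sign):
    """Distribution of aX + sign*bY, assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(X.items(), Y.items()):
        value = a * x + sign * b * y
        out[value] = out.get(value, 0) + px * py  # independence: multiply
    return out

# Made-up distributions and constants.
X = {0: 0.5, 1: 0.5}
Y = {1: 0.25, 2: 0.5, 3: 0.25}
a, b = 2, 3

target = a ** 2 * var(X) + b ** 2 * var(Y)
var_sum = var(combine(X, Y, a, b, +1))
var_diff = var(combine(X, Y, a, b, -1))

print(abs(var_sum - target) < 1e-9, abs(var_diff - target) < 1e-9)
```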

32
Q

For X~B(n, p)
Var(X) =

A

np(1 − p)
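As a numerical check, the sketch below builds the full binomial probability table for arbitrary values n = 10 and p = 0.3 and computes the variance from its definition.

```python
from math import comb

def binomial_pmf(n, p):
    """P(X = k) for k = 0..n when X ~ B(n, p)."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

# Arbitrary illustrative parameters.
n, p = 10, 0.3
pmf = binomial_pmf(n, p)

mean = sum(k * q for k, q in pmf.items())
variance = sum((k - mean) ** 2 * q for k, q in pmf.items())

print(abs(variance - n * p * (1 - p)) < 1e-9)
```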

33
Q

State the conditions for a situation to be modelled using the binomial distribution.

A

  • There are only two possible
    outcomes (success or failure)
  • The probability of success, p, is
    constant
  • The trials are independent of
    each other
  • The number of trials, n, is fixed

34
Q

State the conditions for a situation to be modelled using the geometric distribution.

A

  • There are only two possible
    outcomes (success or failure)
  • The probability of success, p, is
    constant
  • The trials are independent of
    each other

35
Q

For X~Geo(p)
P(X > r) =

A

(1 − p)^r

36
Q

For X~Geo(p)
P(X ≤ r) =

A

1 − (1 − p)^r
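Both geometric results can be checked by summing probabilities. The sketch assumes X counts the number of trials up to and including the first success, so P(X = k) = (1 − p)^(k − 1) p for k = 1, 2, 3, …; the values p = 0.2 and r = 5 are arbitrary.

```python
def geometric_pmf(k, p):
    """P(X = k): k - 1 failures followed by one success."""
    return (1 - p) ** (k - 1) * p

# Arbitrary illustrative values.
p, r = 0.2, 5

cdf_by_sum = sum(geometric_pmf(k, p) for k in range(1, r + 1))
cdf_by_formula = 1 - (1 - p) ** r   # P(X <= r)
tail = (1 - p) ** r                 # P(X > r): the first r trials all fail

print(abs(cdf_by_sum - cdf_by_formula) < 1e-12)
print(abs(cdf_by_sum + tail - 1) < 1e-12)
```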

37
Q

For X~Po(λ)
E(X) =

A

λ

38
Q

For X~Po(λ)
Var(X) =

A

λ

39
Q

An indicator of whether a Poisson distribution may be able to model a data set is if…

A

The sample mean and sample variance
of the data set are close to one another.
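A quick sketch of this check using Python's `statistics` module. The counts are made up, and how close counts as "close" is a judgement call that the code does not settle; it only reports the two values and their ratio.

```python
from statistics import mean, variance

# Made-up counts, e.g. events observed in ten equal time intervals.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

sample_mean = mean(counts)
sample_var = variance(counts)    # sample variance (divides by n - 1)
ratio = sample_var / sample_mean # near 1 is consistent with a Poisson model

print(sample_mean, round(sample_var, 3), round(ratio, 3))
```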

40
Q

State the conditions for a situation to be modelled using the Poisson distribution.

A

  • The events occur randomly,
    and are all independent of each
    other.
  • The events happen singly (i.e.
    “one at a time”)
  • The events happen (on average)
    at a constant rate (λ).

41
Q

If X and Y are two independent Poisson random
variables with means λ and μ respectively, then X + Y~

A

Po(λ + μ)
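A numerical sketch of this result: for independent X and Y, P(X + Y = k) is the sum over j of P(X = j) P(Y = k − j), and this should match the Po(λ + μ) probability. The values λ = 1.5 and μ = 2.5 are arbitrary, and only k = 0 to 10 is checked.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

# Arbitrary illustrative means.
lam, mu = 1.5, 2.5

def sum_pmf(k):
    """P(X + Y = k) for independent X ~ Po(lam), Y ~ Po(mu)."""
    return sum(poisson_pmf(j, lam) * poisson_pmf(k - j, mu) for j in range(k + 1))

max_gap = max(abs(sum_pmf(k) - poisson_pmf(k, lam + mu)) for k in range(11))

print(max_gap < 1e-12)
```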

42
Q

A binomial distribution with parameters n and p can
be approximated by a Poisson distribution, with parameter λ = np, if…

A

n is large and p is small (and so the
event is rare).
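To get a feel for the quality of the approximation, the sketch below compares B(n, p) with Po(np) for the arbitrary choice n = 1000, p = 0.002, over k = 0 to 6. The size of the gap depends on n and p, so this is one illustration rather than a general guarantee.

```python
from math import comb, exp, factorial

# Arbitrary illustrative parameters: n large, p small.
n, p = 1000, 0.002
lam = n * p  # Poisson parameter

def binomial_pmf(k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return exp(-lam) * lam ** k / factorial(k)

# Largest difference between the two models over k = 0..6.
max_gap = max(abs(binomial_pmf(k) - poisson_pmf(k)) for k in range(7))

print(max_gap < 1e-3)
```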

43
Q

For a goodness of fit test, the degrees of freedom ν =

A

(Final number of columns, after
combining any columns due to low
expected frequencies) − 1 for each
estimated parameter (e.g. λ for
Poisson) − 1, as the total frequency is
one restriction.
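The rule can be written as a one-line calculation. The function name `goodness_of_fit_df` and the example numbers are hypothetical.

```python
def goodness_of_fit_df(final_columns, estimated_parameters):
    """Columns after combining, minus estimated parameters, minus 1 for the total."""
    return final_columns - estimated_parameters - 1

# Example: 6 columns remain after combining, and λ was estimated from the data.
df_estimated = goodness_of_fit_df(6, 1)

# Same table, but with the parameter given rather than estimated.
df_given = goodness_of_fit_df(6, 0)

print(df_estimated, df_given)
```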

44
Q

If your test statistic lies within the left-hand critical region
(the fit is suspiciously good), possible explanations are…

A

  • Perhaps the model was
    constructed to fit a set of data.
  • Perhaps some of the data has
    been omitted in order to
    produce a better fit.
  • Perhaps some of the data is not
    genuine.