Statistics Flashcards

Question 1

Q

State three advantages of random sampling

Answer

A

Avoids suspected sources of bias
Only a random sample enables proper statistical inference about the population to be undertaken
because the probability basis on which the sample has been selected is known

Question 2

Q

If the variance of X is v, what is the variance of the total when X is observed twice, independently, and the two results are added together?

Answer

A

2v
(Var(X1 + X2) = Var(X1) + Var(X2) = 2Var(X) for independent observations X1 and X2. Not to be confused with Var(2X) = 4Var(X).)
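
The difference between Var(X1 + X2) and Var(2X) can be checked exactly on a small example. The sketch below assumes, purely for illustration, that X is the score on a fair six-sided die; the helper name `variance` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

# Assumed example: X is the score on a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution of X1 + X2 for two independent rolls.
total = {}
for (a, pa), (b, pb) in product(die.items(), repeat=2):
    total[a + b] = total.get(a + b, 0) + pa * pb

# Distribution of 2X: a single roll, doubled.
doubled = {2 * x: p for x, p in die.items()}

v = variance(die)
var_sum = variance(total)
var_doubled = variance(doubled)
print(v, var_sum, var_doubled)
```

Adding two independent rolls doubles the variance, while doubling a single roll quadruples it.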

Question 3

Q

Conditions for a Poisson distribution to be appropriate

Answer

A

Events occur randomly at a uniform average rate, and independently of each other

Question 4

Q

What is an independent variable?

Answer

A

A variable that is not subject to random variation

Question 5

Q

Why is random sampling needed for proper statistical inference?

Answer

A

Because then the probability basis on which the sample has been selected is known

Question 6

Q

State the distribution of the score from a fair six-sided die

Answer

A

Uniformly distributed over the values { 1, 2, … , 6 }
Include the braces: it's a set!

Question 7

Q

How to comment on the goodness of fit of a regression line

Answer

A

Comment on r^2 (square r if needed)
Comment on how close the points lie to a straight line
…so the fit is not very good / fairly good / very good indeed

Question 8

Q

Conditions for a reliable estimate to be made from a regression line

Answer

A

Interpolation
and strong linear correlation (seen by points lying close to regression line)

Question 9

Q

Advantages of larger sample sizes for tests using correlation coefficients

Answer

A

As sample size increases, random variation in the sample tends to decrease
So the (PMCC / Spearman’s rank) coefficient tends to get closer to the population correlation coefficient
So one can be more confident that the correlation is genuine, rather than simply the result of random variation

Question 10

Q

What is the difference between association and correlation?

Answer

A

Association refers to any relationship between two variables
Correlation refers to a linear relationship between two variables
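
A short sketch of the distinction, using made-up data: y = x^2 on values of x that are symmetric about zero is a perfect association, yet the PMCC is zero because the relationship is not linear. The function name `pmcc` and both data sets are assumptions for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]    # strong association, but not linear
ys_linear = [2 * x + 1 for x in xs]    # exact linear relationship

r_quadratic = pmcc(xs, ys_quadratic)
r_linear = pmcc(xs, ys_linear)
print(r_quadratic, r_linear)
```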

Question 11

Q

When is it appropriate to use the PMCC?

Answer

A

Data is random-on-random
Parent population follows a bivariate normal distribution (seen by grouping of points on scatter graph having a roughly elliptical shape)

Question 12

Q

Sum of residuals ε1 + ε2 + … =

Answer

A

0 (for a least squares regression line)
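
For a least squares regression line fitted with an intercept, the residuals sum to zero. A minimal numerical check, assuming a small made-up data set and a hypothetical helper `least_squares_line`:

```python
def least_squares_line(xs, ys):
    """Return (intercept, gradient) of the least squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return intercept, gradient

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
print(residual_sum)  # zero up to floating-point rounding
```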

Question 13

Q

r^2

Answer

A

The coefficient of determination
The proportion of the variation in one variable that can be explained by the variation in the other
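
The "proportion explained" reading can be checked numerically: one minus (residual sum of squares divided by total sum of squares) equals the square of the PMCC for a least squares line. This is a sketch on made-up data; the function name `explained_and_r` is hypothetical.

```python
def explained_and_r(xs, ys):
    """Return (proportion of variation in y explained by the line, PMCC r)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    # Unexplained variation: sum of squared residuals about the line.
    ss_residual = sum((y - (intercept + gradient * x)) ** 2
                      for x, y in zip(xs, ys))
    explained = 1 - ss_residual / syy
    r = sxy / (sxx * syy) ** 0.5
    return explained, r

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

explained, r = explained_and_r(xs, ys)
print(explained, r ** 2)
```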

Question 14

Q

Why might effect sizes be used instead of conducting a hypothesis test?

Answer

A

For a large set of random-on-random bivariate data, even a small non-zero value of the PMCC is likely to lead to rejection of the null hypothesis of no correlation in the population, so the test is uninformative
So the size of correlation is considered, rather than whether the population correlation is non-zero
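
A rough illustration of why the test becomes uninformative for large samples. The sketch assumes the usual statistic t = r * sqrt((n - 2) / (1 - r^2)) and, as a simplification for large n, uses a Normal approximation for the p-value; the values r = 0.05 and n = 10000 are made up.

```python
from math import sqrt, erfc

def approx_two_sided_p(r, n):
    """Return (t statistic, approximate two-sided p-value) for H0: no correlation.

    Assumption: n is large, so the t distribution with n - 2 degrees of
    freedom is approximated by the standard Normal distribution.
    """
    t = r * sqrt((n - 2) / (1 - r * r))
    p = erfc(abs(t) / sqrt(2))  # two-sided tail probability of N(0, 1)
    return t, p

# Hypothetical values: a tiny correlation in a very large sample.
t_stat, p_value = approx_two_sided_p(0.05, 10_000)
print(t_stat, p_value)
```

Even though r = 0.05 is negligible as an effect size, the p-value is far below any conventional significance level.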

Question 15

Q

What does a significance level of 5% mean?

Answer

A

If the null hypothesis is in fact true, there is a 5% probability that the test will (wrongly) reject it

Question 16

Q

Situations when you might use Spearman’s rank instead of PMCC

Answer

A

If either variable is non-random
If the relationship is non-linear
Subjective data
Grouping of points on scatter diagram not roughly elliptical
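
A minimal sketch of Spearman's rank coefficient for a monotonic but non-linear relationship. It assumes there are no tied values, so the formula 1 - 6Σd² / (n(n² - 1)) applies; the function names and data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards. Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank correlation coefficient (no-ties formula)."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

xs = [1, 2, 3, 4, 5, 6]
ys_cubic = [x ** 3 for x in xs]        # increasing, but not linear
ys_decreasing = [10 - x for x in xs]   # strictly decreasing

rs_cubic = spearman(xs, ys_cubic)
rs_decreasing = spearman(xs, ys_decreasing)
print(rs_cubic, rs_decreasing)
```

Because only the ranks are used, any strictly increasing relationship gives a coefficient of 1 and any strictly decreasing one gives -1.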

Question 17

Q

Null hypothesis for Spearman’s rank hypothesis test

Answer

A

H0: There is no association between x and y (in context) in the population

Question 18

Q

4 conditions for a situation to be modelled by a binomial distribution

Answer

A

Trials are independent of each other
The probability of success is the same for every trial
Only success/failure possible
Fixed number of trials
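
When these conditions hold, probabilities follow P(X = r) = nCr p^r (1 - p)^(n - r). A short sketch, with n = 10 and p = 0.5 chosen only as an example:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): r successes in n independent trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Hypothetical example: 10 tosses of a fair coin.
n, p = 10, 0.5
prob_five = binomial_pmf(5, n, p)
total_probability = sum(binomial_pmf(r, n, p) for r in range(n + 1))
print(prob_five, total_probability)
```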

Question 19

Q

Indicator of whether a Poisson distribution may be able to model a data set is if…

Answer

A

sample variance is reasonably close to sample mean
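
This check is quick to carry out on a set of counts. The sketch below uses made-up data and the standard library `statistics` module; `statistics.variance` uses the n - 1 divisor.

```python
import statistics

# Hypothetical counts of events per interval, for illustration only.
counts = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

sample_mean = statistics.mean(counts)
sample_variance = statistics.variance(counts)  # divisor n - 1

# For a Poisson model the mean and variance should be reasonably close.
print(sample_mean, sample_variance)
```

Here the two values are similar, so a Poisson model is at least plausible for these counts; how close is "close enough" remains a judgement.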

Question 20

Q

If the test statistic falls in the left-hand tail of the distribution, i.e. it is suspiciously small (e.g. chi-squared)…

Answer

A

Perhaps the model was constructed to fit the data
Or some data has been removed in order to produce a better fit
Or some of the data is not genuine
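
For reference, the chi-squared statistic is the sum of (observed - expected)² / expected over the cells. The sketch below computes it for made-up frequencies that match the expected values almost perfectly; the function name is hypothetical.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical example: 60 rolls of a die, uniform model expects 10 per face.
observed = [9, 11, 10, 10, 10, 10]
expected = [10] * 6

statistic = chi_squared_statistic(observed, expected)
print(statistic)
```

A value this close to zero would sit far into the left-hand tail, which is what prompts the "too good to be true" explanations above.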

Question 21

Q

Four desirable features of a sample

Answer

A

Random
Unbiased
Representative of whole population
Items are chosen independently

Question 22

Q

Why would it not be sensible to predict the distance for 5-year-olds?

Answer

A

This would be extrapolation
As the lowest age in the data is 50 years (put into context)
And the relationship may be different for 5 year olds

Question 23

Q

Why is it sometimes not useful to find the equation of the x-on-y regression line?

Answer

A

If the values of x are non-random, then it makes no sense to try and predict them

Question 24

Q

Assumption for a chi-squared test

Answer

A

Sample must be random

Question 25

Q

Cohen’s guideline for effect sizes

Answer

A

ignored 0.0-0.1
small 0.1-0.3
medium 0.3-0.5
large 0.5-1.0
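
The bands can be written as a small lookup. How a value exactly on a boundary (for example 0.3) is classified is an assumption here: each band is taken to include its lower bound. The function name `cohen_label` is hypothetical.

```python
def cohen_label(r):
    """Label the size of a correlation coefficient using the bands above.

    Assumption: each band includes its lower bound, so 0.3 counts as medium.
    """
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size < 0.1:
        return "ignored"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"

print(cohen_label(0.165))
print(cohen_label(-0.62))
```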

Question 26

Q

Comment on outcome of hypothesis test considering the effect size of 0.165

Answer

A

The test shows that there is almost certainly some real correlation in the population
However, the test is uninformative since the effect size is so small

Question 27

Q

Suggest a reason for not using an outlier in any analysis

Answer

A

Because it is not representative

Question 28

Q

Explain why a census would not be used

Answer

A

Because it would be very expensive / impracticable to carry out

Question 29

Q

Explain why they have decided to carry out a test based on PMCC

Answer

A

Grouping of points in scatter diagram is roughly elliptical
So there is evidence to suggest bivariate Normality in the population, which is required for a test using the PMCC to be valid

Question 30

Q

Disadvantage of using 10% significance level over 5% significance level

Answer

A

Null hypothesis is more likely to be wrongly rejected

Question 31

Q

When is Spearman’s rank test not appropriate?

Answer

A

If the scatter diagram shows no evidence of a monotonic relationship

Question 32

Q

Disadvantage of Spearman’s rank

Answer

A

Ranking data loses information, which might affect the outcome of a test

Question 34

Q

3 conditions for a situation to be modelled by a geometric distribution

Answer

A

Trials are independent of each other
The probability of success is the same for every trial
Only success/failure possible
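
Under these conditions, the probability that the first success occurs on trial r is (1 - p)^(r - 1) p. A brief sketch, assuming as an example that "success" means rolling a six on a fair die:

```python
from fractions import Fraction

def geometric_pmf(r, p):
    """P(X = r): the first success occurs on trial r (r = 1, 2, 3, ...)."""
    return (1 - p) ** (r - 1) * p

# Hypothetical example: success is rolling a six on a fair die.
p_six = Fraction(1, 6)
prob_third = geometric_pmf(3, p_six)

# Probability that the first six arrives within the first 10 rolls.
prob_within_ten = sum(geometric_pmf(r, p_six) for r in range(1, 11))
print(prob_third, prob_within_ten)
```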