Statistics Flashcards

1
Q

State three advantages of random sampling

A
  • Avoids suspected sources of bias
  • Only a random sample enables proper statistical inference about the population to be undertaken
  • because the probability basis on which the sample has been selected is known
2
Q

If the variance of X is v, what is the variance of the sum of two independent observations of X?

A

2v
(Var(X1 + X2) = Var(X1) + Var(X2) = 2Var(X) for independent observations X1 and X2. Not to be confused with Var(2X) = 4Var(X).)
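
The difference between adding two independent observations and doubling one observation can be checked exactly. The sketch below assumes, purely for illustration, that X is the score on a fair six-sided die; the helper name `variance` and the variable names are not from the card.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


# Assumed example: X is the score on a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution of X1 + X2 for two independent rolls.
two_rolls = {}
for (a, pa), (b, pb) in product(die.items(), repeat=2):
    two_rolls[a + b] = two_rolls.get(a + b, 0) + pa * pb

# Distribution of 2X: a single roll, doubled.
doubled = {2 * x: p for x, p in die.items()}

v = variance(die)
print(v, variance(two_rolls), variance(doubled))  # 35/12 35/6 35/3
```

Exact fractions are used so that the 2v and 4v relationships hold without rounding error.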

3
Q

Conditions for a Poisson distribution to be appropriate

A

Events occur randomly at a uniform average rate, and independently of each other

4
Q

What is an independent variable?

A

A variable that is not subject to random variation

5
Q

Why is random sampling needed for proper statistical inference?

A

Because then the probability basis on which the sample has been selected is known

6
Q

State the distribution of the score from a fair six-sided die

A

Uniformly distributed over the values { 1, 2, … , 6 }
(Include the braces: it's a set!)

7
Q

How to comment on the goodness of fit of a regression line

A
  • Comment on r^2 (square r if needed)
  • Comment on how close the points lie to the straight line
    …so the fit is not very good / fairly good / very good indeed
8
Q

Conditions for a reliable estimate to be made from regression line

A
  • Interpolation
  • and strong linear correlation (seen by points lying close to regression line)
9
Q

Advantages of larger sample sizes for tests using correlation coefficients

A
  1. As the sample size increases, random variation in the sample tends to decrease
  2. So the sample coefficient (PMCC or Spearman's rank) tends to get closer to the population correlation coefficient
  3. So one can be more confident that the correlation is genuine, rather than simply the result of random variation
10
Q

What is the difference between association and correlation?

A
  • Association refers to any relationship between two variables
  • Correlation refers to a linear relationship between two variables
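
A small illustration of the distinction, as a sketch: the data below are made up so that y is completely determined by x (a strong association), yet the PMCC is zero because the relationship is not linear. The function name `pmcc` is only a label chosen for this example.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y = x^2 on values of x symmetric about zero.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pmcc(xs, ys)
print(r)  # zero: no linear correlation despite a perfect association
```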
11
Q

When is it appropriate to use the PMCC?

A
  • Data is random-on-random
  • Parent population follows a bivariate normal distribution (seen by grouping of points on scatter graph having a roughly elliptical shape)
12
Q

Sum of residuals ε1 + ε2 + … =

A

0
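
This can be checked numerically for a least squares regression line of y on x. The following is a minimal sketch with made-up data; `fit_line` is a hypothetical helper that uses the usual formulas b = Sxy/Sxx and a = ȳ − b·x̄.

```python
def fit_line(xs, ys):
    """Least squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(sum(residuals))  # zero, up to floating-point rounding
```

The residuals cancel because the least squares line always passes through the mean point (x̄, ȳ).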

13
Q

r^2

A
  • The coefficient of determination
  • The proportion of the variation in one variable that can be explained by the variation in the other
14
Q

Why might effect sizes be used instead of conducting a hypothesis test?

A
  • For a large set of random-on-random bivariate data, even a small non-zero value of the PMCC is likely to lead to rejection of the null hypothesis of no correlation in the population, so the test is uninformative
  • So the size of correlation is considered, rather than whether the population correlation is non-zero
15
Q

What does a significance level of 5% mean?

A

If the null hypothesis is in fact true, there is a 5% probability that the test wrongly rejects it

16
Q

Situations when you might use Spearman’s rank instead of PMCC

A
  • If either variable is non-random
  • If the relationship is non-linear
  • Subjective data
  • Grouping of points on scatter diagram not roughly elliptical
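
To illustrate the non-linear case, here is a sketch using the standard formula r_s = 1 − 6Σd² / (n(n² − 1)), which assumes there are no tied ranks. The data and the helper names `ranks` and `spearman` are invented for the example.

```python
def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation coefficient for untied paired data."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: y = x^3 is increasing but clearly not linear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rs = spearman(xs, ys)
print(rs)  # 1.0, because the ranks agree exactly
```

Only the ordering of the values matters here, which is why a curved but steadily increasing relationship still gives a coefficient of 1.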
16
Q

Null hypothesis for Spearman’s rank hypothesis test

A

H0: There is no association between x and y (in context) in the population

17
Q

4 conditions for a situation to be modelled by a binomial distribution

A
  • Events are independent of each other
  • Events occur randomly, with a constant probability of success
  • Only success/failure possible
  • Fixed number of trials
18
Q

Indicator of whether a Poisson distribution may be able to model a data set is if…

A

sample variance is reasonably close to sample mean
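
A quick way to apply this check, sketched with Python's standard `statistics` module. The counts below are made up for illustration (for example, events observed in ten equal time intervals).

```python
from statistics import mean, variance

# Hypothetical counts of events in ten equal intervals.
counts = [0, 1, 2, 1, 3, 2, 0, 1, 4, 2]

sample_mean = mean(counts)
sample_variance = variance(counts)  # sample variance, divisor n - 1

print(sample_mean, sample_variance)
```

For these counts both values come out at about 1.6, so a Poisson model would not be ruled out by this check. Agreement between the two is only an indicator, not proof that the model is appropriate.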

19
Q

If the test statistic falls in the left-hand (lower-tail) critical region (e.g. in a chi-squared test), the fit is suspiciously good…

A
  • Perhaps the model was constructed to fit the data
  • Or some data has been removed in order to produce a better fit
  • Or some of the data is not genuine
20
Q

Four desirable features of a sample

A
  • Random
  • Unbiased
  • Representative of whole population
  • Items are chosen independently
21
Q

Why would it not be sensible to predict the distance for 5-year-olds?

A
  • This would be extrapolation
  • As the lowest age in the data is 50 years (put into context)
  • And the relationship may be different for 5 year olds
22
Q

Why is it sometimes not useful to find the x-on-y regression equation?

A

If the values of x are non-random, then it makes no sense to try to predict them

23
Q

Assumption for a chi-squared test

A

Sample must be random

24
Q

Cohen’s guideline for effect sizes

A

ignored 0.0-0.1
small 0.1-0.3
medium 0.3-0.5
large 0.5-1.0
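
The bands above can be written as a small lookup. This is a sketch only: the function name `cohen_label` is invented, and because the listed ranges share their endpoints, it assumes a boundary value such as 0.3 belongs to the higher band.

```python
def cohen_label(r):
    """Classify a correlation coefficient using the bands listed above.

    Assumption: a value exactly on a boundary goes into the higher band.
    """
    size = abs(r)  # the sign of r does not affect the effect size
    if size > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size < 0.1:
        return "ignored"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"


print(cohen_label(0.165))  # small
```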

25
Q

Comment on outcome of hypothesis test considering the effect size of 0.165

A
  • The test shows that there is almost certainly some real correlation in the population
  • However, the test is uninformative since the effect size is so small
26
Q

Suggest a reason for not using an outlier in any analysis

A

Because it is not representative

27
Q

Explain why a census would not be used

A

Because it would be very expensive / impracticable to carry out

28
Q

Explain why they have decided to carry out a test based on PMCC

A
  • Grouping of points in scatter diagram is roughly elliptical
  • So there is evidence to suggest bivariate Normality in the population, which is required for a test using the PMCC to be valid
29
Q

Disadvantage of using 10% significance level over 5% significance level

A

Null hypothesis is more likely to be wrongly rejected

30
Q

When is Spearman’s rank test not appropriate?

A

If the scatter diagram shows no evidence of a monotonic relationship

31
Q

Disadvantage of Spearman’s rank

A

Ranking data loses information, which might affect the outcome of a test

33
Q

3 conditions for a situation to be modelled by a geometric distribution

A
  • Events are independent of each other
  • Events occur randomly, with a constant probability of success
  • Only success/failure possible