Statistics Flashcards
State three advantages of random sampling
- Avoids suspected sources of bias
- Only a random sample enables proper statistical inference about the population to be undertaken
- because the probability basis on which the sample has been selected is known
If the variance of X is v, what is the variance of the sum of two independent observations of X?
2v
(Var(X1 + X2) = 2Var(X). Not to be confused with Var(2X) = 4Var(X))
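The difference between adding two independent observations and doubling one observation can be checked exactly. The sketch below assumes X is the score on a fair six-sided die (an illustrative choice, not part of the card) and uses exact fractions.

```python
from fractions import Fraction
from itertools import product

# Assumed example: X is the score on a fair six-sided die.
scores = range(1, 7)
p = Fraction(1, 6)

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * px for x, px in dist.items())
    return sum((x - mean) ** 2 * px for x, px in dist.items())

single = {x: p for x in scores}

# Distribution of X1 + X2 for two independent rolls.
sum_of_two = {}
for a, b in product(scores, repeat=2):
    sum_of_two[a + b] = sum_of_two.get(a + b, 0) + p * p

# Distribution of 2X: one roll, doubled.
doubled = {2 * x: p for x in scores}

v = variance(single)
var_sum = variance(sum_of_two)
var_doubled = variance(doubled)
print(v, var_sum, var_doubled)
```

Here var_sum equals 2v and var_doubled equals 4v, matching the two results on the card.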
Conditions for a Poisson distribution to be appropriate
Events occur randomly at a uniform average rate, and independently of each other
What is an independent variable?
A variable that is not subject to random variation
Why is random sampling needed for proper statistical inference?
Because then the probability basis on which the sample has been selected is known
State the distribution of the score from a fair six-sided dice
Uniformly distributed over the values { 1, 2, … , 6 }
Include the braces; it's a set!
How to comment on the goodness of fit of a regression line
- comment on r^2 (square r if needed)
- comment on how close the points lie to a straight line
- …so the fit is not very good / fairly good / very good indeed
Conditions for a reliable estimate to be made from regression line
- Interpolation
- and strong linear correlation (seen by points lying close to regression line)
Advantages of larger sample sizes for tests using correlation coefficients
- as sample size increases, random variation in sample tends to decrease
- so the (PMCC/Spearman’s rank) coefficient tends to get closer to the population correlation coefficient
- so one can be more confident that the correlation is genuine, rather than simply the result of random variation
What is the difference between association and correlation?
- Association refers to any relationship between two variables
- Correlation refers to a linear relationship between two variables
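A small sketch can make the distinction concrete. It assumes made-up data with y = x^3, a perfectly monotonic but non-linear relationship, and hypothetical helper names (pmcc, ranks, spearman). Spearman's rank coefficient is computed here as the PMCC of the ranks, assuming no tied values.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank each value from 1 (smallest); assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Spearman's rank coefficient as the PMCC of the ranks (no ties)."""
    return pmcc(ranks(xs), ranks(ys))

# Made-up data: perfectly monotonic, but not linear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pmcc(xs, ys)       # below 1, because the points are not on a line
rs = spearman(xs, ys)  # 1, because the ranks agree perfectly
print(r, rs)
```

The association is perfect (rs is 1) even though the linear correlation is not (r is below 1).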
When is it appropriate to use the PMCC?
- Data is random-on-random
- Parent population follows a bivariate normal distribution (seen by grouping of points on scatter graph having a roughly elliptical shape)
Sum of residuals ε1 + ε2 + … =
0
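This can be checked numerically for a least squares line of y on x. The data below are made up for illustration, and the variable names are assumptions.

```python
# Made-up bivariate data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares y on x line: y = a + b x, with b = Sxy / Sxx.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x

# Residual = observed y minus the y value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
total = sum(residuals)
print(total)  # zero, up to floating-point rounding
```

The sum is zero because the least squares line always passes through the mean point (mean_x, mean_y).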
r^2
- The coefficient of determination
- The proportion of the variation in one variable that can be explained by the variation in the other
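As a rough check of the "proportion of variation explained" reading, the sketch below compares r^2 with 1 minus (residual variation / total variation) for a least squares y on x line. The data and variable names are made up for illustration.

```python
from math import sqrt

# Made-up bivariate data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.8, 4.1, 5.9, 8.4, 9.6, 12.3]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# PMCC and its square.
r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

# Least squares line y = a + b x and the variation it leaves unexplained.
b = sxy / sxx
a = mean_y - b * mean_x
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

proportion_explained = 1 - ss_residual / syy
print(r_squared, proportion_explained)  # the two values agree
```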
Why might effect sizes be used instead of conducting a hypothesis test?
- For a large set of random-on-random bivariate data, even a small non-zero value of the PMCC is likely to lead to rejection of the null hypothesis of no correlation in the population, so the test is uninformative
- So the size of correlation is considered, rather than whether the population correlation is non-zero
What does a significance level of 5% mean?
If the null hypothesis is in fact true, there is a 5% probability of (wrongly) rejecting it
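The probability of rejecting a true null hypothesis can be computed exactly for a discrete test. The sketch below assumes a hypothetical two-tailed test of H0: p = 0.5 with X ~ B(20, 0.5) and critical region X ≤ 5 or X ≥ 15; these numbers are illustrative and not taken from the card.

```python
from math import comb

# Assumed example: X ~ B(20, 0.5) under the null hypothesis.
n, p = 20, 0.5

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed two-tailed critical region at the 5% significance level.
critical_region = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Probability that X lands in the critical region when H0 is true,
# i.e. the probability of wrongly rejecting H0.
p_reject_when_h0_true = sum(binom_pmf(k) for k in critical_region)
print(p_reject_when_h0_true)
```

For a discrete distribution the actual probability is at most 5% rather than exactly 5%; in this example it is roughly 4.1%.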
Situations when you might use Spearman’s rank instead of PMCC
- If either variable is non-random
- If the relationship is non-linear
- Subjective data
- Grouping of points on scatter diagram not roughly elliptical
Null hypothesis for Spearman’s rank hypothesis test
H0: There is no association between x and y (in context) in the population
4 conditions for a situation to be modelled by a binomial distribution
- Trials are independent of each other
- The probability of success is the same for every trial
- Only success/failure possible
- Fixed number of trials
Indicator of whether a Poisson distribution may be able to model a data set is if…
sample variance is reasonably close to sample mean
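A quick way to make this comparison is to compute both statistics from the data. The counts below are made up for illustration, and how close counts as "reasonably close" remains a judgement.

```python
from statistics import mean, variance

# Made-up counts, e.g. number of events observed in each of 12 intervals.
counts = [2, 3, 1, 4, 2, 0, 3, 2, 5, 1, 2, 3]

sample_mean = mean(counts)
sample_var = variance(counts)  # sample variance, divisor n - 1

print(sample_mean, sample_var)
# For a Poisson distribution the mean and variance are equal,
# so similar values are consistent with a Poisson model.
```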
If the test statistic falls in the left-hand tail of the distribution (e.g. a very small chi-squared value, so the fit is suspiciously good)…
- Perhaps the model was constructed to fit the data
- Or some data has been removed in order to produce a better fit
- Or some of the data is not genuine
Four desirable features of a sample
- Random
- Unbiased
- Representative of whole population
- Items are chosen independently
Why would it not be sensible to predict the distance for 5 year olds?
- This would be extrapolation
- As the lowest age in the data is 50 years (put into context)
- And the relationship may be different for 5 year olds
Why is it sometimes not useful to find the x on y regression line?
If the values of x are non-random, then it makes no sense to try and predict them
Assumption for a chi-squared test
Sample must be random
Cohen’s guideline for effect sizes
ignored 0.0-0.1
small 0.1-0.3
medium 0.3-0.5
large 0.5-1.0
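The guideline can be written as a small lookup. The function name is hypothetical, and the sketch assumes each band includes its lower boundary (the card does not say which band a value such as 0.3 belongs to) and that the sign of the coefficient is ignored.

```python
def effect_size_label(r):
    """Classify a correlation coefficient using Cohen's guideline.

    Assumption: each band includes its lower boundary, e.g. 0.3 is 'medium'.
    """
    size = abs(r)  # the sign only gives the direction of the correlation
    if size > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if size < 0.1:
        return "ignored"
    if size < 0.3:
        return "small"
    if size < 0.5:
        return "medium"
    return "large"

print(effect_size_label(0.165))
```

A coefficient of 0.165 falls in the small band.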
Comment on outcome of hypothesis test considering the effect size of 0.165
- The test shows that there is almost certainly some real correlation in the population
- However, the test is uninformative since the effect size is so small
Suggest a reason for not using an outlier in any analysis
Because it is not representative
Explain why a census would not be used
Because it would be very expensive / impracticable to carry out
Explain why they have decided to carry out a test based on PMCC
- Grouping of points in scatter diagram is roughly elliptical
- So there is evidence to suggest bivariate Normality in the population, which is required for a test using the PMCC to be valid
Disadvantage of using 10% significance level over 5% significance level
Null hypothesis is more likely to be wrongly rejected
When is Spearman’s rank test not appropriate?
If the scatter diagram shows no evidence of a monotonic relationship
Disadvantage of Spearman’s rank
Ranking data loses information, which might affect the outcome of a test
3 conditions for a situation to be modelled by a geometric distribution
- Trials are independent of each other
- The probability of success is the same for every trial
- Only success/failure possible