statistics year 2 Flashcards

Question 1

Q

how to prove a normal distribution can be used to model a variable

Answer

A

state that the data is continuous
show that almost all of the data lies within the mean plus or minus three standard deviations
when working out normal distribution questions on a calculator, for an unknown upper/lower bound use the mean plus or minus three standard deviations
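A rough sketch of the three-standard-deviations check. The `heights` list is made-up data for illustration and `fraction_within` is a hypothetical helper name, not something from the course.

```python
from statistics import mean, pstdev

def fraction_within(data, k=3):
    """Fraction of observations within k standard deviations of the mean."""
    m = mean(data)
    s = pstdev(data)  # standard deviation of the data set
    inside = [x for x in data if m - k * s <= x <= m + k * s]
    return len(inside) / len(data)

# Hypothetical heights in cm (illustrative values only).
heights = [158.2, 161.5, 163.0, 164.8, 166.1, 167.3, 168.0, 169.4, 171.2, 174.9]
print(fraction_within(heights))  # 1.0, so every value is within three standard deviations
```

A fraction close to 1 supports a normal model; it does not prove one, so the data should still be continuous and roughly symmetric.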

Question 2

Q

‘given’ formula (conditional probability)

Answer

A

P(B|A) = P(A∩B) / P(A)
P(A∩B) = P(A) × P(B|A)
this is given in the formula booklet
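A small worked example of the formula, using exact fractions. The probabilities are made up for illustration.

```python
from fractions import Fraction

# Hypothetical probabilities chosen for illustration.
p_a = Fraction(2, 5)         # P(A)
p_a_and_b = Fraction(1, 10)  # P(A ∩ B)

# P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)  # 1/4

# The rearranged form gives back the intersection: P(A ∩ B) = P(A) × P(B|A)
print(p_a * p_b_given_a)  # 1/10
```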

Question 3

Q

standardising a score formula

Answer

A

z = (score - mean) / standard deviation
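As a quick sketch, with made-up numbers (`standardise` is a hypothetical helper name):

```python
def standardise(score, mean, sd):
    """Number of standard deviations the score lies above the mean."""
    return (score - mean) / sd

# Hypothetical test: mean 50, standard deviation 10, score 65.
print(standardise(65, 50, 10))  # 1.5
```

A standardised score of 1.5 means the score is 1.5 standard deviations above the mean; a negative value would mean it is below the mean.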

Question 4

Q

probability of exact number using normal distribution

Answer

A

Zero. A single value corresponds to a vertical line under the curve, which has no area
- mention that the distribution is continuous

Question 5

Q

independent events

Answer

A

Two events, A and B, are independent if P(A|B) = P(A) or if P(A∩B) = P(A) × P(B)
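A minimal check of the product rule, assuming the three probabilities are already known. The values and the name `are_independent` are illustrative only.

```python
from fractions import Fraction

def are_independent(p_a, p_b, p_a_and_b):
    """True when P(A ∩ B) = P(A) × P(B)."""
    return p_a_and_b == p_a * p_b

# Hypothetical probabilities; exact fractions avoid rounding problems.
print(are_independent(Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)))  # True
print(are_independent(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)))  # False
```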

Question 6

Q

modelling with probability

Answer

A

To model real-life situations mathematically, you often have to make simplifying assumptions. You can analyse and improve your model by comparing predicted results with actual data, questioning any assumptions that have been made.
To test a binomial model you can use the mean and variance. For X ∼ B(n, p), the mean μ and variance σ² are given by μ = np and σ² = np(1 - p)
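A sketch of that comparison. The values of n and p and the `observed` counts are invented for illustration.

```python
from fractions import Fraction
from statistics import mean, pvariance

# Hypothetical model: X ~ B(10, 0.3)
n, p = 10, Fraction(3, 10)
model_mean = n * p           # np
model_var = n * p * (1 - p)  # np(1 - p)
print(model_mean, model_var)  # 3 21/10

# Hypothetical observed counts from repeating the experiment ten times.
observed = [1, 5, 3, 2, 4, 3, 1, 5, 2, 4]
sample_mean = mean(observed)
sample_var = pvariance(observed)
print(sample_mean, sample_var)
```

If the sample mean and variance are close to np and np(1 - p), the binomial model looks reasonable; a large gap suggests an assumption (such as independent trials or constant p) does not hold.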

Question 7

Q

continuous random variable (CRV), X

Answer

A

can take any one of an infinite number of values on a given interval. Instead of assigning probabilities to individual values of X, you assign probabilities to ranges of values of X and the probability distribution is represented by a curve or a sequence of curves called a probability density function
P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from x = a to x = b
- The ≤ and < signs are interchangeable, because a single value has zero probability, so both give the same area under the curve
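A numerical sketch of "probability = area under the curve". The density f(x) = 2x on [0, 1] is a made-up example, and `probability` is a hypothetical helper that uses the midpoint rule rather than exact integration.

```python
def f(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def probability(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf using the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(probability(f, 0.2, 0.5), 4))  # 0.21
print(round(probability(f, 0, 1), 4))      # 1.0 (total area)
print(probability(f, 0.3, 0.3))            # 0.0 (a single value has no area)
```

The last line is why ≤ and < give the same answer: including or excluding an end point changes the area by zero.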

Question 8

Q

Normal distribution

Answer

A

The normal probability density function has a bell-shaped curve. It is a continuous function so the area under the curve can be used to calculate probabilities
Total area under the curve=1
In the normal distribution:
mean=median=mode
distribution is symmetrical
points of inflection one standard deviation from the mean
roughly 68% of values lie within one standard deviation of the mean
roughly 99.7% of values lie within three standard deviations of the mean
If a variable X follows a normal distribution you write X ∼ N(μ, σ²)
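These percentages can be checked with Python's standard library. This is only a sketch; note that `NormalDist` takes the standard deviation σ as its second argument, not the variance σ².

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # mean 0, standard deviation 1

for k in (1, 2, 3):
    within = Z.cdf(k) - Z.cdf(-k)  # P(-k < Z < k)
    print(k, round(within, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```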

Question 9

Q

standard normal distribution

Answer

A

given as symbol Z
Z ∼ N ( 0, 1 )
z = (X – μ) / σ
- often need to use inverse normal to find z and then rearrange to solve for the mean or standard deviation as required
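A sketch of the inverse-normal method for an unknown mean. The question is made up: X ∼ N(μ, 4²) with P(X < 20) = 0.9, find μ.

```python
from statistics import NormalDist

Z = NormalDist()    # standard normal, N(0, 1)
sigma = 4           # hypothetical known standard deviation

z = Z.inv_cdf(0.9)  # z such that P(Z < z) = 0.9
# z = (20 - mu) / sigma, so mu = 20 - z * sigma
mu = 20 - z * sigma

print(round(z, 4), round(mu, 2))  # 1.2816 14.87
```

With an unknown standard deviation the same equation is rearranged for σ instead; with both unknown, two probabilities give two simultaneous equations.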

Question 10

Q

using normal distribution as approximation to the binomial

Answer

A

The binomial distribution models situations where a random variable takes only discrete values. The normal distribution models continuous variables. If n is large enough, usually when n is bigger than 30, and if p is roughly 0.5, you can use a normal distribution to approximate a binomial distribution.
As the number of trials of a binomial distribution increases, the shape of the distribution tends to become increasingly symmetric about its mean and to resemble a normal distribution more closely
For X∼B(n,p) as n increases, the distribution of X tends to that of the random variable Y where Y∼N(np,np(1-p)) ONLY IF N IS LARGE AND P IS ROUGHLY 0.5
P(X=x) ≈ P(x-0.5< Y < x+0.5)
Inclusion of the 0.5 increases the accuracy of the approximation. It is known as a continuity correction. You should always use it when approximating a discrete distribution by a continuous distribution
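A sketch comparing the exact binomial probability with the normal approximation using the continuity correction. The values n = 100, p = 0.5 and x = 50 are chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # hypothetical values: n large, p = 0.5
x = 50

# Exact binomial probability P(X = 50).
exact = comb(n, x) * p**x * (1 - p)**(n - x)

# Y ~ N(np, np(1 - p)); NormalDist takes the standard deviation, so take the square root.
Y = NormalDist(n * p, sqrt(n * p * (1 - p)))

# Continuity correction: P(X = 50) ≈ P(49.5 < Y < 50.5)
approx = Y.cdf(x + 0.5) - Y.cdf(x - 0.5)

print(round(exact, 3), round(approx, 3))
```

Without the ±0.5, P(Y = 50) would be zero, because a continuous distribution gives no probability to a single value.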

Question 11

Q

correlation hypothesis testing

Answer

A

testing for evidence of linear correlation in the population. The test can only tell you the type of correlation, not its strength
null hypothesis is always ρ = 0
alternative hypothesis can be ρ > 0, ρ < 0 or ρ ≠ 0 (this would be a two-tailed test)
Reject the null hypothesis if:
the p-value is less than the significance level
or if the PMCC (r) of the sample falls in the critical region, so is closer to -1 or 1 than the critical value is
use a hypothesis test because it would be too difficult to work out the PMCC for the whole population (ρ). Instead it can be estimated from a sample taken from the population, and the sample PMCC is denoted r
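A sketch of the critical-region decision. The function name `reject_null` and the sample value r = 0.62 are made up; the critical value 0.5494 is assumed to be the tabulated value for a sample of size 10 at the 5% level (one-tailed), so check it against the tables in the formula booklet.

```python
def reject_null(r, critical_value, alternative):
    """Decide whether the sample PMCC r lies in the critical region.

    alternative: "greater" (H1: rho > 0), "less" (H1: rho < 0)
    or "two-sided" (H1: rho != 0). For a two-sided test the critical
    value must be looked up at half the significance level.
    """
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# Hypothetical sample: n = 10, r = 0.62, testing H1: rho > 0 at the 5% level.
print(reject_null(0.62, 0.5494, "greater"))  # True: evidence of positive correlation
print(reject_null(0.30, 0.5494, "greater"))  # False: not enough evidence
```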

Question 12

Q

PMCC

Answer

A

Pearson's product moment correlation coefficient, r, is a statistic that estimates ρ
It is a measure of correlation in a sample and is used to estimate the population correlation coefficient. The estimate becomes better as the sample size increases, but it is still likely to differ from the true value
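A sketch of how r can be calculated from a sample, using the standard formula r = Sxy / √(Sxx × Syy), which is not stated on the card. The data and the name `pmcc` are illustrative; in an exam r is normally found on a calculator.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pmcc(xs, ys), 4))  # 0.7746
```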

Question 13

Q

what does it mean for a test to have a 5% significance level

Answer

A

If the null hypothesis is actually true, there is a 5% chance of rejecting it