statistics year 2 Flashcards

1
Q

how to prove a normal distribution can be used to model a variable

A
  • state that the data is continuous
  • show that almost all of the data lies within three standard deviations of the mean (mean ± 3 standard deviations)
    when working out normal probabilities on a calculator and the upper or lower bound is unknown, use a value three standard deviations from the mean (or further) for the missing bound
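A minimal sketch of the "within three standard deviations" check. The sample values and the name `heights` are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical sample of a continuous measurement (heights in cm).
heights = [162.1, 165.4, 168.0, 170.2, 171.5, 173.3, 175.0, 177.8, 180.6, 183.9]

mu = mean(heights)
sigma = pstdev(heights)  # standard deviation of the data set

# Interval from three standard deviations below to three above the mean.
lower, upper = mu - 3 * sigma, mu + 3 * sigma

# Proportion of the data that lies inside that interval.
inside = [h for h in heights if lower <= h <= upper]
proportion_inside = len(inside) / len(heights)

print(f"mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"proportion within mean ± 3 sd = {proportion_inside:.2f}")
```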
2
Q

‘given’ formula

A

P(B|A) = P(A∩B) / P(A)
P(A∩B) = P(A) × P(B|A)
This is given in the formula booklet.
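A short worked example of both forms, using exact fractions. The probabilities are hypothetical values chosen for illustration.

```python
from fractions import Fraction

# Hypothetical probabilities for two events A and B.
p_a = Fraction(1, 2)        # P(A)
p_a_and_b = Fraction(1, 5)  # P(A ∩ B)

# P(B|A) = P(A ∩ B) / P(A)
p_b_given_a = p_a_and_b / p_a

# Rearranged form: P(A ∩ B) = P(A) × P(B|A)
recovered = p_a * p_b_given_a

print(f"P(B|A) = {p_b_given_a}")
print(f"P(A) × P(B|A) = {recovered}")
```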

3
Q

standardising a score formula

A

(score − mean) / standard deviation
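As a sketch, the formula can be written as a small function. The function name `standardise` and the example numbers are illustrative assumptions.

```python
def standardise(score, mean, sd):
    """Return the standardised score (score - mean) / sd."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (score - mean) / sd


# Hypothetical example: a score of 65 where the mean is 50 and the sd is 10.
z = standardise(65, 50, 10)
print(z)  # 1.5 standard deviations above the mean
```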

4
Q

probability of exact number using normal distribution

A

Zero. A single value corresponds to a vertical line under the curve, which has no area.
- mention that the distribution is continuous

5
Q

independent events

A

Two events, A and B, are independent if P(A|B) = P(A) or, equivalently, if P(A∩B) = P(A) × P(B)
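A sketch of the P(A∩B) = P(A) × P(B) check for one roll of a fair die. The events A, B and C are hypothetical examples.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

a = {2, 4, 6}     # A: the roll is even
b = {1, 2, 3, 4}  # B: the roll is at most 4
c = {1, 2, 3}     # C: the roll is at most 3


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


# Independent exactly when P(A ∩ B) = P(A) × P(B).
a_b_independent = prob(a & b) == prob(a) * prob(b)
a_c_independent = prob(a & c) == prob(a) * prob(c)

print("A and B independent:", a_b_independent)
print("A and C independent:", a_c_independent)
```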

6
Q

modelling with probability

A

To model real-life situations mathematically, you often have to make simplifying assumptions. You can analyse and improve your model by comparing predicted results with actual data, questioning any assumptions that have been made.
To test a binomial model you can use the mean and variance. For X ∼ B(n, p), the mean μ and variance σ² are given by μ = np and σ² = np(1 − p)
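One way to sketch the comparison between a binomial model and data. The function name, the values of n and p, and the observed counts are all made up for illustration.

```python
from statistics import mean, pvariance


def binomial_mean_variance(n, p):
    """Return (np, np(1 - p)) for X ~ B(n, p)."""
    mu = n * p
    return mu, mu * (1 - p)


# Hypothetical model: X ~ B(20, 0.25).
model_mean, model_var = binomial_mean_variance(20, 0.25)

# Hypothetical observed counts from repeating the experiment.
observed = [4, 6, 5, 3, 7, 5, 4, 6, 5, 5]
data_mean = mean(observed)
data_var = pvariance(observed)

print(f"model: mean = {model_mean}, variance = {model_var}")
print(f"data:  mean = {data_mean}, variance = {data_var}")
```

If the data mean and variance are far from the model values, that is a reason to question the assumptions behind the model.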

7
Q

continuous random variable (CRV), X

A
  • can take any one of infinitely many values in a given interval. Instead of assigning probabilities to individual values of X, you assign probabilities to ranges of values of X. The probability distribution is represented by a curve, or a sequence of curves, called a probability density function, f(x)
    P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from x = a to x = b
    - The ≤ and < signs are interchangeable here, because P(X = a) = 0, so both give the same area under the curve
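A sketch of finding P(a ≤ X ≤ b) as an area, using a simple midpoint-rule approximation to the integral. The density f(x) = 2x on [0, 1] is a hypothetical example, not one from the cards.

```python
def pdf(x):
    """Hypothetical probability density function: f(x) = 2x on [0, 1]."""
    return 2 * x if 0 <= x <= 1 else 0.0


def probability_between(a, b, steps=10_000):
    """Approximate the integral of pdf from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


print(probability_between(0.2, 0.5))  # area between 0.2 and 0.5
print(probability_between(0.0, 1.0))  # total area under the curve
print(probability_between(0.3, 0.3))  # a single value has zero width, so no area
```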
8
Q

Normal distribution

A
  • The normal probability density function has a bell-shaped curve. It is a continuous function so the area under the curve can be used to calculate probabilities
  • Total area under the curve=1
  • In the normal distribution:
  • mean=median=mode
  • distribution is symmetrical
  • points of inflection one standard deviation from the mean
  • roughly 68% of values lie within one standard deviation of the mean
  • roughly 99.7% of values lie within three standard deviations of the mean
  • If a variable X follows a normal distribution you write X ∼ N(μ, σ²)
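The percentages above can be checked with Python's `statistics.NormalDist`. The choice of μ = 100 and σ = 15 is an arbitrary assumption; any normal distribution gives the same proportions.

```python
from statistics import NormalDist

# Hypothetical distribution X ~ N(100, 15²).
# NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=100, sigma=15)

# Area within one standard deviation of the mean.
within_one = X.cdf(115) - X.cdf(85)

# Area within three standard deviations of the mean.
within_three = X.cdf(145) - X.cdf(55)

print(f"within 1 sd: {within_one:.4f}")
print(f"within 3 sd: {within_three:.4f}")
```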
9
Q

standard normal distribution

A

denoted by the symbol Z
Z ∼ N(0, 1)
Z = (X − μ) / σ
- you often need to use the inverse normal function to find the z-value, then rearrange to solve for the mean or standard deviation as required
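A sketch of the "inverse normal, then rearrange" method for an unknown mean. The set-up (σ = 4 and P(X < 30) = 0.9) is a hypothetical question, not one from the cards.

```python
from statistics import NormalDist

# Hypothetical question: X ~ N(mu, 4²) and P(X < 30) = 0.9. Find mu.
sigma = 4.0
x = 30.0
prob = 0.9

# Inverse normal on Z ~ N(0, 1): the z-value with P(Z < z) = 0.9.
z = NormalDist().inv_cdf(prob)

# Rearranging z = (x - mu) / sigma gives mu = x - z * sigma.
mu = x - z * sigma

# Check: with this mean, P(X < 30) should come back as 0.9.
check = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.4f}, mu = {mu:.2f}, check = {check:.4f}")
```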

10
Q

using normal distribution as approximation to the binomial

A
  • The binomial distribution models situations where a random variable takes only discrete values. The normal distribution models continuous variables. If n is large (usually taken as n greater than 30) and p is close to 0.5, you can use a normal distribution to approximate a binomial distribution.
  • As the number of trials increases, the shape of a binomial distribution becomes increasingly symmetric about its mean and increasingly resembles a normal distribution
  • For X ∼ B(n, p), as n increases, the distribution of X tends to that of the random variable Y where Y ∼ N(np, np(1 − p)). Use this only if n is large and p is close to 0.5
  • P(X = x) ≈ P(x − 0.5 < Y < x + 0.5)
  • Including the 0.5 increases the accuracy of the approximation. It is known as a continuity correction. You should always use it when approximating a discrete distribution by a continuous distribution
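A sketch comparing an exact binomial probability with its normal approximation, including the continuity correction. The values n = 100, p = 0.5 and x = 50 are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical binomial model X ~ B(100, 0.5) and the value of interest.
n, p = 100, 0.5
x = 50

# Exact binomial probability P(X = x).
exact = comb(n, x) * p**x * (1 - p) ** (n - x)

# Approximating distribution Y ~ N(np, np(1 - p)).
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Continuity correction: P(X = x) ≈ P(x - 0.5 < Y < x + 0.5).
approx = Y.cdf(x + 0.5) - Y.cdf(x - 0.5)

print(f"exact  P(X = {x}) = {exact:.4f}")
print(f"approx P({x - 0.5} < Y < {x + 0.5}) = {approx:.4f}")
```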
11
Q

correlation hypothesis testing

A
  • testing for evidence of linear correlation in the population. The test can only tell you whether there is evidence of positive or negative correlation, not how strong it is
  • null hypothesis is always ρ = 0, where ρ is the population correlation coefficient
  • alternative hypothesis can be ρ > 0, ρ < 0 or ρ ≠ 0 (this last one is a two-tailed test)
    Reject the null hypothesis if:
  • the p-value is less than the significance level
  • or the PMCC (r) of the sample falls in the critical region, so r is further from 0 (closer to −1 or 1) than the critical value is
  • a hypothesis test is used because it would be too difficult to work out the PMCC for the whole population. Instead it is estimated from a sample taken from the population, and the sample PMCC is denoted by r
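A sketch of the critical-region decision rule. The function name and the sample value of r are hypothetical, and the critical value 0.5494 is assumed to be the tabulated value for a sample of size 10 at the 5% level (one-tailed); check it against your own table before relying on it.

```python
def reject_null(r, critical_value, alternative):
    """Decide whether to reject H0: rho = 0.

    alternative is 'greater' (H1: rho > 0), 'less' (H1: rho < 0)
    or 'two-sided' (H1: rho != 0). The critical value must be a
    positive number looked up for the right sample size and level
    (half the significance level in each tail for a two-tailed test).
    """
    if alternative == "greater":
        return r > critical_value
    if alternative == "less":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical sample PMCC and an assumed table value.
r = 0.62
critical_value = 0.5494

print(reject_null(r, critical_value, "greater"))  # r is in the upper critical region
print(reject_null(r, critical_value, "less"))     # r is not in the lower critical region
```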
12
Q

PMCC

A
  • Pearson's product moment correlation coefficient, r, is a statistic that estimates ρ, the population correlation coefficient
  • it measures the linear correlation in a sample. The estimate becomes better as the sample size increases, but it is still likely to differ from the true value
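A sketch of calculating r from paired data using the summary quantities Sxx, Syy and Sxy. The function name `pmcc` and the data are made up for illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Sample product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical paired sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pmcc(xs, ys)
print(f"r = {r:.4f}")
```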
13
Q

what does it mean for a test to have a 5% significance level

A
  • if the null hypothesis is actually true, there is a 5% chance of incorrectly rejecting it