A2 Stats Flashcards
describe exponential modelling
use logs & coding to convert exponential relationship to linear relationship & use regression line
y = ax^n –> logy = nlogx + loga & plot logy against logx
y = kb^x –> logy = xlogb + logk & plot logy against x
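A minimal sketch of the coding idea for y = ax^n, using made-up data generated from y = 2x^3 (the data and the helper name `fit_line` are assumptions for illustration only). Taking logs of both variables gives a straight line whose gradient is n and whose intercept is loga.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys = c + m * xs; returns (m, c)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Illustrative data generated from y = 2 * x**3 (not real measurements)
xs = [1, 2, 3, 4, 5]
ys = [2 * x ** 3 for x in xs]

# Code the data: logy against logx
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

# Gradient is n, intercept is loga
n, log_a = fit_line(log_xs, log_ys)
a = 10 ** log_a
print(round(a, 3), round(n, 3))
```

For y = kb^x the same helper would be applied to x and logy instead, with the gradient read as logb.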
define PMCC
product moment correlation coefficient
numerical measure of the type & strength of linear correlation
what is the PMCC for a sample & population represented by?
for sample: r
for population: ρ
what values is r b/w?
-1 ≤ r ≤ 1
r = 1 linear & positive correlation
r = -1 linear & negative correlation
r = 0 no linear correlation
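A short sketch of how r can be computed by hand from Sxy, Sxx and Syy, to show where the values ±1 come from. The function name `pmcc` and the data are assumptions chosen so that the points lie exactly on a line.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on a line with positive gradient, then negative gradient
r_pos = pmcc([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = pmcc([1, 2, 3, 4], [8, 6, 4, 2])
print(r_pos, r_neg)
```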
what does n/a mean in table?
data for that day is not available
remove the points from calculations
describe how to find PMCC using calculator
menu –> 6: statistics
2: y = a + bx
enter values for x & y
option –> 4: regression calc
PMCC is r value
comment on suitability of a linear regression or exponential model for given data
e.g. as r is very close to 1, there is strong positive correlation b/w ___________. therefore the data points lie close to a straight line so a linear regression model is suitable for __________ data
so exponential model is suitable for raw data
what constitutes a ‘strong correlation’?
generally over 0.6 or less than -0.6
describe how to find equation of the regression line from coded data
‘unlog’
compare with the equation y = mx + c
logb =
loga =
work out a & b
state the equation at the end
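A small sketch of the 'unlog' step for a model y = ab^x, assuming a hypothetical regression line for the coded data of logy = 0.5 + 0.2x (these coefficients are invented for illustration, and logs are taken as base 10).

```python
import math

# Hypothetical regression line for coded data: log10(y) = 0.5 + 0.2x
intercept = 0.5  # compare with c in y = mx + c, so loga = 0.5
gradient = 0.2   # compare with m in y = mx + c, so logb = 0.2

# Work out a & b by raising 10 to each value
a = 10 ** intercept
b = 10 ** gradient

def model(x):
    """The uncoded model y = a * b**x."""
    return a * b ** x

# State the equation at the end
print(f"y = {a:.3f} * {b:.3f}^x")
```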
what does significance level mean?
the chance of incorrectly rejecting H0 when it is true
what must the conclusion of a hypothesis test include?
accept/reject H0
RELATE TO CONTEXT OF THE Q
what is PMCC hypothesis testing used for?
used to determine whether the PMCC for a sample, r, indicates that there is likely to be a linear relationship within the population
what are the null & alternative hypotheses for PMCC hypothesis testing?
the H0 is always that there is no correlation in the population ρ = 0
for positive correlation ρ > 0
for negative correlation ρ < 0
for any correlation ρ ≠ 0 (NB halve the significance level)
describe the method of a PMCC hypothesis test
- state H0: ρ = 0 & H1
- significance level =
- n = (n is number of pairs of data)
- find critical value using the PMCC table in data booklet (one +ve & one -ve for 2-tailed test)
- do number line & if r value is outside critical region (see OneNote) then accept H0, but if r value is inside the critical region, reject H0
- conclusion
e.g. as 0.1149 (r-value) < 0.5067 (critical value), it is not in the critical region so we accept H0. there is not sufficient evidence of a positive correlation b/w daily maximum gust & relative humidity (linking to Q)
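The decision step can be sketched as a small function. The name `in_critical_region` is hypothetical, the critical value still has to be read from the PMCC table, and treating the boundary value itself as inside the critical region is an assumption.

```python
def in_critical_region(r, critical_value, alternative):
    """Return True if r falls in the critical region (so H0 is rejected).

    critical_value is the positive value read from the PMCC table; for a
    two-tailed test it is looked up at half the significance level.
    """
    if alternative == "positive":    # H1: rho > 0
        return r >= critical_value
    if alternative == "negative":    # H1: rho < 0
        return r <= -critical_value
    if alternative == "two-tailed":  # H1: rho != 0
        return abs(r) >= critical_value
    raise ValueError("alternative must be 'positive', 'negative' or 'two-tailed'")

# Values from the worked conclusion above
reject = in_critical_region(0.1149, 0.5067, "positive")
print("reject H0" if reject else "accept H0")
```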
for any set notation Q, what is the first step?
draw Venn Diagram
dot method for P(A’ u B’) or P(A’ n B’)
for u: add things with at least 1 dot
for n: add things with all the dots
describe conditional probability
the probability of an event can change depending on the outcome of a previous event
the probability that event B occurs, GIVEN that event A has already occurred
what is the notation for conditional probability?
P(B|A)
the probability that event B occurs, GIVEN that event A has already occurred
usually P(B|A)…
≠ P(A|B)
in conditional probability, if A & B are independent, what is the formula?
P(B|A) = P(B|A’) = P(B)
two-way table
see OneNote & notes in folder
write down marginal totals
what is the addition formula?
P(A u B) = P(A) + P(B) - P(A n B)
what is the multiplication formula?
P(A n B) = P(A) x P(B|A), so P(B|A) = P(A n B) / P(A)
must divide by the probability of the 2nd letter (the event after the |, i.e. the one that is given)
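A quick worked check of the addition and multiplication formulae, using invented probabilities P(A) = 1/2, P(B) = 2/5 and P(A n B) = 1/5 (assumptions for illustration). Exact fractions avoid rounding problems.

```python
from fractions import Fraction

# Invented probabilities for illustration
p_a = Fraction(1, 2)
p_b = Fraction(2, 5)
p_a_and_b = Fraction(1, 5)

# Addition formula: P(A u B) = P(A) + P(B) - P(A n B)
p_a_or_b = p_a + p_b - p_a_and_b

# Conditional probability: divide by the probability of the given event
p_b_given_a = p_a_and_b / p_a

# If P(B|A) = P(B), then A & B are independent
independent = p_b_given_a == p_b
print(p_a_or_b, p_b_given_a, independent)
```

Here P(B|A) comes out equal to P(B), so these particular events happen to be independent.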
describe how conditional probabilities can be represented on a tree diagram
see Gordon OneNote
binomial distribution
a discrete probability distribution
normal distribution
a continuous probability distribution
describe the normal distribution graph
symmetrical about the mean (mean = median = mode)
infinite in both directions, the x-axis is an asymptote
area under the graph = 1
what percentage of values are within 1, 2 & 3 standard deviations of the mean?
68% of values are within 1 standard deviation of the mean
95% of values are within 2 standard deviations of the mean
99.7% of values are within 3 standard deviations of the mean
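These percentages can be checked with Python's standard-library `statistics.NormalDist`; this is only a sketch using the standard normal, since the proportions are the same for any μ and σ.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Proportion of values within k standard deviations of the mean
within = {}
for k in (1, 2, 3):
    within[k] = standard.cdf(k) - standard.cdf(-k)
    print(f"within {k} sd: {within[k]:.1%}")
```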
X~N(μ,σ^2)
X is normally distributed with population mean of μ
& population variance of σ^2 (σ = standard deviation)
describe using calculator for normal distribution
only ever use normal CD
if the Q gives no upper/lower limit, the one you choose must be at least 5 standard deviations above or below the mean
questions combining normal distribution & binomial distribution
see Gordon OneNote
describe the inverse normal distribution
area means area to the left
what is the formula for coding the standardised normal distribution?
Z = (x - μ) / σ
x = raw score
μ = mean
σ = standard deviation
Z = standardised score
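A brief sketch of standardising, assuming an invented distribution X~N(100, 15^2) and raw score x = 130. It shows that the probability is unchanged when the raw score is replaced by its z value on the standard normal.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Standardised score: subtract the mean, then divide by sigma."""
    return (x - mu) / sigma

# Invented example: X ~ N(100, 15^2), raw score 130
z = standardise(130, 100, 15)

# P(X < 130) on the original distribution vs P(Z < z) on N(0, 1^2)
p_raw = NormalDist(100, 15).cdf(130)
p_std = NormalDist(0, 1).cdf(z)
print(z, round(p_raw, 4), round(p_std, 4))
```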
what are the parameters for standard normal distribution?
Z~N(0,1^2)
μ = 0
σ = 1
percentage points of the normal distribution table
p = probability to the right of the z value on the normal curve, i.e. P(Z > z) = p, so the table gives a +ve z value
for P(Z < z) = p, make the z value from the table -ve
describe how to find z values on calculator
use inverse normal function
area = p (to the LEFT)
μ = 0
σ = 1
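The same lookup can be sketched with `NormalDist.inv_cdf` from Python's standard library, which also takes the area to the LEFT. The areas 0.975 and 0.05 below are just example values.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Area to the left is 0.975
z_left = standard.inv_cdf(0.975)

# If the Q gives the area to the RIGHT, convert it to a left area first
p_right = 0.05
z_right = standard.inv_cdf(1 - p_right)

print(round(z_left, 3), round(z_right, 3))
```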
what are the conditions to be able to model the binomial distribution with the normal?
- if n(the sample size) is large (> 50)
- p(the chance of success) is close to 0.5
such that np > 10
must apply a continuity correction (using upper/lower bounds)
how do you find the parameters of the normal distribution if X~B(n,p) can be approximated as Y~N(μ,σ^2)?
μ = np
σ = √(np(1-p))
in formula book
when calculating probabilities using a normal approximation to a binomial distribution what must be applied?
continuity correction
for ≤ or ≥, use the bound that includes the integer - think number line, see OneNote
for < or >, use the bound that does not include the integer
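A sketch comparing exact binomial probabilities with the normal approximation, assuming an invented X~B(100, 0.5). The helper `binomial_cdf` is hypothetical; it simply sums the binomial probabilities up to k.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5  # invented example: X ~ B(100, 0.5)

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Approximating distribution Y ~ N(np, np(1-p))
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# P(X <= 55): 55 is included, so use the bound 55.5
exact_le = binomial_cdf(55, n, p)
approx_le = approx.cdf(55.5)

# P(X < 55): 55 is not included, so use the bound 54.5
exact_lt = binomial_cdf(54, n, p)
approx_lt = approx.cdf(54.5)

print(round(exact_le, 4), round(approx_le, 4))
print(round(exact_lt, 4), round(approx_lt, 4))
```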
in Q, what does ‘use a suitable approximation’ mean?
use normal to approximate binomial
in a Q, sometimes have to do binomial distribution twice
what is true if a (parent) population is normally distributed?
mean = μ
standard deviation = σ
the mean (x̄) of any sample taken from the population is also normally distributed
what is the mean & standard deviation of the sample mean (x̄) for a sample of a (parent) population?
mean = μ
standard deviation = σ / √n
where n is the sample size
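A short sketch of the sample mean distribution, assuming an invented population with μ = 50, σ = 8 and a sample of size n = 16, so the standard deviation of the sample mean is 8 / √16 = 2.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50, 8, 16  # invented population parameters & sample size

# Distribution of the sample mean: same mean, standard deviation sigma / sqrt(n)
sample_mean = NormalDist(mu, sigma / math.sqrt(n))

# Probability that the sample mean exceeds 53
p_above_53 = 1 - sample_mean.cdf(53)
print(sample_mean.stdev, round(p_above_53, 4))
```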
describe how to do hypothesis testing for normal distribution
see Gordon OneNote
it is testing the mean
1. identify the test statistic (x̄)
2. state μ & σ of population & sample
3. state hypotheses H0 & H1
4. either:
- find probability of test statistic & compare to significance level
- find critical value & see if test statistic lies in critical region
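Both routes in step 4 can be sketched on one invented example: population σ = 8, H0: μ = 50, H1: μ > 50, sample size 16, sample mean 53.5, 5% significance level (all of these numbers are assumptions for illustration). The two routes should always give the same decision.

```python
import math
from statistics import NormalDist

# Invented one-tailed test: H0: mu = 50, H1: mu > 50
mu_0, sigma, n = 50, 8, 16
x_bar = 53.5         # test statistic (sample mean)
significance = 0.05

# Under H0 the sample mean is N(mu_0, (sigma / sqrt(n))^2)
under_h0 = NormalDist(mu_0, sigma / math.sqrt(n))

# Route 1: probability of a sample mean at least this large
p_value = 1 - under_h0.cdf(x_bar)
reject_by_probability = p_value < significance

# Route 2: critical value with 5% of the area to its right
critical_value = under_h0.inv_cdf(1 - significance)
reject_by_critical_region = x_bar > critical_value

print(round(p_value, 4), round(critical_value, 2))
print(reject_by_probability, reject_by_critical_region)
```

In this example both routes reject H0, so the conclusion would say there is sufficient evidence that the mean is greater than 50, linked back to the context of the Q.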