Correlation And Hypothesis Testing Flashcards
What type of correlation if the PMCC is close to 1?
Positive linear correlation
What type of correlation if the PMCC is close to -1?
Negative linear correlation
What does r mean (hypothesis testing)?
PMCC for a sample
What does ρ (rho) mean (hypothesis testing)?
PMCC for the whole population
“Test for a positive correlation”
Which tail do you use?
Positive (upper) one tailed
“Evidence for some correlation” “No evidence for some correlation”
What tailed is used?
Two tailed (halve the significance level for each tail)
Hypothesis for negative (lower) one tail
H0: ρ = 0
H1: ρ < 0
(PMCC: if r is smaller than the negative of the value from the table, reject H0, so there’s enough evidence of negative correlation)
“Test for a negative correlation”
What tailed is used?
Negative (lower) one tail
Hypothesis for positive (upper) one tail
H0: ρ = 0
H1: ρ > 0
(PMCC: if r is larger than the value from the table, reject H0, so there’s enough evidence of positive correlation)
Hypothesis for some correlation two tail
H0: ρ = 0
H1: ρ ≠ 0
(PMCC: if r < negative value from table or r > value from table, reject H0, so there’s enough evidence of some correlation)
To find the critical value
Use PMCC table
If the value is within the critical region…
It’s significant, meaning you reject H0, so there’s enough evidence to suggest there’s a negative correlation/positive correlation/some correlation/an increase/a decrease etc.
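A minimal sketch of the PMCC test described in the cards above: calculate r from a sample, then compare it with the critical value from the table. The function names and the data are illustrative assumptions, not from the cards. The critical value 0.8054 is the table value for n = 5 at the 5% level, one tailed.

```python
from math import sqrt

def pmcc(xs, ys):
    """Sample product moment correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def reject_h0(r, critical_value, tail):
    """Return True if r lies in the critical region.

    critical_value is the positive value read from the PMCC table;
    tail is "upper", "lower" or "two".
    """
    if tail == "upper":   # H1: rho > 0
        return r > critical_value
    if tail == "lower":   # H1: rho < 0
        return r < -critical_value
    if tail == "two":     # H1: rho != 0
        return abs(r) > critical_value
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

# Illustrative data (assumed, not from the flashcards), n = 5.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)  # about 0.7746
# r is below 0.8054, so H0 is not rejected in the upper-tail test.
print(round(r, 4), reject_h0(r, 0.8054, "upper"))
```

For a two tailed test the table value must be looked up at half the significance level before it is passed in.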
Test statistic
Used to test the hypothesis. It could be the result of the experiment or a value calculated from the sample
Null hypothesis H0
Hypothesis you assume to be correct
Alternative hypothesis H1
Tells you about the parameter if your assumption is shown to be wrong
Hypothesis test
A hypothesis is a statement made about the value of a population parameter. The test uses a sample to determine whether to reject H0
Critical value
The first value to fall inside the critical region
Critical regions
A region of the probability distribution such that, if the test statistic falls within it, you reject the null hypothesis
Acceptance region
The region in which we do not reject (loosely, “accept”) the null hypothesis
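A rough sketch of how a critical value and critical region could be found for a one tailed binomial test by summing probabilities rather than reading a table. It assumes the test statistic is X ~ B(n, p) under H0. The function names and the example (n = 20, p = 0.5, 5% level) are illustrative, not from the cards.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def lower_critical_value(n, p, alpha):
    """Largest k with P(X <= k) <= alpha, or None if there is none."""
    critical = None
    for k in range(n + 1):
        if binom_cdf(k, n, p) <= alpha:
            critical = k
        else:
            break
    return critical

def upper_critical_value(n, p, alpha):
    """Smallest k with P(X >= k) <= alpha, or None if there is none."""
    for k in range(n + 1):
        if 1 - binom_cdf(k - 1, n, p) <= alpha:
            return k
    return None

# Assumed example: X ~ B(20, 0.5) under H0, 5% significance level.
print(lower_critical_value(20, 0.5, 0.05))  # critical region is X <= this value
print(upper_critical_value(20, 0.5, 0.05))  # critical region is X >= this value
```

Everything outside the critical region is the acceptance region.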
“Test for an increase/improvement in…”
Which tail is used?
Upper one tail
“Test for a decrease/an over-estimate….”
Which tail is used?
Lower one tail
“Test for a change in….”
Which tail is used?
Two tailed
PMCC on calculator (from a given table)
- Menu 6
- 2
- Type values
- Optn
- 4
Critical value on calculator
- Menu 7
- Scroll down 1
- 2 (for testing whether a given variable is significant), 1 (for finding the critical region)
When to use binomial or cumulative probability?
Binomial P(X = 4)
Cumulative P(X<4) P(X≤4) etc
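To illustrate the difference, here is a small sketch that assumes X ~ B(10, 0.5) (the values of n and p are made up for the example). The "binomial" calculation gives the probability of one exact value, while the cumulative calculation adds up every value up to and including a limit.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.5                          # assumed example values
exactly_4 = binomial_pmf(4, n, p)       # P(X = 4)
at_most_4 = binomial_cdf(4, n, p)       # P(X <= 4)
fewer_than_4 = binomial_cdf(3, n, p)    # P(X < 4) = P(X <= 3)
print(exactly_4, at_most_4, fewer_than_4)
```

Note that P(X < 4) has to be rewritten as P(X ≤ 3) before the cumulative function is used.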
Binomial distribution/probability
- Menu 7
- 1
- 2
Cumulative probability
Use table
- Menu 7
- Scroll opt 1
- 2 (testing a variable)
Comment on the suitability of the binomial distribution model
The probability is lower/higher than the expected value which suggests the model is not accurate
Suggest one improvement for the distribution model
A non uniform distribution
Requirements of a binomial distribution
1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes (“success” or “failure”).
4: The probability of success is the same for each observation.
Requirements of a normal distribution
1) The mean, median and mode are exactly the same.
2) The distribution is symmetric about the mean—half the values fall below the mean and half above the mean.
3) The distribution can be described by two values: the mean and the standard deviation.
Finding mean from binomial distribution
np
Finding variance from binomial distribution
np(1-p)
(1-p is never negative, because p is a probability between 0 and 1)
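A short sketch of the two formulas, assuming X ~ B(20, 0.3) purely as an example. The function name is illustrative.

```python
def binomial_mean_variance(n, p):
    """Return (mean, variance) of X ~ B(n, p), i.e. (np, np(1 - p))."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability between 0 and 1")
    return n * p, n * p * (1 - p)

mean, variance = binomial_mean_variance(20, 0.3)  # assumed example values
print(mean, variance)
```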
Later it was discovered that the local scout group visited the supermarket that afternoon to buy food for their camping trip.
(f) Comment on the validity of the model used to obtain the answer to part (e), giving a reason for your answer
The binomial model assumes the 20 customers are independent. The members of the scout group shop together, which may invalidate this, so the binomial distribution would not be valid
When testing a value against a hypothesis to see if there’s a change/improvement.
- decrease: P(X ≤ 8)
- change/increase: P(X ≥ 8)
P value for two tailed test
Multiply the probability by 2
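A minimal sketch of these p-value calculations, assuming a binomial test with X ~ B(20, 0.5) under H0 and an observed value of 8. The function names, the labels "decrease", "increase" and "change", and the numbers are illustrative assumptions.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p); an empty sum (k < 0) gives 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def p_value(observed, n, p, alternative):
    """p-value for a binomial test of H0: probability of success = p."""
    lower = binom_cdf(observed, n, p)          # P(X <= observed)
    upper = 1 - binom_cdf(observed - 1, n, p)  # P(X >= observed)
    if alternative == "decrease":
        return lower
    if alternative == "increase":
        return upper
    if alternative == "change":
        # Two tailed: double the smaller tail probability (capped at 1).
        return min(1.0, 2 * min(lower, upper))
    raise ValueError("alternative must be 'decrease', 'increase' or 'change'")

one_tailed = p_value(8, 20, 0.5, "decrease")  # about 0.2517
two_tailed = p_value(8, 20, 0.5, "change")    # double the one tailed value
print(one_tailed, two_tailed)
```

The p-value is then compared with the significance level; H0 is rejected only if the p-value is smaller.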
PMCC measures…
How strong the linear correlation between two variables is.
Z for normal distribution
Z = (X − μ)/σ
X̄ ~ N(μ, (σ/√n)²)
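A small sketch of the sample mean distribution, using Python's `statistics.NormalDist`. It assumes a population X ~ N(50, 4²) and a sample of size n = 16, both made up for the example, so the sample mean has standard deviation 4/√16 = 1.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Distribution of the mean of n observations from N(mu, sigma^2)."""
    return NormalDist(mu, sigma / sqrt(n))

xbar = sample_mean_dist(50, 4, 16)  # assumed example values
print(xbar.mean, xbar.stdev)        # 50.0 1.0
print(xbar.cdf(51))                 # P(sample mean < 51), about 0.8413
```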
Normal distributions
X ~ N(μ, σ²)
Normal distribution significant figures
- table = 4 d.p
- calculator = 3 d.p
(State whether you’re using a table or calculator)
Standard normal distribution
Z ~ N(0, 1²)
Z = (X − μ)/σ
Normal to standard
X ~ N(50, 4²)
P(X < 53)
P(Z < (53 − 50)/4) = P(Z < 0.75)
Φ(0.75)
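The same worked example can be checked with Python's `statistics.NormalDist`; this is only a sketch to confirm that standardising gives the same probability as working with X directly.

```python
from statistics import NormalDist

mu, sigma = 50, 4
z = (53 - mu) / sigma                       # standardise: z = 0.75
via_z = NormalDist(0, 1).cdf(z)             # Phi(0.75)
direct = NormalDist(mu, sigma).cdf(53)      # P(X < 53) without standardising
print(z, round(via_z, 4), round(direct, 4))  # 0.75 0.7734 0.7734
```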
The Central Limit Theorem
The sample means (mean full time and mean part time) can each be modelled as approximately Normal
State an assumption you’ve used (when using variance)
Variance of sample = variance of population
Test whether or not there is evidence that the PMCC is positive
Positive upper tail
Two conditions under which the normal distribution may be used as an approximation to the binomial distribution
Number of trials is large and probability of success is close to 0.5
If the difference in means is large compared with the standard deviations
The standard deviations are small compared with the difference in mean temperatures, making it more likely that the difference in means is significant
Explain why it is reasonable to model the daily mean pressure for Beijing, during May to August, using a normal distribution.
It’s bell shaped
Give a reason why we cannot say there is no chance of a hurricane in Beijing during May to August.
The tails of a Normal distribution are infinite.
When to use upper and lower bounds for distribution values?
Only when using np, np(1-p)
How to show that the distribution of T is not a discrete uniform distribution?
Show that the probabilities of the outcomes aren’t equal
y = ax^n
log y = log a + n log x
y = ab^x
log y = log a + x log b
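A sketch of how these log forms turn a curve into a straight line that can be fitted by least squares. The helper names and the data are assumptions made for the example: the first data set follows y = 3x² exactly and the second follows y = 5 × 2^x exactly, so the fits should recover those constants.

```python
from math import log10

def fit_line(xs, ys):
    """Least squares gradient and intercept of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

def fit_power_law(xs, ys):
    """Fit y = a x^n via log y = log a + n log x. Returns (a, n)."""
    gradient, intercept = fit_line([log10(x) for x in xs], [log10(y) for y in ys])
    return 10**intercept, gradient

def fit_exponential(xs, ys):
    """Fit y = a b^x via log y = log a + x log b. Returns (a, b)."""
    gradient, intercept = fit_line(xs, [log10(y) for y in ys])
    return 10**intercept, 10**gradient

a1, n1 = fit_power_law([1, 2, 4, 8], [3, 12, 48, 192])    # y = 3x^2
a2, b2 = fit_exponential([0, 1, 2, 3], [5, 10, 20, 40])   # y = 5 * 2^x
print(a1, n1, a2, b2)
```

For y = ax^n the gradient of the log-log line is n; for y = ab^x the gradient of log y against x is log b.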
State, giving a reason, whether or not the correlation coefficient is consistent with Tess’s suggestion
Since r is close to -1 it is consistent (i.e. there is strong negative correlation)
The linear regression equation is w = 10 755 − 171t. Give an interpretation of the gradient of this regression equation
As t increases by 1 unit, w decreases by 171 units
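A quick check of the gradient interpretation, using the regression equation from the card. The function name and the chosen values of t are illustrative; the units of w and t are not given on the card.

```python
def predicted_w(t):
    """Predicted w from the regression line w = 10755 - 171t."""
    return 10755 - 171 * t

# Increasing t by 1 changes the predicted w by the gradient.
change = predicted_w(6) - predicted_w(5)
print(change)  # -171
```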
The variables (humidity and hours of sunshine) have a negative correlation. Given that on a particular day the humidity was high, what would you expect the number of hours of sunshine to be?
Lower than average
Explain why this normal distribution may not be a good model for T
The model suggests a non-negligible probability of T values < 0, which is impossible
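To show what "non-negligible" could look like, here is a sketch with a hypothetical model T ~ N(5, 4²). These parameters are invented for the example and are not the ones from the original question.

```python
from statistics import NormalDist

# Hypothetical model (assumed values): T ~ N(5, 4^2), where T cannot be negative.
model = NormalDist(5, 4)
p_negative = model.cdf(0)   # P(T < 0) under the model
print(round(p_negative, 4))  # about 0.1056, far from negligible
```

If this probability were tiny (for example when the mean is many standard deviations above 0), the same objection would carry much less weight.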
Give an interpretation of the correlation
Analyse the correlation using the variables
When to use mean np and standard deviation √(np(1−p))
When the question asks for a suitable approximation or a normal approximation
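A sketch comparing the normal approximation with the exact binomial probability. It assumes X ~ B(100, 0.5) (large n, p close to 0.5) and applies a continuity correction, treating P(X ≤ 55) as P(Y < 55.5). The function names and numbers are illustrative, and the continuity correction is an assumption added here rather than something stated on the card.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using Y ~ N(np, np(1 - p)) with continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)

exact = binom_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)   # Phi(1.1), about 0.8643
print(exact, approx)
```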
What data should be used when asked about ‘typical’ or ‘average’?
Mean and median (location of the data)
What data to use when asked about how ‘spread out’ the data is?
- Calculate standard deviation, range & interquartile range
- describe variability of the data
Describe the shape of the data
- how many peaks or modes
- symmetric or asymmetric
- skew (is there a long tail to the left or right)
What type of distribution do we have for a sample of data?
Frequency distribution
What type of distribution do we have for the entire population?
Probability distribution
Relationship between mean and median regarding symmetry
If it’s symmetric, we can expect the mean and median to be about the same
What is unimodal?
One peak
Examples of variables that are positively skewed
Waiting times
Household income
Examples of variables that are negatively skewed
Satisfaction measures
Retirement age
Examples of variables that are symmetric
Height
Weight
When is the Poisson distribution used?
- to describe rare events & discrete occurrences over an interval of time
- occurrences are independent (in non-overlapping intervals)
- the range is from 0 onwards
- constant expected number of occurrences
Examples of when the Poisson distribution would be used
- number of random arrivals per time interval (customer arrivals at a store on weekday mornings)
- queuing theory
- cases of a rare blood disease
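A minimal sketch of Poisson probabilities, assuming an average of 3 customer arrivals per interval. The rate and the function names are made up for the example.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when X is Poisson with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k) when X is Poisson with mean lam."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

rate = 3                           # assumed mean number of arrivals per interval
no_arrivals = poisson_pmf(0, rate)   # P(X = 0), about 0.0498
at_most_two = poisson_cdf(2, rate)   # P(X <= 2), about 0.4232
print(no_arrivals, at_most_two)
```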