Week 4 | THE CHI-SQUARE, t AND F DISTRIBUTIONS: INFERENCES ABOUT ONE OR TWO POPULATION VARIANCES AND POPULATION PROPORTIONS Flashcards
In addition to the normal distribution, what other three frequently used continuous probability distributions are there?
Name the properties of these three distributions.
1) Chi-square distribution
If Z_1, Z_2, ..., Z_m are independent standard normal variables, i.e. Z_i \sim N(0, 1), then the random variable V defined as
V = \sum_{i=1}^{m} Z_i^2 \sim \chi^2_m
has a chi-square distribution with df = m degrees of freedom.
It has the following properties:
i. E(V) = m, Var(V) = 2m
ii. \chi^2_1 = Z^2, where Z \sim N(0, 1)
iii. (From example) The density curve of V is skewed to the right. As df increases, the chi-square distribution becomes more and more symmetric, and for df > 100 it is close to a normal distribution (with mean m and variance 2m, not the standard normal).
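The definition and property i can be checked by simulation. The following is a minimal sketch, assuming Python with only the standard library; the degrees of freedom (m = 5), the seed and the number of draws are illustrative choices, not values from the notes.

```python
import random
import statistics

def chi_square_draw(m, rng):
    """One draw of V = Z_1^2 + ... + Z_m^2 with independent Z_i ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))

rng = random.Random(42)   # fixed seed so the run is repeatable
m = 5                     # degrees of freedom (illustrative choice)
draws = [chi_square_draw(m, rng) for _ in range(100_000)]

sample_mean = statistics.fmean(draws)     # theory: E(V) = m
sample_var = statistics.variance(draws)   # theory: Var(V) = 2m

print(f"mean     = {sample_mean:.2f} (theory {m})")
print(f"variance = {sample_var:.2f} (theory {2 * m})")
```

The simulated mean and variance should land close to m and 2m, up to sampling noise.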
2) Student t distribution
If Z is a standard normal random variable, V is a chi-square random variable with df = m, and Z and V are independent, then
t = (\sqrt{m} Z) / \sqrt{V} = Z / \sqrt{V/m}
has a Student t distribution with df = m degrees of freedom.
This distribution has the following properties:
i. E(t) = 0 (for m > 1), Var(t) = m/(m-2) > 1 (for m > 2)
ii. The density curve of t is bell-shaped and symmetric around zero, but it is flatter and wider than the Z-curve
iii. As m (df) approaches infinity, the t distribution approaches the standard normal distribution
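The construction of t from Z and V can also be simulated to see that the variance is m/(m-2), slightly above 1. This is a sketch only, assuming Python's standard library; m = 10, the seed and the number of draws are illustrative assumptions.

```python
import math
import random
import statistics

def t_draw(m, rng):
    """One draw of t = Z / sqrt(V / m), with Z ~ N(0, 1) and V ~ chi-square(m) independent."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))  # chi-square with m df
    return z / math.sqrt(v / m)

rng = random.Random(7)    # fixed seed so the run is repeatable
m = 10                    # degrees of freedom (illustrative choice)
draws = [t_draw(m, rng) for _ in range(100_000)]

t_mean = statistics.fmean(draws)      # theory: 0
t_var = statistics.variance(draws)    # theory: m / (m - 2) = 1.25

print(f"mean     = {t_mean:.3f} (theory 0)")
print(f"variance = {t_var:.3f} (theory {m / (m - 2):.3f})")
```

Increasing m in this sketch moves the simulated variance towards 1, in line with property iii.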
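3) F distribution
If V_1 and V_2 are independent chi-square random variables with m_1 and m_2 degrees of freedom, respectively, then
F = (V_1/m_1) / (V_2/m_2)
has an F distribution with m_1 and m_2 degrees of freedom, F \sim F_{m_1,m_2}.
It has the following properties:
i. The density curve of F is skewed to the right, but it becomes more symmetric as both m_1 and m_2 increase
ii. The expected value of F exists only when m_2 > 2, and the variance only when m_2 > 4
iii. If m_1 = 1 and m_2 = m, then F = (t_m)^2, i.e. if t \sim t_m then t^2 \sim F_{1,m}
iv. As m_2 \to \infty, m_1 F converges in distribution to \chi^2_{m_1}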
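The link between t and F in property iii follows in one line. This sketch uses the usual definition of an F variable as the ratio of two independent chi-square variables, each divided by its degrees of freedom, together with the construction of t from Z and V given earlier.

```latex
t^2 = \left( \frac{Z}{\sqrt{V/m}} \right)^{2}
    = \frac{Z^2 / 1}{V / m},
\qquad Z^2 \sim \chi^2_1, \quad V \sim \chi^2_m \ \text{independent}
\;\Longrightarrow\; t^2 \sim F_{1,m}
```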
What assumptions are the chi-square test and the corresponding confidence interval for a population variance based on?
i. The data are a random sample of independent observations
ii. The variable of interest is quantitative and continuous
iii. The measurement scale is interval or ratio
iv. The sampled population is normally distributed
-> Check for normality before employing these procedures
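Consider a random sample of size n drawn from a normal population, X \sim N(\mu, \sigma).
The unbiased point estimator of \sigma^2 is the sample variance s^2.
Let V denote a chi-square random variable with df = n-1 and consider its (\alpha/2) x 100% and (1-\alpha/2) x 100% percentiles, \chi^2_{1-\alpha/2} and \chi^2_{\alpha/2} (the subscript is the area to the right of the value). Since (n-1)s^2/\sigma^2 \sim \chi^2_{n-1},
P( \chi^2_{1-\alpha/2} < (n-1)s^2/\sigma^2 < \chi^2_{\alpha/2} ) = 1 - \alpha
and rearranging for \sigma^2 gives
P( (n-1)s^2/\chi^2_{\alpha/2} < \sigma^2 < (n-1)s^2/\chi^2_{1-\alpha/2} ) = 1 - \alpha
The (1-\alpha) x 100% confidence interval for \sigma^2 is given by:
( (n-1)s^2/\chi^2_{\alpha/2} , (n-1)s^2/\chi^2_{1-\alpha/2} ), df = n-1
The corresponding test statistic for H_0: \sigma^2 = \sigma_0^2 is
\chi^2 = (n-1)s^2/\sigma_0^2 \sim \chi^2_{n-1} (under H_0)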
A production manager would like to know the standard deviation of the time
(X, hour) required to complete a certain task in a manufacturing plant. She
takes a random sample of 25 and performs preliminary data analyses that
produce the following results:
The histogram looks approximately normal
The normal Q-Q plot has points scattered everywhere
a) How would we construct the 95% confidence interval for \sigma?
How would we use the code in R? (one-sample chi-square test on variance)
df = n-1 = 24, \alpha = 0.05, s = 0.403
\chi^2_{\alpha/2,df} = \chi^2_{0.025,24} = 39.4, \chi^2_{1-\alpha/2,df} = \chi^2_{0.975,24} = 12.4
-> ( (n-1)s^2/\chi^2_{\alpha/2} , (n-1)s^2/\chi^2_{1-\alpha/2} ), df = n-1
= ( (24 x 0.403^2)/39.4 , (24 x 0.403^2)/12.4 ) = (0.099, 0.314)
With 95% confidence, \sigma is between \sqrt{0.099} = 0.315 and \sqrt{0.314} = 0.560 (hour)
VarTest(x) from the DescTools package: one-sample chi-square test on variance
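The hand calculation above can be cross-checked in a few lines. This is a sketch in Python (standard library only) rather than the R call, and it takes the two chi-square table values and s = 0.403 directly from the worked solution; the function name variance_ci is a hypothetical helper introduced for illustration.

```python
import math

n = 25              # sample size from the example
s = 0.403           # sample standard deviation (hours) used in the worked solution
chi2_upper = 39.4   # table value chi^2_{0.025, 24}, as quoted in the notes
chi2_lower = 12.4   # table value chi^2_{0.975, 24}, as quoted in the notes

def variance_ci(n, s, chi2_upper, chi2_lower):
    """Confidence interval for the population variance from (n - 1) s^2 / chi-square."""
    ss = (n - 1) * s ** 2          # (n - 1) s^2
    return ss / chi2_upper, ss / chi2_lower

var_lo, var_hi = variance_ci(n, s, chi2_upper, chi2_lower)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)   # limits for the standard deviation

print(f"variance CI: ({var_lo:.3f}, {var_hi:.3f})")
print(f"std dev CI:  ({sd_lo:.3f}, {sd_hi:.3f}) hours")
```

Because the notes round the variance limits before taking square roots, the last digit of the upper standard deviation limit can differ slightly from the hand calculation.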
A production manager would like to know the standard deviation of the time
(X, hour) required to complete a certain task in a manufacturing plant. She
takes a random sample of 25 and performs preliminary data analyses that
produce the following results:
Following on from this question, can it be inferred at the 5% significance level that the standard deviation of the time required to complete the task exceeds 20 minutes (i.e. 0.333 h)?
Hypotheses:
H_0: \sigma = 0.333 and H_A: \sigma > 0.333
-> H_0: \sigma^2 = 0.333^2 = 0.111 vs H_A: \sigma^2 > 0.111
The test is right-sided, so reject H_0 if \chi^2_{obs} > \chi^2_{\alpha,df}
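The decision rule can be carried through numerically. The sketch below is in Python (standard library only) and reuses n = 25 and s = 0.403 from the confidence-interval card for the same sample. The critical value 36.415 for a right-tail area of 0.05 with 24 degrees of freedom is a standard chi-square table value that does not appear in the notes, so treat it as an assumption.

```python
n = 25                   # sample size from the example
s = 0.403                # sample standard deviation (hours) from the earlier card
sigma0_sq = 0.333 ** 2   # hypothesised variance under H_0 (20 minutes = 0.333 h)
chi2_crit = 36.415       # chi^2_{0.05, 24} from a standard table (assumption, not in the notes)

# Test statistic: (n - 1) s^2 / sigma_0^2, chi-square with n - 1 df under H_0
chi2_obs = (n - 1) * s ** 2 / sigma0_sq

# Right-sided test: reject H_0 only if the observed statistic exceeds the critical value
reject_h0 = chi2_obs > chi2_crit

print(f"chi2_obs = {chi2_obs:.2f}, critical value = {chi2_crit}")
print("reject H_0" if reject_h0 else "do not reject H_0")
```

Under these inputs the observed statistic falls just below the critical value, so the sketch does not reject H_0 at the 5% level.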