Probability Distributions: Chi Square Distribution Flashcards

1
Q

Chi Square (χ²) Distribution

A
  • Best method to test a population variance against a known or assumed value of the population variance.
  • Continuous distribution characterized by its degrees of freedom.
  • Describes the distribution of a sum of squared independent standard normal random variables.
  • Also used to test the goodness of fit of a distribution to data, to test whether series are independent, and to estimate confidence intervals for the variance and standard deviation of a random variable from a normal distribution.
2
Q

Chi Square Statistics

A
  • The chi square distribution is skewed to the right, with a long tail toward the large values of the distribution.
  • The overall shape of the distribution depends on the number of degrees of freedom in a given problem.
    • For a test on a single sample's variance, the degrees of freedom are 1 less than the sample size.
3
Q

Chi Square Properties

A
  • The mean of the distribution is equal to the number of degrees of freedom: μ = ν
  • The variance is equal to two times the number of degrees of freedom: σ² = 2ν
  • When the degrees of freedom are greater than or equal to 2, the density reaches its maximum (the mode) at χ² = ν - 2
  • As the degrees of freedom increase, the chi square curve approaches a normal distribution
  • As the degrees of freedom increase, the symmetry of the graph also increases
  • Finally, the distribution is skewed to the right and, since the random variable on which it is based is squared, it has no negative values. As the degrees of freedom increase, the probability density function (pdf) appears more symmetrical in shape
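
A minimal simulation sketch of these properties, assuming a chi square value with df degrees of freedom is built as the sum of df squared independent standard normal draws. The function name `simulate_chi_square`, the seed, and the number of draws are illustrative choices, not part of the card. It checks empirically that the sample mean is close to df, the sample variance is close to 2 × df, and no value is negative.

```python
import random
import statistics


def simulate_chi_square(df, draws, seed=0):
    """Return `draws` simulated chi square values with `df` degrees of freedom.

    Each value is the sum of `df` squared standard normal draws
    (the construction assumed in the lead-in).
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(draws)]


df = 5
samples = simulate_chi_square(df, draws=20000)

sample_mean = statistics.fmean(samples)
sample_variance = statistics.variance(samples)

print(f"mean ~ {sample_mean:.2f} (theory: {df})")
print(f"variance ~ {sample_variance:.2f} (theory: {2 * df})")
print(f"minimum ~ {min(samples):.3f} (never negative)")
```

Raising `df` in this sketch and plotting a histogram of `samples` is one way to see the curve become more symmetric.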
4
Q

Chi Square (χ²) Hypothesis Test

A
  • Usually the objective of the Six Sigma team is to understand the variation of the output, not just the population mean.
  • Most importantly, the team would like to know how much variation the production process exhibits about the target to see what adjustments are needed to reach a defect-free process.
  • For a comparison between several sample variances, or between frequency proportions, the standard test statistic used is the chi square (χ²) statistic.
  • The distribution of the chi square statistic is called the chi square distribution
5
Q

Types of Chi Square Hypothesis Tests

A
  1. Chi-Square Test of Independence
  2. Chi Square Test of Variance
6
Q

Chi Square Test of Independence

A
  • The Chi Square Test of Independence determines whether there is an association between two categorical variables (for example, gender and course selection)
  • For Example:
    • The Chi Square Test of Independence examines the association between one categorical variable, such as gender (male and female), and another, such as a category of absenteeism in school
  • Chi Square Test of Independence is a non-parametric test
    • In other words, the assumption of normality is not required to perform the test

The chi square test uses a contingency table to analyze the data. Each row shows the categories of one variable. Each column shows the categories of the other variable. Each variable must have two or more categories. Each cell holds the total number of cases for a specific pair of categories.
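
As a small sketch of how such a table can be assembled from raw records: the observations below are hypothetical (gender, course) pairs invented for illustration, and the variable names are not from the card. Rows hold the categories of one variable, columns hold the categories of the other, and each cell is a count.

```python
from collections import Counter

# Hypothetical raw observations: one (gender, course) pair per student.
observations = [
    ("female", "math"), ("female", "art"), ("male", "math"),
    ("male", "math"), ("female", "art"), ("male", "art"),
    ("female", "math"), ("male", "math"),
]

row_labels = sorted({gender for gender, _ in observations})
col_labels = sorted({course for _, course in observations})
pair_counts = Counter(observations)

# Each cell is the number of cases with that specific pair of categories.
table = [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]

print("        " + "  ".join(f"{c:>5}" for c in col_labels))
for label, row in zip(row_labels, table):
    print(f"{label:<8}" + "  ".join(f"{n:>5}" for n in row))
```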

7
Q

Assumptions of Chi-Square Test of Independence

A
  • The variables must be nominal or categorical
  • The categories of each variable are mutually exclusive
  • The sampling method must be simple random sampling
  • The data in the contingency table are frequencies or counts
8
Q

Contingency Tables

A
  • A two-way classification table containing frequencies of how often things appear; it can be used to determine whether two variables are independent or significantly associated.
  • Since the actual measured values may not agree with the theoretical values predicted, you can use the chi square calculation to make the determination
  • Additionally, a correlation coefficient can be calculated.
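
The chi square calculation for a contingency table is commonly written as follows. The notation is assumed here rather than taken from the card: O_ij is the observed count in row i and column j, E_ij is the count expected if the two variables were independent, R_i and C_j are the row and column totals, and N is the grand total.

```latex
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```

The larger the gap between observed and expected counts, the larger the statistic.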
9
Q

Steps to perform Chi Square Test of Independence

A
  • Step 1: Define the null hypothesis and alternative hypothesis
    • Null hypothesis (H0): There is no association between the two categorical variables
    • Alternative Hypothesis (H1): There is a significant association between the two categorical variables
  • Step 2: Specify the level of significance
  • Step 3: Compute the χ² statistic (See Attached)
  • Step 4: Calculate the degrees of freedom = (number of rows - 1) × (number of columns - 1) = (r-1) * (c-1)
  • Step 5: Find the critical value based on the degrees of freedom and the level of significance
  • Step 6: Finally, draw the statistical conclusion: if the test statistic is greater than the critical value, reject the null hypothesis and conclude that there is a significant association between the two categorical variables.
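
The steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the card's attached worked example: the 2 x 2 table of counts is hypothetical, the function name `chi_square_independence` is invented here, and the critical value 3.841 is the standard table value for 1 degree of freedom at a 0.05 level of significance.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table of counts (rows: one variable, columns: the other).
observed = [[30, 20],
            [20, 30]]
alpha = 0.05             # Step 2: level of significance
critical_value = 3.841   # Step 5: chi square table, df = 1, alpha = 0.05

statistic, dof = chi_square_independence(observed)  # Steps 3 and 4
reject_null = statistic > critical_value            # Step 6

print(f"chi square = {statistic:.3f}, df = {dof}, reject H0: {reject_null}")
```

With these made-up counts every expected cell is 25, so the statistic is 4.0 and the null hypothesis of no association is rejected at the 0.05 level.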
10
Q

Chi Square Test of Independence Example

Part 1

A
11
Q

Chi Square Test of Independence Example

Part 2

A
12
Q

Chi Square Test of Independence Example

Part 3

A
13
Q

Chi Square Test - Comparing Variances

Part 1

A
  • The chi square test is the best option for two applications:
    • Case I: Comparing variances when the variance of the population is known
    • Case II: Comparing observed and expected frequencies of test outcomes when there is no defined population variance
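
For Case I, the test statistic is commonly written as below. The symbols are chosen here for illustration: n is the sample size, s² is the sample variance, and σ₀² is the known or assumed population variance. The statistic is compared against the chi square distribution with n - 1 degrees of freedom, assuming the data come from a normal population.

```latex
\chi^2 = \frac{(n - 1)\, s^2}{\sigma_0^2}, \qquad \text{df} = n - 1
```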
14
Q

Chi Square Test - Comparing Variances

Part 2

A
15
Q

Chi Square Test - Comparing Variances

Part 3

A
16
Q

Left-Tailed Chi Square Test Example

A
  • The standard deviation of an airline’s passenger waiting time for a single queue is 16 minutes. Accordingly, the population variance is 256 (the square of the standard deviation). In a pilot project with separate queues, the standard deviation of the waiting time for a sample of 7 passengers is 8 minutes. Thus, the sample variance is 64. Check, at the 95% confidence level, whether the variation in waiting time has been reduced.
    • The null hypothesis is H0: σ₁² ≥ (16)²
    • The alternative hypothesis is H1: σ₁² < (16)²
      • Let’s look at the chi square table. Because s is less than σ, this is a left-tailed test, with df = 7 - 1 = 6. The critical value for 95% confidence is 1.63
        • See Attached
      • The test statistic (1.5) is less than the critical value (1.63), so it is in the rejection region.
        • Hence, the null hypothesis must be rejected. The variation in waiting time decreased with the separate queues.
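
The arithmetic behind this example can be checked with a short sketch. It assumes the usual single-variance statistic (n - 1) × s² / σ² and uses the table value 1.635 for df = 6 and a left-tail area of 0.05, which the card rounds to 1.63. The variable names are illustrative.

```python
n = 7                          # passengers in the pilot project
sample_variance = 8 ** 2       # 64
population_variance = 16 ** 2  # 256
critical_value = 1.635         # chi square table, df = 6, left-tail area 0.05

dof = n - 1
statistic = dof * sample_variance / population_variance

# Left-tailed test: reject H0 when the statistic falls below the critical value.
reject_null = statistic < critical_value

print(f"chi square = {statistic}, df = {dof}, reject H0: {reject_null}")
```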
17
Q

Right-Tailed Chi Square Test Example

A
  • A smartwatch manufacturer received customer complaints about the XYZ model, whose battery lasts a shorter time than the previous model. The variance of the battery life of the previous model is 49 (hours squared). 11 watches were tested, and the standard deviation of their battery life was 9 hours. Assuming that the data are normally distributed, could the claim about increased variation in the new model be validated at the 5% significance level?
    • Population variance σ₁² = 49, so the population standard deviation σ₁ = 7 hours
    • Sample standard deviation s = 9 hours
    • The null hypothesis is H0: σ₁² ≤ (7)²
    • The alternative hypothesis is H1: σ₁² > (7)²
      • Let’s look at the chi square table. Because s is greater than σ, this is a right-tailed test, with df = 11 - 1 = 10. The critical value for 95% confidence is 18.307.
      • See Attached
    • The test statistic is less than the critical value, so it is not in the rejection region. Hence, we fail to reject the null hypothesis. There is not sufficient evidence to claim that the battery life of the new model shows more variability.
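
As a cross-check that does not rely on a printed table, the sketch below computes the test statistic and its right-tail p-value. It assumes the single-variance statistic (n - 1) × s² / σ², and it uses a closed form for the chi square right-tail probability that is valid only for even degrees of freedom (df = 10 here). The helper name `chi_square_sf_even_df` is invented for this illustration.

```python
import math


def chi_square_sf_even_df(x, df):
    """Right-tail probability P(X > x) for a chi square variable with even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which only holds when df is an even positive integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


n = 11
sample_sd = 9.0
population_variance = 49.0
alpha = 0.05

dof = n - 1
statistic = dof * sample_sd ** 2 / population_variance  # 10 * 81 / 49
p_value = chi_square_sf_even_df(statistic, dof)

# Right-tailed test: reject H0 only when the p-value is below alpha.
reject_null = p_value < alpha

print(f"chi square = {statistic:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

The statistic comes out near 16.53, below the 18.307 critical value, and the p-value stays above 0.05, which agrees with the card's conclusion.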
18
Q

Two-Tailed Chi Square Test Example

A
  • Company HR believes that the variation in the salaries of new digital technology employees is not the same as that of Java technology employees. From historical data, the standard deviation of the Java salaries is $49k. Salaries of 30 new digital technology employees were collected, and their standard deviation is $70k. Assuming that the data are normally distributed, could the HR claim be validated with 95% confidence?
    • Population standard deviation σ₁ = 49
    • Sample standard deviation s = 70
    • The null hypothesis is H0: σ₁² = (49)²
    • The alternative hypothesis is H1: σ₁² ≠ (49)²
    • df = 30 - 1 = 29.
    • Since the claim is that the variances are not equal, this is a two-tailed test, so α/2 = 0.05/2 = 0.025
    • For 29 degrees of freedom, the left-tail critical value (1 - α/2 = 1 - 0.025 = 0.975) is 16.047
    • and the right-tail critical value (α/2 = 0.025) is 45.722
      • See Attached
    • The test statistic is more than 45.722, so it is in the rejection region. Hence, we can reject the null hypothesis.
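
For reference, the test statistic in this example works out as follows, assuming the single-variance statistic (n - 1) × s² / σ² with n = 30, s = 70 and σ = 49 taken from the card:

```latex
\chi^2 = \frac{(n - 1)\, s^2}{\sigma^2}
       = \frac{29 \times 70^2}{49^2}
       = \frac{142100}{2401}
       \approx 59.18 > 45.722
```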
19
Q

Chi Square Sample Size

A