Lecture 11 - The Chi Square Distributions Flashcards

1
Q

Chi-square distributions

A

are used for several different types of tests, including goodness-of-fit tests and tests on a population variance; our concern in this lecture is the use of Chi-square tests to conduct tests of independence.

2
Q

Chi-square distributions explained

A

If we are able to classify the members of a population according to two attributes, the aim of a test of independence is to determine whether the attributes are independent of each other or have some bearing on each other.
*We make use of a concept we encountered in probability theory: contingency tables.

3
Q

Contingency table

A

is a tabular presentation of the results of a random sample that relates to two random variables.

4
Q

Contingency tables-example

A

Consider the table of sample data below.
This data relates to A’ Level grades (otherwise called the performance) for a random sample of sixth form students drawn from four of the top secondary schools (PC, CIC, NGHS & BAHS) in Trinidad and Tobago.
Due to the manner in which the data is recorded, such a table is called a contingency table.

            PC    CIC   NGHS   BAHS   Total
Grade A     40     25     15     10      90
Grade B     20     10     10      5      45
Grade C     40     15      5      5      65
Total      100     50     30     20     200

In our example, there are two random variables: (1) the A’ Level grade of the student and (2) the secondary school attended by the student.
*The interesting question about this table is: was the performance influenced by the school attended?
*In other words, do we have evidence here to suggest that the row variable (performance) is influenced by the column variable (school attended)?
*Put yet another way, are the rows independent of the columns?
*This question is the basis of the very famous χ² (Chi Square) test in statistics.

5
Q

Generic form of the hypotheses

A

In other words, this sample is to be used to test:
*The null hypothesis that the row variable is independent of the column variable.
*The alternative hypothesis that the row variable is dependent on the column variable.
Accordingly, the related test is called a test of independence.

6
Q

Test of independence

A

The requirements of a Test of Independence are not different from earlier tests, namely:
–Null hypothesis
–Alternative hypothesis
–Significance level
–Test Statistic
–Critical Region
–Conclusion

The Null Hypothesis
*H0: Row Variable is independent of the Column Variable
The Alternative Hypothesis
*H1: Row Variable is dependent on the Column Variable

7
Q

The Multiplication Law

A

The Multiplication Law: P(B and C) = P(B|C) x P(C)
*Do you remember how we defined independent events?
*Two events are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other, that is, P(B|C) = P(B).
*Therefore, the special case of the multiplication law of probability states that, given any two independent events B and C from the same sample space, P(B and C) = P(B) x P(C).
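
The product rule for independent events is what gives the expected cell counts in a test of independence: if grade and school are independent, P(grade and school) = P(grade) x P(school), so the expected count in a cell is (row total x column total) / N. A minimal Python sketch of that calculation, applied to the A' Level counts from the contingency table example; the function and variable names are illustrative and do not come from the lecture.

```python
# Observed counts from the A' Level example
# (rows: Grade A, B, C; columns: PC, CIC, NGHS, BAHS).
observed = [
    [40, 25, 15, 10],
    [20, 10, 10, 5],
    [40, 15, 5, 5],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for label, row in zip(["Grade A", "Grade B", "Grade C"], expected):
    print(label, row)
```

For instance, the Grade A / PC cell has an expected count of 90 x 100 / 200 = 45, against an observed count of 40.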

8
Q

The Chi Square Distribution

A

Go to the Chi Square Distribution Tables (look now at Table 8, which is the third of your Statistical Tables).
*You can see the shape of the Chi-Square Distribution, which is positively skewed.
*The columns give you the “100α percentage points”; in other words, for a 5% level of significance, you look at the column labelled 0.05, and so on.
*The Chi Square Distribution has only one parameter, i.e. the degrees of freedom.
*The Chi Square test statistic possesses a Chi Square Distribution with (r − 1) x (c − 1) degrees of freedom, where r is the number of rows and c the number of columns in the contingency table.
*The rows give “ν degrees of freedom”. What does this mean? Each row of the table corresponds to one value of the degrees of freedom, so you read along the row whose ν equals (r − 1) x (c − 1) for your table.
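
For reference, the test statistic and its degrees of freedom can be written out as follows. The cards do not define this notation, so the symbols are assumptions made here: O_ij is the observed count in row i and column j, E_ij the expected count under independence, R_i and C_j the row and column totals, and N the sample size.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \, C_j}{N},
\qquad \nu = (r - 1)(c - 1)
```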

9
Q

The Chi Square Distribution

A

The next step of Hypothesis Testing is to find the Critical Region or the Rejection Region, against which we compare the Test Statistic.
*Let us choose a 5% significance level.
*In our example, the table has 3 rows and 4 columns, so the d.f. = (3 − 1) x (4 − 1) = 6.
*For 6 d.f. and a 5% significance level, the tables give a critical value of 12.592, so we reject H0 if the test statistic exceeds 12.592.
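
To see how the comparison with the critical value might be carried out, here is a small Python sketch that computes the Chi Square statistic for the A' Level table and checks it against 12.592. The critical value is copied from the card above rather than computed, and the function and variable names are assumptions made for illustration.

```python
# Observed counts from the A' Level example
# (rows: Grade A, B, C; columns: PC, CIC, NGHS, BAHS).
observed = [
    [40, 25, 15, 10],
    [20, 10, 10, 5],
    [40, 15, 5, 5],
]

# Critical value quoted in the lecture for 6 d.f. at the 5% level.
CRITICAL_VALUE = 12.592


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat


df = (len(observed) - 1) * (len(observed[0]) - 1)
statistic = chi_square_statistic(observed)
reject_h0 = statistic > CRITICAL_VALUE
print(f"d.f. = {df}, chi-square = {statistic:.3f}, reject H0: {reject_h0}")
```

On this sample the statistic comes out at roughly 7.6, which is below 12.592, so under this sketch H0 (independence of grade and school) would not be rejected at the 5% level.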

10
Q

Limitations of the Chi Square Test

A

The test is limited to two variables.
*The contingency table must have at least 2 rows and 2 columns.
*Too many cells with an expected frequency of less than 5 limit the accuracy of the decision arising from the test. Accordingly, no more than 20% of all cells may have an expected frequency of less than 5; otherwise, the decision will be invalid.
*The quality of the decision is influenced by the quality of the data collection.
Note that we can accommodate a contingency table with exactly 2 rows and 2 columns by applying a Yates Correction.
*The Yates Correction involves subtracting 0.5 from the absolute difference between observed and expected frequencies before squaring. The effect of the correction becomes negligible as the df increases.
*Should the number of cells with expected frequency less than 5 exceed 20% of all cells, adjoining rows or columns must be merged and the test repeated on the amended contingency table.
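
The two checks described here can be sketched in Python as follows: the share of cells with expected frequency below 5, and a Yates-corrected statistic for a 2 x 2 table. The function names are illustrative, and the 2 x 2 counts are invented purely to show the correction; they are not data from the lecture.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def share_of_small_cells(expected, threshold=5):
    """Fraction of cells whose expected frequency is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def yates_chi_square(table):
    """Chi-square with 0.5 subtracted from |O - E| before squaring (2 x 2 tables)."""
    expected = expected_counts(table)
    return sum(
        (abs(o - e) - 0.5) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# A' Level example: rows Grade A, B, C; columns PC, CIC, NGHS, BAHS.
grades = [[40, 25, 15, 10], [20, 10, 10, 5], [40, 15, 5, 5]]
small_share = share_of_small_cells(expected_counts(grades))
print(f"share of cells with expected frequency < 5: {small_share:.1%}")

# Hypothetical 2 x 2 table, invented only to illustrate the correction.
toy = [[12, 8], [6, 14]]
print(f"Yates-corrected chi-square: {yates_chi_square(toy):.3f}")
```

In the A' Level table only one of the twelve cells (Grade B at BAHS, expected 4.5) falls below 5, which is within the 20% limit, so no merging of rows or columns would be needed there.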
