4.3 Hypothesis Tests - χ²-Test Flashcards
normal approximation to the binomial distribution, χ² test for model fit, χ² test for independence
Motivation Behind the χ²-Test
-consider a test for attribute data
-assume we have n independent observations of a variate which can take K different values
-to build a model for these observations, we can use i.i.d. random variables X1,…,Xn∈{1,…,K} with:
P(Xi=k) = pk
-for all i∈{1,…,n} and k∈{1,…,K}, where the probabilities pk satisfy Σpk=1, the sum taken from k=1 to k=K
-since the observations are independent, the order does not matter and we only need to consider how often each class occurs, let:
Yk = |{i|Xi=k}| = Σ 1_{k}(Xi)
-where the sum is from i=1 to i=n, for all k∈{1,…,K}
-if the model is correct, then Yk~B(n,pk) for all k∈{1,…,K}, but since ΣYk=n (sum over k) the counts Y1,…,YK are not independent
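For instance, the class counts yk can be computed directly from a sample (the data below is a made-up example with K=3):

```python
from collections import Counter

# Hypothetical sample of n = 10 observations taking values in {1, 2, 3} (K = 3).
x = [1, 3, 2, 1, 1, 3, 2, 2, 1, 3]
K = 3

counts = Counter(x)
y = [counts.get(k, 0) for k in range(1, K + 1)]  # y[k-1] = |{i : x_i = k}|

assert sum(y) == len(x)  # the counts always sum to n
print(y)  # [4, 3, 3]
```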
Multinomial Distribution
Definition
-the joint distribution of (Y1,…,YK) is called a multinomial distribution with parameters n and p1,…,pK
Normal Distribution as an Approximation to the Binomial
Overview
-for large n, we can approximate the distribution of Yk using a normal distribution
Normal Distribution as an Approximation to the Binomial
Proof
-use the fact that Yk is a sum of n i.i.d. random variables 1_{k}(Xi), so we can apply the central limit theorem
-the central limit theorem states that for any i.i.d. sequence Zi, i∈ℕ, of random variables with mean μ=E(Zi) and variance σ²=Var(Zi), the sum Z1+…+Zn is approximately N(nμ,nσ²) for large n
-here Zi=1_{k}(Xi) has mean pk and variance pk(1-pk), so:
Yk~N(npk,npk(1-pk))
-approximately for large n
Normal Distribution as an Approximation to the Binomial
Summary
-for large n, we can approximate a B(n,p) distribution by a N(np,np(1-p)) distribution
Normal Distribution as an Approximation to the Binomial
Rule of Thumb
-the normal approximation for B(n,p) can be used if np≥5 and n(1-p)≥5
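A quick numerical check of this approximation (the choices n=50, p=0.3, which satisfy the rule of thumb since np=15 and n(1-p)=35, are hypothetical), comparing the exact binomial CDF with its continuity-corrected normal approximation:

```python
import math

def binom_cdf(y, n, p):
    """Exact P(Y <= y) for Y ~ B(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(y + 1))

def norm_cdf(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 50, 0.3                               # np = 15 >= 5 and n(1-p) = 35 >= 5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))
y = 18

exact = binom_cdf(y, n, p)
approx = norm_cdf((y + 0.5 - mu) / sigma)    # +0.5 is the continuity correction
print(exact, approx)                         # the two values agree closely
```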
χ²-Test for Model Fit
Comparing Ho to the Observed Data
-assume that we have observed attribute data x1,…,xn∈{1,…,K} and we want to test the hypothesis Ho:P(Xi=k)=pk for all k∈{1,…,K}
-let:
yk = |{i|xi=k}| = Σ 1_{k}(xi)
-where the sum is from i=1 to i=n; yk is the sample count for class k∈{1,…,K}
-if Ho is true, we expect yk≈npk for all k
-so we can use:
c = Σ (yk-npk)²/(npk)
-where the sum is over k=1 to K
-as a measure of how far the data is from Ho
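As a worked sketch, computing c for hypothetical data (the 9:3:3:1 ratio and the observed counts are purely illustrative):

```python
# Hypothetical model fit: n = 160 observations in K = 4 classes,
# Ho: (p1, p2, p3, p4) = (9/16, 3/16, 3/16, 1/16).
y = [95, 30, 25, 10]                     # observed counts y_k
p0 = [9 / 16, 3 / 16, 3 / 16, 1 / 16]    # probabilities under Ho
n = sum(y)                               # n = 160, expected counts n*pk = 90, 30, 30, 10

c = sum((yk - n * pk) ** 2 / (n * pk) for yk, pk in zip(y, p0))
print(round(c, 2))  # 1.11
```

A small value of c means the observed counts are close to their expected values under Ho.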
χ²-Test for Model Fit
Lemma
-assume Ho is true, let:
C = Σ (Yk-npk)²/(npk)
-sum from k=1 to k=K
-then C converges in distribution to χ²(K-1) as n→∞
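The lemma can be illustrated by simulation (all parameter choices below are hypothetical): under Ho, the statistic C should be approximately χ²(K-1)-distributed, and the mean of a χ²(K-1) distribution is K-1.

```python
import random

random.seed(0)
K, n, reps = 4, 100, 1000
p = [0.1, 0.2, 0.3, 0.4]                  # true class probabilities (Ho holds)
cum = [sum(p[:k + 1]) for k in range(K)]  # cumulative probabilities

def draw():
    """Draw one observation X in {0, ..., K-1} with P(X = k) = p[k]."""
    u = random.random()
    return next((k for k in range(K) if u < cum[k]), K - 1)

cs = []
for _ in range(reps):
    y = [0] * K                           # class counts Y_1, ..., Y_K
    for _ in range(n):
        y[draw()] += 1
    cs.append(sum((y[k] - n * p[k]) ** 2 / (n * p[k]) for k in range(K)))

mean_c = sum(cs) / reps
print(mean_c)  # should be close to K - 1 = 3
```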
χ²-Test for Model Fit
Lemma Proof for K=2
-we have Y1+Y2=n and p1+p2=1, so Y2-np2 = -(Y1-np1)
-substituting into the formula for C gives C = (Y1-np1)²(1/(np1)+1/(np2)) = (Y1-np1)²/(np1(1-p1))
-by the normal approximation to the binomial, (Y1-np1)/√(np1(1-p1)) is approximately N(0,1) for large n, so C is approximately the square of a standard normal, i.e. χ²(1)-distributed
χ²-Test for Model Fit
Construct the Test for the Null Hypothesis
-if we write cn(α) for the (1-α)-quantile of the χ²(n)-distribution, then assuming Ho, we have:
P(C > c_{K-1}(α)) ≈ α
-for large n, and thus we reject Ho at level α if the observed test statistic c satisfies c > c_{K-1}(α)
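A minimal decision-rule sketch (the counts and the uniform Ho are hypothetical; 11.07 is the tabulated 0.95-quantile of the χ²(5)-distribution):

```python
# Hypothetical die-fairness test: K = 6 classes, Ho: pk = 1/6, alpha = 0.05.
y = [8, 12, 11, 9, 10, 10]              # observed counts over n = 60 rolls
n, K = sum(y), len(y)
expected = n / K                        # n * pk = 10 under Ho

c = sum((yk - expected) ** 2 / expected for yk in y)
crit = 11.07                            # (1 - alpha)-quantile of chi2(K - 1) = chi2(5)
reject = c > crit
print(round(c, 2), reject)  # 1.0 False
```

Here c is well below the critical value, so the data gives no evidence against a fair die.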
χ²-Test for Model Fit
Summary
data: x1,…,xn∈{1,…,K}
model: X1,…,Xn∈{1,…,K} i.i.d. with P(Xi=k)=pk for all i∈{1,…,n}, k∈{1,…,K}
test: Ho: pk=πk for all k∈{1,…,K} vs H1: pk≠πk for at least one k∈{1,…,K}
test statistic: c = Σ (yk-nπk)²/(nπk), sum from k=1 to K, where yk=|{i|xi=k}|=Σ 1_{k}(xi)
critical value: c_{K-1}(α), the (1-α)-quantile of the χ²(K-1)-distribution
χ²-Test for Model Fit
Rule of Thumb
-the χ²-test can be applied if nπk≥5 for all k∈{1,…,K}
χ²-Test for Independence
Purpose
-tests whether two categorical variates are independent
χ²-Test for Model Fit
Number of Degrees of Freedom
K-1
χ²-Test for Independence
Description
a) arrange the observed counts of the two variates in a contingency table
b) estimate the marginal probability for each row and each column from the corresponding totals
c) use these to compute the expected count for each cell: e_ij = (row i total)(column j total)/n
d) compute the test statistic c = Σ (o_ij-e_ij)²/e_ij over all cells, where o_ij is the observed count in cell (i,j)
e) find the critical value at the chosen significance level, with degrees of freedom = (no. of rows - 1)(no. of columns - 1)
f) if the test statistic exceeds the critical value, reject the null hypothesis of independence; if it is less than the critical value, we can't reject it
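The steps above can be sketched for a hypothetical 2×3 table (the counts are made up; 5.991 is the tabulated 0.95-quantile of the χ²(2)-distribution):

```python
# Hypothetical contingency table: rows = variate A (2 levels),
# columns = variate B (3 levels).
table = [[20, 30, 10],
         [30, 20, 40]]
n = sum(sum(row) for row in table)            # total sample size, n = 150
row_tot = [sum(row) for row in table]         # step b): row totals
col_tot = [sum(col) for col in zip(*table)]   # step b): column totals

# steps c) + d): expected counts under independence and the test statistic
c = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_tot[i] * col_tot[j] / n     # e_ij = (row i total)(col j total)/n
        c += (obs - exp) ** 2 / exp

# step e): degrees of freedom = (rows - 1)(columns - 1) = 2
df = (len(table) - 1) * (len(table[0]) - 1)
crit = 5.991                                  # 0.95-quantile of chi2(2), from tables
print(round(c, 3), c > crit)  # 16.667 True
```

Since c exceeds the critical value, this (made-up) data would lead us to reject independence at the 5% level.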