Khan Academy: Inference For Categorical Data Flashcards

1
Q

How is a Chi-Square distribution created?

A

Start with a standard normal distribution (mean = 0, SD = 1). If X1, X2, …, Xn are independent random variables drawn from it, the sum of their squares has a Chi-Square distribution:
Q = X1^2 + X2^2 + … + Xn^2
Note: the Xi are random variables (independent standard normal values), not fixed numbers
n = degrees of freedom, i.e. the number of independent squared terms in the sum
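A quick simulation can make this definition concrete. The sketch below, assuming only Python's standard `random` module, draws many sums of n squared standard normal values and compares their average and spread with two known facts: a Chi-Square distribution with n degrees of freedom has mean n and variance 2n. The function names `chi_square_draw` and `simulate` are illustrative, not from the course.

```python
import random


def chi_square_draw(df, rng):
    """One draw: the sum of the squares of `df` independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


def simulate(df, trials=50_000, seed=0):
    """Return the sample mean and sample variance of `trials` simulated draws."""
    rng = random.Random(seed)
    draws = [chi_square_draw(df, rng) for _ in range(trials)]
    mean = sum(draws) / trials
    variance = sum((d - mean) ** 2 for d in draws) / (trials - 1)
    return mean, variance


if __name__ == "__main__":
    for df in (1, 3, 5):
        mean, variance = simulate(df)
        # Theory: the mean should be close to df and the variance close to 2 * df
        print(f"df={df}: mean={mean:.2f} (theory {df}), variance={variance:.2f} (theory {2 * df})")
```

Because every term is squared, each draw is non-negative, which is why Chi-Square distributions live on the positive axis and are skewed right for small n.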

2
Q

When is the Chi-Square distribution used?

A

To measure how far observed counts deviate from the counts expected under a hypothesis.

When we have categorical data and want to test Ho, we compute a Chi-Square statistic and compare it to a Chi-Square distribution. The statistic plays the same role that the sample mean or sample proportion plays when it is compared to its sampling distribution in other tests.

3
Q

Example of a Chi-Square distribution problem

A

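Note: each error is squared and divided by its expected count, and the sum of these terms is assumed to approximately follow a Chi-Square distribution:
Chi-Square statistic = sum over all categories of (Observed - Expected)^2 / Expected

Note: dividing by the expected count normalizes each squared error, so a deviation of 5 counts weighs more when 10 were expected than when 100 were expected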

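A minimal sketch of the statistic described above, for a goodness-of-fit setting. The visitor counts are made-up illustrative data, `chi_square_statistic` is a hypothetical helper name, and Ho is assumed to be "visitors are spread evenly across the 7 days". For goodness of fit, the degrees of freedom are the number of categories minus 1.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        # Dividing by the expected count normalizes each squared error
        total += (o - e) ** 2 / e
    return total


if __name__ == "__main__":
    # Hypothetical visitor counts for Monday..Sunday (illustrative data only)
    observed = [90, 105, 98, 110, 95, 112, 90]
    # Ho: visitors are spread evenly over the 7 days
    n = sum(observed)
    expected = [n / len(observed)] * len(observed)
    stat = chi_square_statistic(observed, expected)
    print(f"Chi-Square statistic = {stat:.2f}, df = {len(observed) - 1}")
```

Here every expected count is 700 / 7 = 100, and the largest single contribution comes from the day with the biggest gap (112 vs 100).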

4
Q

What are the conditions for Chi-Square goodness of fit inference?

A

1) Random: the data come from a random sample

2) Large Counts: the EXPECTED NUMBER of each category of outcome is at least 5

3) Independence: if we are sampling without replacement, the sample size must be less than 10% of the population
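The two conditions that depend on numbers can be checked mechanically. This is a sketch under stated assumptions: `check_conditions` and its parameters are hypothetical names, expected counts are passed in already computed, and the Random condition is left to the analyst because it depends on how the data were collected.

```python
def check_conditions(expected_counts, sample_size, population_size=None, with_replacement=False):
    """Return a list of failed conditions; an empty list means the checkable ones hold.

    The Random condition cannot be verified from counts, so it is not checked here.
    """
    problems = []
    # Large Counts: every EXPECTED count must be at least 5
    small = [e for e in expected_counts if e < 5]
    if small:
        problems.append(f"Large Counts fails: expected counts below 5: {small}")
    # Independence (10% condition): only needed when sampling without replacement
    if not with_replacement:
        if population_size is None:
            problems.append("Independence unknown: population size not given")
        elif sample_size >= 0.10 * population_size:
            problems.append("Independence fails: sample is not < 10% of the population")
    return problems


if __name__ == "__main__":
    print(check_conditions([12.5, 12.5, 25], sample_size=50, population_size=2000))
    print(check_conditions([10, 10, 4], sample_size=24, population_size=100))
```

Note that the check uses expected counts, not observed ones: an observed count of 2 is fine as long as the count expected under Ho is at least 5.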

5
Q

How are the degrees of freedom for a contingency table calculated?

A

Df=(M-1)(N-1)
M: number of rows
N: number of columns
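As a small worked example of Df = (M-1)(N-1), the hypothetical helper below takes a table stored as a list of rows and returns its degrees of freedom. The 2 × 3 table of counts is invented for illustration (rows: sick / not sick; columns: three treatment groups).

```python
def contingency_df(table):
    """Degrees of freedom (M - 1)(N - 1) for a table with M rows and N columns."""
    m = len(table)
    n = len(table[0]) if m else 0
    if m < 2 or n < 2 or any(len(row) != n for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    return (m - 1) * (n - 1)


if __name__ == "__main__":
    # Hypothetical 2 x 3 table: rows = sick / not sick, columns = three groups
    table = [[20, 10, 15],
             [30, 40, 35]]
    print(contingency_df(table))  # (2 - 1) * (3 - 1) = 2
```

Only the table's shape matters here; the counts themselves do not affect the degrees of freedom.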

6
Q

Why does the instructor use both the sick and not sick rows to calculate the Chi-Square statistic?

A

Because we want to test whether the herbs have an effect, both outcomes (not sick and sick) must be considered. The Chi-Square statistic sums the contributions of every cell in the table, so leaving out a row would ignore part of the difference between the observed and expected counts.
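To see that both rows matter, the sketch below splits the statistic into one subtotal per row. The counts are invented for illustration (they are not the video's data), and `expected_counts` / `row_contributions` are hypothetical helper names; expected counts use row total × column total / grand total.

```python
def expected_counts(table):
    """Expected count for each cell if Ho (no association) is true."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def row_contributions(table):
    """Sum of (Observed - Expected)^2 / Expected for each row separately."""
    expected = expected_counts(table)
    return [
        sum((o - e) ** 2 / e for o, e in zip(obs_row, exp_row))
        for obs_row, exp_row in zip(table, expected)
    ]


if __name__ == "__main__":
    # Hypothetical counts; columns = herb 1, herb 2, placebo
    table = [[20, 30, 30],     # sick
             [100, 110, 90]]   # not sick
    sick, not_sick = row_contributions(table)
    print(f"sick row: {sick:.3f}, not sick row: {not_sick:.3f}, total: {sick + not_sick:.3f}")
```

Both rows add a positive amount, so computing the statistic from only one of them would understate the total.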

7
Q

What does a test for homogeneity do?

A

It tests how similar groups are. In statistics, we look at two or more groups to see whether the distribution of a certain variable is the same in each group. For example, we might test whether left-handed and right-handed people have the same distribution of preferences across subject domains.


8
Q

How is the null hypothesis translated to numbers in a contingency table?

A

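The null hypothesis says there is no difference and no association, so we use the rule that P(A|B) = P(A) when A and B are independent (no association), which gives P(A and B) = P(A)P(B). We then calculate the expected count in each cell of the contingency table assuming Ho (the Null hypothesis) is true: expected count = (row total × column total) / grand total.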
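The independence rule can be turned into expected counts directly. In this sketch (hypothetical helper name, invented counts), each cell gets grand total × P(row) × P(column), which simplifies to row total × column total / grand total. The loop at the end checks that, in the expected table, P(sick | column) equals P(sick) for every column, which is exactly what Ho claims.

```python
def expected_under_independence(table):
    """Expected counts if the row variable and column variable are independent (Ho true)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    expected = []
    for r in row_totals:
        p_row = r / grand  # P(A)
        # grand * P(A) * P(B) for each column B
        expected.append([grand * p_row * (c / grand) for c in col_totals])
    return expected


if __name__ == "__main__":
    table = [[20, 30, 30], [100, 110, 90]]  # hypothetical counts: sick / not sick rows
    exp = expected_under_independence(table)
    p_sick = sum(table[0]) / sum(map(sum, table))
    for j in range(len(table[0])):
        col_total = exp[0][j] + exp[1][j]
        # In the expected table, P(sick | column j) equals P(sick)
        print(f"column {j}: P(sick | col) = {exp[0][j] / col_total:.4f}, P(sick) = {p_sick:.4f}")
```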

9
Q

What is the difference between using Chi-Square test for homogeneity and association?

A

The calculations are the same; the difference is in how the data are collected:
For homogeneity, we have different groups and want to see if the distribution of a variable is the same among these groups
For association (independence), we have one group and measure two categorical variables on each individual in it, then check whether the variables are related

10
Q

What are the steps for carrying out a Chi-Square test with a contingency table?

A

0) set the significance level and check the inference conditions
1) calculate the expected count for each cell in the table
2) calculate the Chi-Square statistic
3) calculate the degrees of freedom
4) use a table or calculator to find the P-value and compare it to the significance level
Note: P-value < significance level means that, assuming Ho is true, the probability of getting a Chi-Square statistic at least as extreme as the one from our sample is so low that chance alone is an unlikely explanation, so we reject the Null hypothesis (Ho)
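The steps above can be sketched end to end in plain Python. Assumptions: `chi_square_test` is a hypothetical name, and `chi2_sf` is a hand-written stand-in for the table or calculator in step 4 (a series expansion of the regularized incomplete gamma function). Only the Large Counts part of step 0 is checked, since randomness cannot be verified from counts. With SciPy available, `scipy.stats.chi2.sf` and `scipy.stats.chi2_contingency` do the same job; `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic can differ from this one.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for X ~ Chi-Square(df), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 100_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_test(table, alpha=0.05):
    """Steps 1-4 for a contingency table; returns (statistic, df, p_value, reject_ho)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Step 1: expected count for each cell under Ho
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    # Step 0 (partly): Large Counts condition
    if min(min(row) for row in expected) < 5:
        raise ValueError("Large Counts condition fails: an expected count is below 5")
    # Step 2: Chi-Square statistic
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    # Step 3: degrees of freedom
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Step 4: P-value, compared with the significance level
    p_value = chi2_sf(stat, df)
    return stat, df, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical 2 x 2 table of counts
    stat, df, p, reject = chi_square_test([[20, 30], [30, 20]])
    print(f"statistic={stat:.3f}, df={df}, P-value={p:.4f}, reject Ho: {reject}")
```

For this made-up table every expected count is 25, the statistic is 4.0 with 1 degree of freedom, and the P-value is about 0.046, just under a 0.05 significance level.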

11
Q

There are 3 inference procedures for categorical data. What are they, and when is each one used?

A

Chi-Square test for goodness of fit: used when we have a hypothesized distribution and want to see how well the sample data fit it (e.g. the number of people coming to our store each day of the week)
Chi-Square test for independence: used when we have one group and measure two categorical variables on it (e.g. association between taking different types of herbs and getting/not getting sick)
Chi-Square test for homogeneity: used when we have multiple groups and want to see if the distribution of a variable is the same among these groups (e.g. distribution of favourite subjects of left-handed vs right-handed people)
