Chi-square and t-tests (wk 4) Flashcards
What is a chi-square test?
The chi-square test is a test of difference among categorical (nominal/ordinal) variables. There are two types: the goodness-of-fit test and the test of association (or test of independence).
Describe the chi-square goodness-of-fit test:
-Proportions with more than two levels
-Tests how well the proportions in the data fit fixed (expected) proportions. While the binomial test is limited to dichotomous variables (heads/tails, success/failure), the chi-square test can handle more than two categories.
-Benford’s law -> The frequencies of first digits in naturally occurring numerical data (prices, populations, lengths, etc.) follow a particular distribution. A chi-square test for Benford’s law tests whether the first-digit frequencies in the data follow these known proportions. If the data fit Benford’s law, they are consistent with naturally occurring numbers. If the fit is rejected, the data set may have been fabricated.
-Reporting the test outcome -> The χ² value with its df (degrees of freedom), followed by the p-value. Normally, a bigger χ² means a bigger difference between the observed and expected proportions.
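The goodness-of-fit calculation for Benford’s law can be sketched in a few lines. This is only an illustration: the observed counts are made-up numbers, the function names are hypothetical, and the decision uses the tabulated critical value for df = 8 at the 0.05 level rather than an exact p-value.

```python
import math

def benford_expected(n_total):
    """Expected first-digit counts under Benford's law: P(d) = log10(1 + 1/d)."""
    return [n_total * math.log10(1 + 1 / d) for d in range(1, 10)]

def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus one
    return stat, df

# Hypothetical first-digit counts for digits 1..9 (illustrative data, N = 1000)
observed = [301, 176, 125, 97, 79, 67, 58, 51, 46]
expected = benford_expected(sum(observed))
stat, df = chi_square_gof(observed, expected)

# Tabulated chi-square critical value for alpha = 0.05 and df = 8
CRITICAL_05_DF8 = 15.507
print(f"chi2({df}) = {stat:.3f}, reject Benford at 0.05: {stat > CRITICAL_05_DF8}")
```

A statistic below the critical value means the first digits do not differ significantly from Benford’s proportions.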
Describe the chi-square test of association and McNemar’s test:
-Comparing proportions across two or more groups (test of association)
-The test of association checks whether the proportions of one categorical variable differ across the levels of another, i.e. whether the two variables are associated.
-Checks the association between two nominal/ordinal variables, e.g. whether the proportion of Tory/Labour voters differs depending on the region of the UK.
-Descriptive statistics for chi-square test of association can be summarised as a contingency table.
-Reporting the test outcome -> Typically the result is reported as the chi-square value with df and N (sample size), followed by the p-value.
-McNemar’s test -> The paired-samples version of the test. Paired samples mean that each data point in one group is paired with a data point in the other group. McNemar’s test is only available for two dichotomous variables (i.e. a 2-by-2 contingency table).
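As a rough sketch of the two calculations above: the first function builds expected counts from the row and column totals of a contingency table, and the second computes McNemar’s statistic from the two discordant cells of a paired 2-by-2 table. The function names and counts are hypothetical, no continuity correction is applied, and the p-value helper is valid only for df = 1.

```python
import math

def chi_square_association(table):
    """Chi-square test of association for an r x c table of counts.

    Returns (statistic, df, N). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count if independent
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic, valid for df = 1 only."""
    return math.erfc(math.sqrt(stat / 2))

def mcnemar_statistic(b, c):
    """McNemar's chi-square (df = 1) from the two discordant cell counts."""
    return (b - c) ** 2 / (b + c)

# Hypothetical 2-by-2 contingency table: rows = region, columns = party
table = [[30, 20],
         [20, 30]]
stat, df, n = chi_square_association(table)
print(f"chi2({df}, N = {n}) = {stat:.2f}, p = {p_value_df1(stat):.3f}")

# Hypothetical paired data: 15 changed one way, 5 changed the other way
m_stat = mcnemar_statistic(15, 5)
print(f"McNemar chi2(1) = {m_stat:.2f}, p = {p_value_df1(m_stat):.3f}")
```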
What is a t-test, and what are the 3 types of t-test?
+T-test -> Tests the difference in a measure (interval or ratio variable). It compares the means of populations (if there are more than two means, we use a different test). The null hypothesis is that the means are equal.
-Three types of t-test, each corresponding to a test for nominal/ordinal variables that we have already learned:
1. One sample t-test ~ binomial or chi-square goodness of fit
2. Independent (unpaired) samples t-test ~ chi-square test of association
3. Paired samples t-test ~ McNemar’s test
-For each t-test, you can decide whether to do a one-tailed or two-tailed test, just like the binomial test
What is a one-sample t-test:
-Compares the mean of one sample group against a fixed value
-No significant difference in score -> the observed difference can be explained by sampling error
-Significant difference in score -> the observed difference is unlikely to be due to sampling error alone
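A minimal sketch of the one-sample calculation, assuming the usual form t = (sample mean - fixed value) / (sample SD / sqrt(n)) with df = n - 1. The scores and the function name are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of the sample mean against mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)      # sample SD (n - 1 in the denominator)
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    return t, n - 1

# Hypothetical test scores compared against a fixed value of 50
scores = [52, 55, 49, 58, 61, 54, 57, 50]
t, df = one_sample_t(scores, mu0=50)
print(f"t({df}) = {t:.3f}")
```

The resulting t is then compared with the t-distribution for the given df to obtain the p-value.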
What is a two-sample (independent samples) t-test?
-Comparing a measure across two groups -> independent
-Compares the means of two independent samples or categories. Because the data come from different groups, we say that the samples are independent.
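The independent-samples comparison can be sketched as follows, assuming the classic Student version that pools the two sample variances (so it assumes equal variances). The data and the function name are hypothetical.

```python
import math
import statistics

def independent_t(a, b):
    """Return (t, df) for Student's independent-samples t-test (pooled variance)."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Pooled variance: the two sample variances weighted by their df
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, na + nb - 2  # df = total sample size - number of groups

# Hypothetical measurements from two separate groups
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t, df = independent_t(group_a, group_b)
print(f"t({df}) = {t:.3f}")
```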
What is a paired t-test?
-Comparing a measure across two groups -> paired
-Compares the mean difference of a measure taken from one group on two occasions
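One way to see the paired test: it is a one-sample t-test on the within-pair differences, tested against zero. The sketch below assumes that formulation; the data and the function name are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired-samples t-test.

    Works on the differences second[i] - first[i]; requires equal lengths
    and differences that are not all identical (otherwise the SD is zero).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [s - f for f, s in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for the same participants on two occasions
occasion_1 = [10, 12, 9, 11]
occasion_2 = [12, 15, 10, 14]
t, df = paired_t(occasion_1, occasion_2)
print(f"t({df}) = {t:.3f}")
```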
What is the assumption of normality for t-tests?
-Normality -> The sampling distribution of the mean is normal: if you repeatedly take samples of size n from the distribution and calculate the mean of each sample, those means are normally distributed. This holds when the sample size n is sufficiently large.
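The idea can be illustrated with a small simulation. The sketch below assumes a skewed (exponential) population with mean 1 and SD 1, draws many samples of size n, and looks at the spread of their means; the function name, seed, and sample sizes are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(n, n_groups, seed=0):
    """Draw n_groups samples of size n from an exponential population
    (mean 1, SD 1) and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_groups)
    ]

means = sample_means(n=50, n_groups=2000)
# The means should centre on the population mean (1.0), with a spread
# close to population SD / sqrt(n) = 1 / sqrt(50), about 0.141
print(f"mean of sample means = {statistics.mean(means):.3f}")
print(f"SD of sample means   = {statistics.stdev(means):.3f}")
```

Even though the individual values are strongly skewed, a histogram of these sample means would look roughly bell-shaped for a sample size of this order.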
What are the statistical test assumptions for t-tests?
-Statistical tests based on the normality assumption are called parametric tests, but normality should not always be assumed. The normality assumption can be checked (e.g. with the Shapiro-Wilk test): a violation of normality is indicated by a low p-value.
How is the significance of a difference in variances reported with p-values?
-The significance of a difference in variances between groups is reported as a p-value:
1. p < 0.05 -> variances are not equal
2. p > 0.05 -> variances are treated as equal
+ If variances aren’t equal, the Welch t-test can be used
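A minimal sketch of the Welch version, assuming the standard form in which each group keeps its own variance and the degrees of freedom come from the Welch-Satterthwaite approximation (so df is usually not a whole number). The data and the function name are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t, df) for Welch's t-test, which does not assume equal variances."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical groups with clearly different spreads
narrow = [1, 2, 3, 4, 5]
wide = [0, 10, 20, 30, 40]
t, df = welch_t(narrow, wide)
print(f"t({df:.2f}) = {t:.3f}")
```

When the two groups have equal sizes and equal variances, this gives the same t and df as the pooled-variance test; the more the variances differ, the smaller the df becomes.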
What is t-statistic?
-T-statistic -> T-tests are based on the t-statistic. The variable t is similar to the z-score, but it is based on the mean and SD of the sample, not the population. The significance of a t-value depends on the degrees of freedom = sample size - number of groups. Normally, a greater absolute t-value means a greater difference.