Lectures 5-6: Central Limit, Confidence Intervals, Crosstabs, Chi-squared, Cramer's V, T & Z tests Flashcards
What types of tests are T and Z tests?
Hypothesis tests. The value specified in the null hypothesis is taken as the benchmark.
A T test allows us to test whether a sample mean (of a normally-distributed interval variable) differs significantly/reliably from a hypothesised value, or whether the difference is just due to chance.
For what type of variables is the T test used?
Normally-distributed interval variables.
When should you use the T test as opposed to the Z test?
When you do not know the standard deviation of the population, or when the sample size is small (under 30 observations).
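A minimal Python sketch of the two test statistics, assuming made-up scores and, for the Z case, a population standard deviation that is simply assumed to be known. The function names are illustrative and not from the lectures.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom; population SD unknown."""
    n = len(sample)
    s = statistics.stdev(sample)          # sample SD (divides by n - 1)
    se = s / math.sqrt(n)                 # estimated standard error
    return (statistics.mean(sample) - mu0) / se, n - 1

def one_sample_z(sample, mu0, sigma):
    """z statistic; population SD (sigma) is known."""
    se = sigma / math.sqrt(len(sample))   # exact standard error
    return (statistics.mean(sample) - mu0) / se

scores = [2, 4, 4, 4, 5, 5, 7, 9]         # made-up data, mean = 5
t, df = one_sample_t(scores, mu0=4)
z = one_sample_z(scores, mu0=4, sigma=2)  # sigma = 2 is an assumption
print(f"t = {t:.3f} on {df} df, z = {z:.3f}")
```

The only difference between the two is where the standard deviation comes from: the sample (T) or the population (Z).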
What does “statistically significant” mean?
It means that the trends in the sample are likely to be representative of trends in the population. In other words, your result is unlikely to have happened by chance.
What does statistical significance depend on?
It depends on the association intensity (measured by Cramer’s V) and sample size.
The larger the sample, the better; however, the strength of the association is the most important factor.
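A short Python sketch of Cramer's V, assuming the usual formula V = sqrt(X2 / (n * min(r - 1, c - 1))). The X2 values below are hypothetical: they correspond to the same 2x2 pattern of proportions observed in a sample of 60 and in a sample of 600.

```python
import math

def cramers_v(x2, n, n_rows, n_cols):
    """Cramer's V: association strength on a 0-1 scale, from X2 and sample size n."""
    return math.sqrt(x2 / (n * min(n_rows - 1, n_cols - 1)))

# Hypothetical 2x2 results: same proportions, ten times the sample
v_small = cramers_v(x2=20 / 3, n=60, n_rows=2, n_cols=2)
v_large = cramers_v(x2=200 / 3, n=600, n_rows=2, n_cols=2)
print(f"n = 60:  V = {v_small:.3f}")
print(f"n = 600: V = {v_large:.3f}")
```

X2 grows with the sample size while V stays the same, which is why significance depends on both the association intensity and the sample size.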
What are inferential statistics?
Statistics which describe our sample, but which also tell us what we can expect in new samples that we do NOT even have, allowing us to generalise our findings to a population. T tests are inferential statistics.
Descriptive statistics, by contrast, simply describe the data that you have.
In the context of the T test, what does the P value mean?
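The P value gives the probability that a pattern of data at least as strong as the one in the sample could be produced by RANDOM data, that is, if the null hypothesis were true. A small P value is therefore evidence against the null hypothesis. The probability of rejecting the null hypothesis when it is in fact true (Type 1 error) is set by the significance level (alpha) that you choose, not by the P value itself.
A P of .01 means that there is a 1% chance of getting results at least this extreme with random data.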
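A minimal Python sketch of how a test statistic is turned into a two-sided P value. It uses the standard normal (Z) distribution, because the Python standard library has no T distribution; for large samples the two are close, but this is an approximation and not the exact T-test P value.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Probability of a statistic at least as extreme as z, if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

for z in (1.0, 1.96, 2.58):
    print(f"z = {z:.2f} -> p = {two_sided_p_from_z(z):.3f}")
```

The further the statistic lies from 0, the smaller the P value.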
What is a Paired T test?
A Paired T test compares the means of two measurements taken on the same group.
Eg. To test the balance of a group of people before and after drinking alcohol.
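A sketch of the paired T statistic in Python, assuming made-up balance scores. The test reduces to a one-sample T test on the within-person differences; `paired_t` is an illustrative name.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and df for the mean of the within-person differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1

# Made-up balance-test error counts for five people, before and after alcohol
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 13, 15]
t, df = paired_t(before, after)
print(f"t = {t:.2f} on {df} df")
```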
What is an Independent T test?
An Independent T test compares the means of two independent groups.
Eg. To measure the cholesterol level of a group which has taken a medication versus a group which has taken the placebo. The groups are different, but the variable (cholesterol) is the same.
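A sketch of the independent-groups T statistic in Python, assuming made-up cholesterol readings. This version is Welch's T, which does not assume equal variances in the two groups; the lectures may have used the pooled-variance version instead.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (unequal variances allowed)."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return mean_diff / se

# Made-up cholesterol readings (mg/dL)
medication = [182, 175, 190, 168, 177, 185]
placebo = [201, 195, 210, 188, 199, 205]
print(f"t = {welch_t(medication, placebo):.2f}")
```

A negative value here means the medication group has the lower mean.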
What does Bivariate describe?
Bivariate describes relationships between two variables. Eg. education & income.
When should you use a crosstabulation to test whether your variables are independent?
Crosstabulation is used when both variables are categorical.
Can you use a crosstabulation when the dependent variable is continuous and the independent variable is dichotomous?
No, crosstabulation is only for the case when both the dependent and independent variables are categorical. Other tests, comparing differences in means or differences in proportions, are used when the dependent and independent variables are not both categorical.
What does the Central Limit Theorem state?
The larger the sample size (n), the closer the distribution of the sample means (the sampling distribution) draws to a normal distribution, whatever the shape of the population distribution.
In the Central Limit Theorem the mean of the sampling distribution is the population mean. True or False?
True. The mean of the sampling distribution is the population mean.
According to the Central Limit Theorem when is the sampling distribution approximately normal?
The sampling distribution is approximately normal if n is large (rule of thumb: n > 30) or if the population distribution is normal.
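A small Python simulation of the theorem, assuming a strongly skewed population (exponential, with mean 1 and standard deviation 1). The population, the number of replicates and the seed are all arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, replicates=2000):
    """Means of `replicates` samples of size n from a skewed population."""
    return [statistics.mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(replicates)]

def share_within(means, n, mu=1.0, sigma=1.0):
    """Share of sample means within 1.96 standard errors of the population mean."""
    half_width = 1.96 * sigma / n ** 0.5
    return sum(abs(m - mu) < half_width for m in means) / len(means)

means_30 = sample_means(30)
print("mean of sample means:", round(statistics.mean(means_30), 3))
print("SD of sample means:  ", round(statistics.stdev(means_30), 3))
print("share within 1.96 SE:", round(share_within(means_30, 30), 3))
```

If the sampling distribution is close to normal, the mean of the sample means should sit near the population mean (1), their standard deviation near 1 / sqrt(30), and roughly 95% of them should fall within 1.96 standard errors.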
In the Central Limit Theorem what’s the Standard Error?
The Standard Error (SE) is the standard deviation of the sampling distribution. It indicates the range of possible error when a sample mean is used to estimate the population mean.
The larger the sample size, the greater the Standard Error. True or False.
False. As the sample size increases, the Standard Error decreases.
What’s the relationship between the population standard deviation and the standard deviation of the sampling distribution (Standard Error)?
They are directly proportional: SE = σ / √n, where σ is the population standard deviation and n is the sample size.
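A few lines of Python showing both relationships at once, assuming the usual formula for the standard error of the mean (population standard deviation divided by the square root of n). The numbers are arbitrary.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return sigma / math.sqrt(n)

print(standard_error(10, 25))    # baseline: sigma = 10, n = 25
print(standard_error(20, 25))    # doubling sigma doubles the SE
print(standard_error(10, 100))   # quadrupling n halves the SE
```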
How do you calculate the degrees of freedom in cross-tabulations?
One less than the number of rows, multiplied by one less than the number of columns.
df = (r – 1)(c – 1)
What are cross-tabulations/ contingency tables?
Bivariate statistics for qualitative variables
Describe relationships between 2 variables (ex. Voter location and political choice)
Conventionally DV in columns and IV in rows.
What is the X2 Test (basically the same as chi-squared test)?
Uses a sample to make an inference about a population. Involves classification of an independent variable into 2 or more categories (nominal data).
For example, when looking at voting results, were there any significant differences between voter location (city, town, rural) and political choice? Assesses data in cross-tabulations/ contingency tables.
What is the X2 Test (basically the same as chi-squared test) testing?
It tests whether any difference between categories is statistically significant. It compares observed frequencies (fo) with those that would be expected if there were no relationship between the variables (fe).
What does a X2 value of 0 mean?
If the observed and expected agree exactly, X2 = 0.
What does a high X2 value mean?
The greater the discrepancy between the observed and expected frequencies, the larger the X2 value (and thus the greater the chance that you will reject the null hypothesis).
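A minimal Python sketch of the X2 calculation, assuming made-up voting counts. Each expected frequency (fe) is the row total times the column total divided by the overall total, and X2 is the sum of (fo - fe)^2 / fe over all cells. The function name is illustrative.

```python
def chi_squared(observed):
    """Return (X2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected if no relationship
            x2 += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df

# Made-up counts: rows = voter location (city, town, rural),
# columns = political choice (left, right)
votes = [[60, 40], [50, 50], [35, 65]]
x2, df = chi_squared(votes)
print(f"X2 = {x2:.2f}, df = {df}")
```

A table whose rows all have the same proportions, such as [[10, 20], [20, 40]], gives X2 = 0.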
What are criteria for which X2 approximates the chi-square distribution?
The sampling distribution of X2 approximates the chi-square distribution very closely, provided the expected frequency in each cell of the contingency table is at least 5.
If the cell frequencies are too small, collapse categories together or use Fisher’s Exact Test.
What are the hypotheses in the chi-squared test?
Null hypothesis (H0): no difference in effect of IV categories on DV. For example, there is no difference in political choice with respect to voter location.
Alternative hypothesis (Ha) is mutually exclusive with H0: there is a difference in effect of IV categories on DV. For example, people in cities are more likely to vote left-wing.
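A hedged Python sketch of the decision between H0 and Ha. It converts an X2 value into a P value using closed forms that hold only for 1 or 2 degrees of freedom, then compares it with a significance level of 0.05 (a conventional choice, assumed here). The X2 values are hypothetical; for other degrees of freedom a statistics package or a printed table is needed.

```python
import math

def chi2_p_value(x2, df):
    """Upper-tail p-value for X2, using closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x2 / 2))
    if df == 2:
        return math.exp(-x2 / 2)
    raise ValueError("this sketch only covers df = 1 or 2")

def reject_h0(x2, df, alpha=0.05):
    """True if the result is significant at level alpha."""
    return chi2_p_value(x2, df) < alpha

for x2, df in ((6.67, 1), (2.10, 2)):
    print(f"X2 = {x2}, df = {df}: p = {chi2_p_value(x2, df):.3f}, "
          f"reject H0: {reject_h0(x2, df)}")
```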