block 9 + Flashcards
What is a "two-way" table?
What's another name for a two-way table?
When we want to examine the relationship between two categorical variables, we tabulate one against the other.
Another name: cross-tabulation.
How do you know if an association exists between two categorical variables?
An association exists if the distribution of one variable varies according to the value of the other.
What’s the formula for calculating the expected frequency in each cell?
(row total x column total) / overall total
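The expected-frequency formula can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `expected_counts` and the 2 x 2 counts are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell: (row total x column total) / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall = sum(row_totals)
    return [[r * c / overall for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table (made-up counts):
# rows = exposed / unexposed, columns = disease / no disease
observed = [[30, 20],
            [10, 40]]

expected = expected_counts(observed)
# Row totals are 50 and 50, column totals are 40 and 60, overall total is 100,
# so each row has expected counts 50 x 40 / 100 = 20 and 50 x 60 / 100 = 30.
```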
What do a large and a small X² say about the null hypothesis?
A large X² suggests the data do not support the null hypothesis, since the observed data are not what we would expect under the null hypothesis.
A small X² suggests the data are consistent with the null hypothesis, since the observed data are close to what we would expect under it.
What’s the formula for degrees of freedom for a chi squared test?
v = (r - 1) x (c - 1)
r = number of rows in the table
c = number of columns in the table
Knowing the test statistic, X², and the degrees of freedom allows us to know what information?
We can obtain the probability of obtaining the observed, or more extreme, data if the null hypothesis were true (a P-value).
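As a rough sketch of how the statistic, the degrees of freedom and the P-value fit together, the snippet below works through a 2 x 2 table. It assumes the usual Pearson form of the statistic, the sum over cells of (observed - expected)² / expected, which these cards do not write out. The function name and the counts are hypothetical. The P-value line is only valid for 1 degree of freedom, where the chi-squared tail probability equals erfc(sqrt(X² / 2)).

```python
import math


def chi_squared_2x2(observed):
    """Return (X2, degrees of freedom, P-value) for a 2 x 2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    overall = sum(row_totals)

    # Assumed Pearson form: sum of (observed - expected)^2 / expected
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / overall
            x2 += (obs - exp) ** 2 / exp

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df != 1:
        raise ValueError("this sketch only handles 2 x 2 tables (1 df)")

    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, df, p_value


# Hypothetical counts, made up for illustration
x2, df, p_value = chi_squared_2x2([[30, 20], [10, 40]])
# Here X2 = 5 + 3.33 + 5 + 3.33 = 16.67 on 1 degree of freedom
```

For tables with more than 1 degree of freedom, a statistics library or printed chi-squared tables would be needed for the P-value.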
What are the degrees of freedom for:
- chi-squared test for independence
- chi-squared test for trend
The chi-squared test of independence is designed to provide evidence against the null hypothesis of independence. That means that if there is a sufficiently large dependence of any kind between the two variables, the test will provide evidence against the null.
The chi-squared test for trend is designed to provide evidence against the null hypothesis of no linear trend. So the dependence between the variables needs to be in the form of a linear trend for the test to detect evidence against the null.
The chi-squared test of independence can be used for a table with any number of rows (r) and columns (c), and has (r - 1) x (c - 1) degrees of freedom. The chi-squared test for trend requires that one variable has 2 categories and the other has categories that can be ordered. It always has 1 degree of freedom because it focuses on one kind of dependence: a linear trend across the ordered categories.
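The cards do not give a formula for the trend test. One common version is the Cochran-Armitage form, sketched below as an assumption rather than as the method intended here. The function name, the scores 1, 2, 3 and the counts are all hypothetical.

```python
import math


def chi_squared_trend(events, totals, scores):
    """Chi-squared test for trend (Cochran-Armitage form, assumed), 1 df.

    events[i] = number with the outcome in ordered group i
    totals[i] = number of subjects in ordered group i
    scores[i] = numeric score given to ordered group i
    """
    n = sum(totals)
    p = sum(events) / n  # overall proportion with the outcome

    # Score-weighted difference between observed and expected events
    t = sum(x * (e - m * p) for x, e, m in zip(scores, events, totals))

    sum_nx = sum(m * x for x, m in zip(scores, totals))
    sum_nx2 = sum(m * x * x for x, m in zip(scores, totals))
    var_t = p * (1 - p) * (sum_nx2 - sum_nx ** 2 / n)

    x2 = t ** 2 / var_t
    p_value = math.erfc(math.sqrt(x2 / 2))  # tail probability for 1 df
    return x2, p_value


# Hypothetical data: 3 ordered exposure groups of 50 people each
x2_trend, p_trend = chi_squared_trend(
    events=[10, 20, 30], totals=[50, 50, 50], scores=[1, 2, 3]
)
# t = 20 and var_t = 24, so X2 = 400 / 24 = 16.67 on 1 degree of freedom
```

With only two ordered groups this statistic reduces to the ordinary 2 x 2 chi-squared statistic, which is a useful sanity check.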
With what sample sizes can you use X² (chi-squared)?
For 2 x 2 tables:
If the total sample size N > 40, then X² can be used.
If N is between 20 and 40, and the smallest expected value is at least 5, X² can be used.
For r x c tables (r rows and c columns, with r > 2 or c > 2):
The X² test is valid if no more than 20% of the expected values are less than 5 and none are less than 1.
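These rules of thumb can be written as a small check. The sketch below is illustrative only. The function name is hypothetical, every table that is not 2 x 2 is assumed to fall under the r x c rule, and a 2 x 2 table with N below 20 is treated as not valid because the cards state no rule that permits it.

```python
def chi_squared_is_valid(observed):
    """Apply the sample-size rules of thumb to a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [r * c / n for r in row_totals for c in col_totals]

    if len(observed) == 2 and len(observed[0]) == 2:
        if n > 40:
            return True
        if 20 <= n <= 40:
            return min(expected) >= 5
        return False  # assumption: no rule is given for N below 20

    # Larger tables: at most 20% of expected values below 5, none below 1
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    return share_below_5 <= 0.20 and min(expected) >= 1


# Hypothetical tables, made up for illustration
large_2x2 = chi_squared_is_valid([[30, 20], [10, 40]])   # N = 100
small_2x2 = chi_squared_is_valid([[8, 2], [3, 12]])      # N = 25, one expected value is 4.4
table_3x3 = chi_squared_is_valid([[20, 15, 10], [10, 15, 20], [5, 5, 5]])
```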
What are the six steps necessary to determine whether there is an association between two categorical variables?
- display the data as a two-way table
- calculate row or column percentages as appropriate
- declare the null hypothesis and calculate the chi-squared value
- calculate the degrees of freedom
- refer to the chi-squared distribution to get the P-value
- interpret the P-value
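The first two steps of the list above (tabulating and taking percentages) might look like this in Python. The function name and the counts are hypothetical. Row percentages are used here on the assumption that the rows hold the groups being compared.

```python
def row_percentages(observed):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]


# Step 1: hypothetical two-way table (made-up counts)
# rows = exposed / unexposed, columns = disease / no disease
observed = [[30, 20],
            [10, 40]]

# Step 2: row percentages
percentages = row_percentages(observed)
# 60% of the exposed have the disease compared with 20% of the unexposed
```

A difference like 60% against 20% is what the remaining steps then test formally against the null hypothesis of no association.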
What is correlation?
The method used to assess the degree to which two quantitative variables are associated with each other.
It quantifies the strength of the association between two quantitative variables.
What is linear regression?
A method used to describe the relationship between two variables and to predict the value of one variable given the value of the other.
When one of the variables is thought to depend on the other, it is more appropriate to quantify the relationship between them (quantify = regression).
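As an illustration of describing and predicting, the sketch below fits a straight line by ordinary least squares, which is the usual fitting method but is an assumption here since the cards do not name one. The function name and the age and haemoglobin values are made up.

```python
def fit_line(x, y):
    """Least-squares straight line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: age in years (explanatory), haemoglobin (response)
age = [20, 30, 40, 50, 60]
haemoglobin = [12.0, 12.5, 13.0, 13.5, 14.0]

intercept, slope = fit_line(age, haemoglobin)

# Predict the response for a new value of the explanatory variable
predicted_at_45 = intercept + slope * 45
```

The slope describes how much the response changes per unit change in the explanatory variable, and the fitted line gives the prediction.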
What is an effective and simple way of looking at two quantitative variables?
A scatterplot.
In a scatterplot, what is usually plotted on the x- and y-axes?
It is standard to plot:
The explanatory variable on the x-axis (horizontal axis)
The response variable on the y-axis (vertical axis)
Example:
To examine whether haemoglobin changes with age:
age = the explanatory variable
haemoglobin = the response variable
What is a correlation coefficient?
What's the standard way of calculating it?
It measures the degree of linear association between two quantitative variables.
The standard method is to calculate Pearson's correlation coefficient, denoted r.
Name 2 features of the correlation coefficient.
- measures the scatter of the points around an underlying linear (straight-line) trend
- can take any value from -1 to +1
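Both features can be seen in a short calculation of Pearson's r. This is a minimal sketch using the standard product-moment formula. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for paired values x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]

r_perfect_up = pearson_r(x, [2, 4, 6, 8, 10])    # points exactly on a rising line
r_perfect_down = pearson_r(x, [10, 8, 6, 4, 2])  # points exactly on a falling line
r_scattered = pearson_r(x, [2, 1, 4, 3, 5])      # rising trend with some scatter
```

Points lying exactly on a straight line give r of +1 or -1, and more scatter around the line pulls r towards 0.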