Lab #7-9: Statistics Flashcards
What is the chi-squared test?
A statistical test of whether your observed data match what would be expected under a hypothesis
–> The basic idea behind the test is to compare the observed values in your data to the expected values that you would see if the null hypothesis were true, to determine whether a difference is due to chance or to a relationship between variables
What are the “expected” values in a chi squared test?
They are the hypothetical test results that would be observed IF the null hypothesis were true
There is a direct relationship between chi-squared and the difference between _________ and ____________
The difference between the observed and expected values
–> the larger the difference, the greater the chi-squared value
What is the null hypothesis of chi-squared tests?
There is no association or relationship between two categorical variables
–> Essentially, the variables are independent of each other.
1) There is no relationship between A and B
2) There is no preference exhibited for A over B
3)
How is chi-squared calculated?
The individual chi-squared for each category is calculated using the following:
[(Observed - Expected)^2] / Expected
Once this is calculated for each category, the overall chi-squared is obtained by adding all of these calculated values together
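The calculation above can be sketched in a few lines of Python. The function name `chi_squared` and the counts are hypothetical and chosen only for illustration (20 subjects, 2 categories, no preference expected).

```python
def chi_squared(observed, expected):
    """Sum [(Observed - Expected)^2] / Expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, not real lab data.
observed = [14, 6]
expected = [10, 10]

# Each category contributes (4^2)/10 = 1.6, so the overall value is 3.2.
statistic = chi_squared(observed, expected)
print(statistic)
```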
Once you get the chi-squared value, what must be done next?
At this point you do not have all the info you need to be able to determine the p-value
–> Must calculate the degrees of freedom!
and THEN use both the chi-squared and df to determine p-value (using a table)
What are degrees of freedom?
How do we find df for chi-squared analysis?
Number of values in a data set that are free to vary
df = N - 1
–> N = Number of CATEGORIES (not the number of subjects)
What do we use to determine p-value from a chi-squared test? (3 things)
1) Calculated overall chi-squared
2) Degrees of freedom
3) Chi-squared probability table
How do you use a chi-squared probability table?
1) Find the df that corresponds with your data set on the left column of the table
2) In that row for your df, slide over until you reach a column with your chi-squared value
NOTE: It is very common to NOT find your chi-squared value exactly on the table. If this happens, find the two columns your value falls between
3) From the column your chi-squared value is in, slide down to the bottom of that column to find the p-value
NOTE: For chi-squared that is BETWEEN columns, the p-value will also be BETWEEN the p-values in both of those columns
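The table lookup above can be sketched as a small Python function. This is a sketch only: the table below holds just a few standard critical values (df 1 to 3, four p-value columns), the layout may differ from the table used in lab, and the names `P_VALUES`, `CRITICAL_VALUES`, and `bracket_p_value` are made up for this example.

```python
# Columns of the table: the p-value each column stands for.
P_VALUES = [0.10, 0.05, 0.01, 0.001]

# Rows of the table: df -> chi-squared critical value for each column.
CRITICAL_VALUES = {
    1: [2.706, 3.841, 6.635, 10.828],
    2: [4.605, 5.991, 9.210, 13.816],
    3: [6.251, 7.815, 11.345, 16.266],
}


def bracket_p_value(chi_sq, df):
    """Return (lower, upper) bounds on the p-value from the row for df."""
    row = CRITICAL_VALUES[df]
    lower, upper = 0.0, 1.0
    for p, critical in zip(P_VALUES, row):
        if chi_sq >= critical:
            upper = p  # at or past this column, so p is at most this value
        else:
            lower = p  # short of this column, so p is greater than this value
            break
    return lower, upper


# A chi-squared of 3.2 with df = 1 falls between the 2.706 and 3.841 columns.
print(bracket_p_value(3.2, 1))  # (0.05, 0.1)
```

A result of (0.05, 0.1) means the p-value is between 0.05 and 0.10, which matches the "between columns" note above.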
What criteria MUST be met in order to utilize a chi-squared test?
1) Data must be categorical
2) There must be 2 or more categories
3) Observations must be INDEPENDENT (each value in data set represents different subject)
4) Chosen samples should be representative and randomly selected
5) Sample size should be GREATER THAN 5X the # of categories!!! (AFTER subtracting outliers)
(Ex: 5 categories –> Sample size should be > 25!)
–> Each category should be able to have greater than 5 subjects in it if the data were to be distributed evenly
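The sample-size rule above can be checked with a short Python sketch. The function name `sample_size_ok` and its parameters are hypothetical, and the example numbers are invented.

```python
def sample_size_ok(sample_size, n_categories, n_outliers=0, per_category=5):
    """True if the usable sample is GREATER THAN 5x the number of categories."""
    usable = sample_size - n_outliers  # outliers are subtracted first
    return usable > per_category * n_categories


print(sample_size_ok(30, 5))                # True: 30 > 25
print(sample_size_ok(25, 5))                # False: must be greater than 25
print(sample_size_ok(30, 5, n_outliers=6))  # False: only 24 usable
```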
How do you determine the “expected” value for chi-squared test?
Expected values will always be equal to the hypothetical “results” of the null hypothesis
Ex: If trying to see if there is a preference for cold temp over warm temp, the null would be that NO preference exists, therefore the # of subjects in the cold environ vs warm environ would be equal
–> So the expected value would be exactly half of the sample size
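For a "no preference" null like the one above, the expected values are just an even split of the sample. A minimal Python sketch, with a hypothetical function name and an invented sample of 20 subjects (other null hypotheses can give expected values that are not an even split):

```python
def expected_equal_split(sample_size, n_categories):
    """Expected count per category when the null is 'no preference'."""
    return [sample_size / n_categories] * n_categories


# 20 subjects, cold vs warm: half are expected on each side.
print(expected_equal_split(20, 2))  # [10.0, 10.0]
```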
How are observational outliers handled in chi-squared tests?
Observational outliers are SUBTRACTED OUT from the total # of samples when determining the EXPECTED values
You begin an experiment with 22 isopods and 4 categories, and the null hypothesis is that there will be no preference between the categories. While running the experiment, 6 isopods die. What statistical analysis is used?
22 isopods over 4 categories = 5.5 per category (if no preference shown)
HOWEVER, 6 isopods died: 22 (total) - 6 (outliers) = 16 remaining
16 over 4 categories = 4 per category (NOT ENOUGH FOR CHI-SQUARED)
–> Experiment must be re-run with a greater # of isopods
How do you calculate chi-squared for an experiment with multiple trials?
Calculate the overall chi-squared for each INDIVIDUAL TRIAL
and then SUM together the chi-squared from each trial to get the TOTAL CHI SQUARED for the experiment!
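The multi-trial procedure above can be sketched in Python as follows. The three trials and their counts are invented for illustration, and `chi_squared` is a hypothetical helper that applies the (Observed - Expected)^2 / Expected formula.

```python
def chi_squared(observed, expected):
    """Overall chi-squared for ONE trial."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: (observed, expected) counts for each trial.
trials = [
    ([14, 6], [10, 10]),
    ([12, 8], [10, 10]),
    ([13, 7], [10, 10]),
]

# Step 1: chi-squared for each individual trial.
per_trial = [chi_squared(obs, exp) for obs, exp in trials]

# Step 2: sum the per-trial values to get the total for the experiment.
total = sum(per_trial)
print(round(total, 2))  # 5.8
```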
Repetition vs Replication
Repeating an experiment = utilizing the SAME sample population under the same experimental conditions
Replicating an experiment = utilizing a DIFFERENT sample population under the same experimental conditions
What is a correlation?
A statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate)
(Determines if there is a linear relationship between two quantitative data sets –> discrete or continuous)
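One common way to measure a linear relationship is the Pearson correlation coefficient r, which ranges from -1 to 1. The card above does not name a specific coefficient, so using Pearson here is an assumption; the function name and the data are also invented for this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length data sets."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two data sets of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical data that change together at a constant rate.
temperature = [10, 15, 20, 25, 30]
activity = [2, 4, 6, 8, 10]
print(pearson_r(temperature, activity))  # 1.0
```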
What is a T-Test?
A statistical test that compares the means of two data sets to determine if there is a significant difference between them
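As a rough sketch of the comparison above, the Python below computes a t statistic for two independent samples. It assumes the unequal-variance (Welch) form, which may not be the version used in lab, and it stops at the t value (turning t into a p-value needs a t table or statistics software). The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical measurements from two groups.
group_a = [5, 6, 7, 8, 9]  # mean 7
group_b = [1, 2, 3, 4, 5]  # mean 3
t_value = welch_t(group_a, group_b)
print(t_value)  # 4.0
```

The further t is from 0, the larger the difference between the means relative to the spread in the data.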