Lab #7-9: Statistics Flashcards by Sarah Jennings

What is the chi-squared test?

A statistical test of whether the counts observed in your data match the counts you would expect

–> The basic idea behind the test is to compare the observed values in your data to the expected values that you would see if the null hypothesis were true, to determine whether a difference is due to chance or to a relationship between variables

What are the “expected” values in a chi squared test?

They are the hypothetical test results that would be observed IF the null hypothesis were true

There is a direct relationship between chi-squared and the difference between _________ and ____________

The difference between the observed and expected values

–> the larger the difference, the greater the chi-squared value

What is the null hypothesis of chi-squared tests?

There is no association or relationship between two categorical variables

–> essentially, meaning that the variables are independent of each other.
1) There is no relationship between A and B
2) There is no preference exhibited for A over B
3)

How is chi-squared calculated?

The individual chi-squared for each category is calculated using the following:

[(Observed - Expected)^2] / Expected

Once this is calculated for each category, the overall chi-squared is obtained by adding all these calculated values together
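
The calculation above can be sketched in Python. This is a minimal illustration, not part of the lab materials: the function name `chi_squared` and the counts (30 subjects over 2 categories) are made up for the example.

```python
def chi_squared(observed, expected):
    """Sum (Observed - Expected)^2 / Expected over every category."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 30 subjects, 2 categories, no preference expected.
observed = [20, 10]
expected = [15, 15]

print(round(chi_squared(observed, expected), 3))  # 3.333
```

Each category contributes 25 / 15 here, so the overall value is 50 / 15.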

Once you get the chi-squared value, what must be done next?

At this point you do not have all the info you need to be able to determine the p-value

–> Must calculate the degrees of freedom!

and THEN use both the chi-squared and df to determine p-value (using a table)

What are degrees of freedom?

How do we find df for chi-squared analysis?

Number of values in a data set that are free to vary

df = N - 1
–> where N = Number of CATEGORIES (not the number of subjects)

What do we use to determine p-value from a chi-squared test? (3 things)

1) Calculated overall chi-squared
2) Degrees of freedom
3) Chi-squared probability table

How do you use a chi-squared probability table?

1) Find the df that corresponds with your data set on the left column of the table

2) In that row for your df, slide over until you reach a column with your chi-squared value

NOTE: It is very common to NOT find your chi-squared value exactly on the table. If this happens, find which two columns your value falls between

3) From the column your chi-squared value is in, slide down to the bottom of that column to find the p-value

NOTE: For chi-squared that is BETWEEN columns, the p-value will also be BETWEEN the p-values in both of those columns
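
The table lookup can be sketched in Python as follows. The table here is only a small excerpt of standard chi-squared critical values (df 1 to 4, p = 0.10, 0.05, 0.01), and the name `bracket_p_value` is hypothetical; the table handed out in lab may have more rows and columns.

```python
# Excerpt of a standard chi-squared table: one row per df, one column per p-value.
P_COLUMNS = [0.10, 0.05, 0.01]
CHI2_TABLE = {
    1: [2.706, 3.841, 6.635],
    2: [4.605, 5.991, 9.210],
    3: [6.251, 7.815, 11.345],
    4: [7.779, 9.488, 13.277],
}


def bracket_p_value(chi2, df):
    """Return (lower, upper) so that lower < p < upper.

    None means this excerpt has no column on that side.
    An exact table hit returns the same p-value twice.
    """
    upper = None
    for p, table_value in zip(P_COLUMNS, CHI2_TABLE[df]):
        if chi2 == table_value:
            return (p, p)
        if chi2 < table_value:
            return (p, upper)
        upper = p
    return (None, upper)


print(bracket_p_value(3.333, 1))  # (0.05, 0.1)
```

A chi-squared of 3.333 with df = 1 falls between the 0.10 and 0.05 columns, so the p-value lies between 0.05 and 0.10.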

What criteria MUST be met in order to utilize a chi-squared test?

1) Data must be categorical

2) There must be 2 or more categories

3) Observations must be INDEPENDENT (each value in data set represents different subject)

4) Chosen samples should be representative and randomly selected

5) Sample size should be GREATER THAN 5X the # of categories!!! (AFTER subtracting outliers)
(Ex: 5 categories –> Sample size should be > 25!)

–> Each category should be able to have greater than 5 subjects in it if the data were to be distributed evenly

How do you determine the “expected” value for chi-squared test?

Expected values will always be equal to the hypothetical “results” of the null hypothesis

Ex: If trying to see if there is a preference for cold temp over warm temp, the null would be that NO preference exists, therefore the # of subjects in the cold environ vs warm environ would be equal
–> So the expected value would be exactly half of the sample size
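
As a small sketch of the example above in Python: under a "no preference" null, the expected count is the sample size split evenly across the categories. The function name and the sample size of 40 are made up for illustration.

```python
def expected_no_preference(sample_size, n_categories):
    """Expected count per category when the null is 'no preference'."""
    return [sample_size / n_categories] * n_categories


# Hypothetical: 40 subjects choosing between a cold and a warm environment.
print(expected_no_preference(40, 2))  # [20.0, 20.0]
```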

How are observational outliers handled in chi-squared tests?

Observational outliers are SUBTRACTED OUT from the total # of samples when determining the EXPECTED values

If you have an experiment where you begin with 22 isopods and 4 categories, the null hypothesis is that there will be no preference between the categories, and while running the experiment 6 isopods die, what statistical analysis is used?

22 isopods over 4 categories = 5.5 per category (if no preference shown)

HOWEVER, 6 isopods died: 22 (total) - 6 (outliers) = 16 remaining

16 over 4 categories = 4 per category (NOT ENOUGH FOR CHI-SQUARED)

–> Experiment must be re-run with a greater # of isopods
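
The arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the rule applied is the one from the criteria card (more than 5 subjects per category after subtracting outliers).

```python
def expected_per_category(total, outliers, n_categories):
    """Expected count per category after outliers are subtracted out."""
    return (total - outliers) / n_categories


def large_enough_for_chi_squared(total, outliers, n_categories):
    """True only if each category would hold MORE than 5 subjects."""
    return expected_per_category(total, outliers, n_categories) > 5


print(expected_per_category(22, 0, 4))         # 5.5
print(expected_per_category(22, 6, 4))         # 4.0
print(large_enough_for_chi_squared(22, 6, 4))  # False
```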

How do you calculate chi-squared for an experiment with multiple trials?

Calculate the overall chi-squared for each INDIVIDUAL TRIAL

and then SUM together the chi-squared from each trial to get the TOTAL CHI SQUARED for the experiment!
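
A minimal Python sketch of this summing step, using made-up counts for two trials. The helper names are hypothetical, and the sketch only covers the sum itself (the card does not say how df is handled for the total).

```python
def chi_squared(observed, expected):
    """Overall chi-squared for ONE trial."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def total_chi_squared(trials):
    """Sum the per-trial chi-squared values; trials is a list of (observed, expected)."""
    return sum(chi_squared(observed, expected) for observed, expected in trials)


# Hypothetical data: two trials, 30 subjects each, 2 categories.
trials = [
    ([20, 10], [15, 15]),  # trial 1: chi-squared = 50/15
    ([18, 12], [15, 15]),  # trial 2: chi-squared = 18/15
]

print(round(total_chi_squared(trials), 3))  # 4.533
```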

Repetition vs Replication

Repeating an experiment = utilizing the SAME sample population under the same experimental conditions

Replicating an experiment = utilizing a DIFFERENT sample population under the same experimental conditions

What is a correlation?

A statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate)

(Determines if there is a linear relationship between two quantitative data sets –> discrete or continuous)

What is a T-Test?

A statistical test that compares the means of two data sets to determine if there is a significant difference between them

What is the chi-squared critical value?

It is the value of chi-squared (that matches the df of your data) that corresponds to exactly p = 0.05, meaning a chi-squared at least this large would occur 5% of the time if the null hypothesis were true

–> can be found by finding row for df on table and column for p = 0.05 and then finding where they intersect (chi-squared value)

What if experimental chi-squared is < critical chi-squared?

The null hypothesis is ACCEPTED

(p > 0.05)

What if the experimental chi-squared is > critical chi-squared?

Null hypothesis is REJECTED

(p < 0.05)

What if the experimental chi-squared = critical chi-squared?

Null hypothesis is REJECTED
(p = 0.05)
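
The three comparison cards above can be sketched as one Python check. The critical values are the standard p = 0.05 entries for df 1 to 5; the function name `rejects_null` is hypothetical.

```python
# Standard chi-squared critical values at p = 0.05, keyed by df.
CRITICAL_AT_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def rejects_null(chi2, df):
    """True when the experimental chi-squared is >= the critical value."""
    return chi2 >= CRITICAL_AT_005[df]


print(rejects_null(3.333, 1))  # False (below critical, p > 0.05)
print(rejects_null(8.000, 3))  # True  (above critical, p < 0.05)
print(rejects_null(3.841, 1))  # True  (equal to critical, p = 0.05)
```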

How is data displayed for a correlation? (plotted)

Scatter Plot with line of best fit (linear regression model)

What is R value?

Correlation Coefficient!

Tells us the strength and direction of the linear relationship between two variables

What is R^2 value?

Coefficient of determination

–> Tells us how well a given TRENDLINE mathematically “fits” the data
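
For a straight-line (least-squares) trendline, R^2 is simply R squared. A minimal Python sketch with made-up data follows; `pearson_r` is a hand-written helper for illustration, not a function from the lab software.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient R for two equal-length data sets."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(spread_x * spread_y)


# Hypothetical measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))       # 0.775
print(round(r ** 2, 3))  # 0.6
```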

What do the different values of R mean?

R = 0 --> Absolutely no linear correlation
|R| < 0.2 = No correlation
0.2 < |R| < 0.4 = Weak correlation
0.4 < |R| < 0.6 = Moderate correlation
0.6 < |R| < 0.8 = Strong correlation
0.8 < |R| < 1 = Very strong correlation
R = 1 --> Perfectly correlated (both data sets have exact same trend)
R = -1 --> Perfectly negatively correlated (data are exactly opposite each other in trend)
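
A sketch of these bands as a Python lookup. Two assumptions that are not stated on the card: the bands are applied to the size of R (ignoring its sign), and a value sitting exactly on a boundary (such as 0.4) goes to the higher band.

```python
def describe_r(r):
    """Label the strength of a correlation coefficient using the bands above."""
    if not -1 <= r <= 1:
        raise ValueError("R must be between -1 and 1")
    size = abs(r)  # assumption: strength depends on magnitude only
    if size < 0.2:
        return "no correlation"
    if size < 0.4:
        return "weak"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "strong"
    if size < 1:
        return "very strong"
    return "perfect"


print(describe_r(0.775))  # strong
print(describe_r(-0.3))   # weak
```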

Paired T-Test

Paired = Compares the means of two **related** groups --> The means being compared are of the SAME SUBJECTS under different conditions! Each data point in one group is therefore **paired** with a corresponding data point in the other group (like before and after measurements on the same subject)

Unpaired T-Test

Unpaired = Compares the means of two **independent** groups --> The means being compared are of DIFFERENT SUBJECTS; comparing two completely separate groups of subjects --> the data points in each group are not related to each other (like comparing height of men and women)
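
To show how the two versions differ, here is a minimal Python sketch of both t-stat calculations. The data are made up, and the unpaired version assumes equal variances in the two groups (a pooled-variance t-test); the cards do not say which unpaired variant the lab uses.

```python
import math
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t-stat and df for a paired test: works on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


def unpaired_t(group_a, group_b):
    """t-stat and df for an unpaired test, ASSUMING equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, df


# Hypothetical before/after measurements on the same 5 subjects.
t_stat, df = paired_t([10, 12, 9, 11, 13], [12, 14, 10, 13, 16])
print(round(t_stat, 3), df)  # 6.325 4

# Hypothetical measurements on two separate groups of 5 subjects.
t_stat, df = unpaired_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t_stat, 3), df)  # -2.0 8
```

The sign of the unpaired t-stat only reflects which group was listed first.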

What criteria MUST be met to use a T-Test?

1) Data type is continuous
2) Independent samples (each data point in a group represents a different subject; no repeated observations of the same subject)
3) Sample population is representative and randomly selected
4) **GREATER THAN/EQUAL TO 5 data points PER GROUP**
5) Data is normally distributed (all data points fall within +/- 3 SDs of the mean)

What criteria must be met for a data set to be considered normally distributed?

Each data point should fall within +/- 3 standard deviations of the mean value --> Data should fall into a bell-shaped curve!

What should be done (t-test analysis) if data is found to be NOT normally distributed?

The data will need to be normalized!

What is data called when it is NOT normally distributed?

SKEWED

How do you determine if a data set is skewed? (process)

1) Run descriptive statistics on the data set
2) Find the min and max values, the standard deviation, and the mean of the data set
3) Check whether the min or max is more than 3 SDs away from the mean value
a) If either is > 3 SDs from the mean = SKEWED DATA
b) If both are <= 3 SDs from the mean = Normally Distributed
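
The process above can be sketched in Python. This follows the +/- 3 SD rule from the card rather than a formal normality test, and it assumes the sample standard deviation (the kind typically reported by descriptive statistics tools); the data are made up.

```python
from statistics import mean, stdev


def is_skewed(data):
    """True if the min or max lies more than 3 SDs from the mean."""
    centre = mean(data)
    sd = stdev(data)  # assumption: sample standard deviation
    return min(data) < centre - 3 * sd or max(data) > centre + 3 * sd


# Hypothetical data sets: the second has one extreme value (50).
tight = [4, 5, 5, 6, 5, 4, 6, 5]
with_extreme = [4, 5, 5, 6, 5, 4, 6, 5, 5, 4, 6, 5, 5, 6, 4, 5, 5, 6, 4, 50]

print(is_skewed(tight))         # False
print(is_skewed(with_extreme))  # True
```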

How is skewed data normalized?

Data is transformed by logarithmic transformation: New (normalized) data set values = log (10 + original value) --> The new log values are the normalized data set now!
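
A one-line Python sketch of this transformation. It assumes "log" means log base 10 (the card does not state the base) and that every original value is greater than -10, so the log is defined; the values are made up.

```python
import math


def log_transform(values):
    """New value = log10(10 + original value), per the formula above."""
    return [math.log10(10 + v) for v in values]


# Hypothetical skewed data.
print([round(v, 3) for v in log_transform([0, 90, 990])])  # [1.0, 2.0, 3.0]
```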

Positively skewed data

**Mean is GREATER than the median** (and therefore the mode) And the median (middle value) is GREATER than the mode (most common value)

Negatively skewed data

**Mean is LESS than the median** (and therefore the mode) And the median (middle value) is LESS than the mode (most common value)

When you run a t-test on excel, what values do you need from the produced data table?

You need 3 values:
1) T-STAT
2) P (T<=t) Two-Tail --> (p-value)
3) t-Critical two-tail
(4. and the df, for when reporting results in the paper)

What two values do we compare in T-Test to determine significance (if we don't know the exact p-value)?

t-stat and t-critical

What is the difference between t-stat and t-critical?

t-stat = The CALCULATED (experimental) t-test value
t-critical = The **minimum** t-test value needed for significance to be "established"

t-stat > t-critical

Null hypothesis is REJECTED; there IS a significant difference (p < 0.05)

t-stat < t-critical

Null hypothesis is ACCEPTED; there is NOT a significant difference (p > 0.05)

How are t-test results reported parenthetically in a paper?

(**p value**, t-Test (**df**) = **T STAT**)

When is a line graph used?

To determine if data follows a particular trend over **time**

What is a Type I error?

A "false positive" --> Null hypothesis is rejected when it actually should not have been

What is a Type II error?

A "false negative" --> Null hypothesis is accepted when it actually should have been rejected

Lab #7-9: Statistics Flashcards

(44 cards)