Lecture 2: Statistics and Data Analysis Flashcards
Overview:
- Measures of Central Tendency and Dispersion
- Testing for Difference between MEANS = 3
- CORRELATION = 3
- Measures of central tendency and dispersion
- Testing for difference between means
– Z-test
– Student’s t-test
– Paired t-test
- Correlation
– Correlation Coefficient r
– Coefficient of Determination R²
– Pearson’s r-test
What are the Measures of Central Tendency? Used for What? NON-PARAMETRIC
- Lab reports often use CLASS DATA and that means stats
- Need a measure of what is typical, a measure of central tendency
– mean (arithmetic mean, AKA average)
– median
- Mean = sum of all the values / number of values
- Median is the middle value (the 50th percentile): half the values are above it and half are below
- MEDIAN IS NON-PARAMETRIC: ONLY THE ORDER OF THE VALUES MATTERS, NOT HOW BIG THEY ARE
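Here is a minimal Python sketch of the two definitions above, using made-up numbers. Replacing the largest value with a much bigger one pulls the mean up but leaves the median unchanged, because the median depends only on the order of the values. Python's standard `statistics.mean` and `statistics.median` do the same job. The hand-written versions are here only to show the definitions.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value once sorted; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 2, 3, 4, 5]
outlier = [1, 2, 3, 4, 500]  # same order, but the last value is much bigger

print(mean(data), median(data))        # 3.0 3
print(mean(outlier), median(outlier))  # 102.0 3
```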
MEAN vs MEDIAN - when to use?
- For NORMAL distribution mean and median are the SAME
- For NON-NORMAL (e.g. skewed) distributions they can be very DIFFERENT
- Mean income in Australia ~$72,000 but median is only ~$48,000 (of taxpayers)
- For NORMAL DATA USE MEAN
- For NON-NORMAL DATA USE MEDIAN
1 * Median is a non-parametric measure of what is typical
2 * Mean is a parametric measure of what is typical, use for normal data
What are MEASURES OF DISPERSION?
1 * Mean measures what is typical, but what about the range/spread in the data?
2 * STANDARD DEVIATION measures the typical (average) deviation of the data from the mean
- For 1, 2, 3, 4, 5 the mean is 3 and the deviations (each value minus the mean) are -2, -1, 0, 1, 2
- We want the average deviation, but positive and negative deviations cancel, so the mean of the deviations is always 0
- For standard deviation (SD, σ) square the deviations, average them, then take the square root
Range = 5 - 1 = 4 (the values span the mean ± 2)
- For a normal distribution the inflection points of the curve are at mean ± 1 SD
- ~2/3 (about 68%) of all cases are within 1 SD of the mean
- ~95% are within 2 SD of the mean
- ~99.7% are within 3 SD of the mean
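The sketch below runs the 1, 2, 3, 4, 5 worked example in Python and adds the coverage figures for 1, 2 and 3 SD. One assumption to flag: the notes' "square, average, square root" divides by N, which gives the population SD. Spreadsheets' STDEV.S and Python's `statistics.stdev` divide by N - 1 for a sample, so the sketch shows both. The coverage fractions use the standard identity that the share of a normal distribution within k SD of the mean is erf(k/√2).

```python
import math

values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)                 # 3.0
deviations = [v - mean for v in values]          # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))                           # 0.0: the deviations cancel out

squared = [d ** 2 for d in deviations]           # [4.0, 1.0, 0.0, 1.0, 4.0]
sd_population = math.sqrt(sum(squared) / len(values))    # divide by N
sd_sample = math.sqrt(sum(squared) / (len(values) - 1))  # divide by N - 1
print(round(sd_population, 3), round(sd_sample, 3))      # 1.414 1.581


def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```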
SUMMARY OF STATISTICS = Measures of Central Tendency and Dispersion (4)
1 * In this unit assume all the data is normally distributed
2 * Use mean and standard deviation
3 * Estimates of the mean and SD from a sample have an error, because only part of the population was measured.
- If we tested mysterious drug X on only 6 people, we might by chance have tested 6 good responders
4 * Mean and SD of a sample have an error range
Understanding STANDARD ERROR = 5
1 * Standard Error of the mean (SEM) is the error measure of a sample mean.
2 * Average blood pressure of any 6 people will be different to the next 6 you test.
3 * The standard deviation of the means of many repeated experiments is the standard error.
4 * A sample mean has about a 2/3 (68%) chance of being within one SEM of the true population mean
- For normal data SEM = SD/√N
5 * Standard deviation measures variation from the mean
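The simulation below repeats the "test 6 people" experiment many times. It checks that the SD of the resulting means matches SD/√N, and that about 2/3 of the means fall within one SEM of the true mean. It is a sketch under stated assumptions: the population mean of 120 and SD of 15 are hypothetical values chosen for illustration, not data from the lecture.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 120.0, 15.0   # hypothetical population values, for illustration only
N = 6                            # people tested per experiment
EXPERIMENTS = 20_000             # number of repeated experiments

# Repeat the experiment many times, keeping each sample mean
means = [
    sum(random.gauss(POP_MEAN, POP_SD) for _ in range(N)) / N
    for _ in range(EXPERIMENTS)
]

sd_of_means = statistics.stdev(means)   # spread of the sample means = empirical SEM
sem_formula = POP_SD / math.sqrt(N)     # SEM = SD / sqrt(N)

# Fraction of experiments whose mean lands within one SEM of the true mean
within_one_sem = sum(abs(m - POP_MEAN) < sem_formula for m in means) / EXPERIMENTS

print(f"SD of the means: {sd_of_means:.2f}")
print(f"SD / sqrt(N):    {sem_formula:.2f}")
print(f"Within 1 SEM:    {within_one_sem:.2f}")
```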
SEM VS SD (3)
1 * SEM for error bars on a mean. Shows reliability of the mean
2 * SD for range bars showing the spread of the data
3 * SD also has a standard error but not commonly used
4 * SEM gives an error range for the mean
What is HYPOTHESIS TESTING? USED? MOST COMMON? 7
1 * Hypothesis testing is what stats is really about
2 * Used to test whether there is a real difference in variable data
3 * Most common test: are two means different.
4 * In most labs you will test class data for difference between means
5 * If two means are several standard errors (SE) apart, then we have evidence to say they are different
6 * The mean of every experiment is a bit different, but a 3 SE difference between means is only a 3 in 1000 chance
7 * That is p<0.003 or 0.3%
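The "3 in 1000" figure can be checked directly. For a normal distribution, the two-sided chance of a value lying z or more SD from its centre is erfc(|z|/√2). This small sketch uses only Python's `math` module.

```python
import math


def two_sided_p(z):
    """Chance that a standard normal value is at least |z| from 0, in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (1.96, 2, 3):
    print(z, round(two_sided_p(z), 4))
# 1.96 0.05
# 2 0.0455
# 3 0.0027
```

This is also why z > 2 is used as a rough cut-off for p < 0.05. The exact two-sided cut-off is 1.96.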
Hypothesis testing: The Z test = 3
- The z-test LOOK FOR THE FORMULA
- If |z| > 2 then p < 0.05 (strictly |z| > 1.96), because if the difference is just random there is a 95% chance the means would be closer than 2 SE
- But z-test assumes
– Normal distribution
– Infinite sample size
– SD is the same for both groups
– No correlation between groups
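The notes leave the z-test formula as a gap, so the sketch below assumes the usual two-sample form: z = (mean1 - mean2) / √(SD1²/n1 + SD2²/n2), with the SDs treated as known. The function name `z_test` and all the numbers are hypothetical and for illustration only.

```python
import math


def z_test(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample z statistic and two-sided p value, treating the SDs as known (assumed form)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of the difference
    z = (mean1 - mean2) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))                # two-sided tail probability
    return z, p


# Hypothetical numbers: two groups of 25, both with SD 10, means 6 units apart
z, p = z_test(mean1=106, mean2=100, sd1=10, sd2=10, n1=25, n2=25)
print(round(z, 3), round(p, 4))  # 2.121 0.0339
```

Here z > 2, so p < 0.05, which matches the rule of thumb above.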
Understanding Student’s t-TEST: assumptions? use? 8
1 * Assumption of infinite sample size (n) is obviously a problem
2 * Student solved the infinite n problem with his t-test
3 * Calculates “t” and corrects the “p” value for sample size (use a computer)
4 * Called Student’s t-test after its author’s publishing name, “Student” (William Sealy Gosset)
5 * AKA equal variance t-test, Excel type 2 t-test
6 * Still assumes:
– the same SD
– no correlation (not paired)
– normal distribution
7 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD
8 * Paired t-test used to test if the mean change is zero, assumes normal distribution, paired & correlated.
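Below is a sketch of the equal-variance (Student's) t statistic. It assumes the standard pooled-variance formula and returns t with its degrees of freedom only. Turning t into p needs the t distribution, which is the part the notes say to leave to a computer. The data and the function name `students_t` are made up for illustration.

```python
import math
import statistics


def students_t(a, b):
    """Equal-variance (pooled) two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, n1 + n2 - 2


group_a = [5, 6, 7, 8, 9]     # hypothetical data, mean 7
group_b = [8, 9, 10, 11, 12]  # hypothetical data, mean 10
t, df = students_t(group_a, group_b)
print(round(t, 3), df)  # -3.0 8
```

If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=True)` should give the same t along with its p value. In Excel the equivalent is `T.TEST(range1, range2, 2, 2)` (two-tailed, type 2).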
Understanding the Paired T-test = 6
1 * Paired t-test is an extension of Student’s t-test
2 * Paired t-test assumes the groups are correlated
3 * Type 1 t-test in Excel
4 * Consider students’ marks in exams vs assignments.
5 * Expect good students will do well in both
– Data is correlated
– Data is paired
6 * Pairing is often repeated measures on the same person/animal
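A paired t-test reduces to a one-sample test on the differences: t = mean change / SEM of the changes, with N - 1 degrees of freedom. Here is a minimal sketch with hypothetical before/after measurements on the same 5 subjects. The function name `paired_t` is illustrative.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic for the hypothesis that the mean change (after - before) is zero."""
    changes = [y - x for x, y in zip(before, after)]
    n = len(changes)
    se = statistics.stdev(changes) / math.sqrt(n)  # SEM of the changes
    return statistics.mean(changes) / se, n - 1


before = [10, 12, 11, 14, 13]  # hypothetical repeated measures on 5 subjects
after = [10, 14, 14, 18, 19]
t, df = paired_t(before, after)
print(round(t, 3), df)  # 3.0 4
```

With SciPy, `scipy.stats.ttest_rel(after, before)` performs the same paired test and also returns p.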
Understanding CORRELATION: 9
1 * Pearson correlation coefficient (r) is a measure of correlation
2 * Does y tend to increase as x increases? (positive r)
3 * Inverse correlation has a negative r
4 * A t-test calculates t; you look up t and the sample size to get p, while Excel just tells you p.
5 * Same for correlation, need to report p not just r.
6 * Excel will report the p value for r using the “Regression” tool in the Analysis ToolPak.
7 * Coefficient of Determination (R²) measures how well the line of best fit fits the data.
8 * Data with a lot of spread could have a significant r but a poor fit (small R²)
9 * For a straight-line (linear) fit R² = r², but not for non-linear curve fits
10 * Pearson’s correlation coefficient r used to test for correlation
11 * Coefficient of determination R² measures closeness of fit
12 * Commonly take p < 0.05 (a 5% chance) as significant
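The sketch below computes the three quantities on this card from made-up data. First, Pearson's r. Second, the t statistic used to get a p value for r; this uses the standard formula t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom, stated here as an assumption rather than taken from the lecture. Third, R² for a least-squares straight line, which comes out equal to r².

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def r_squared_linear(x, y):
    """R^2 of a least-squares straight line: 1 - SS_residual / SS_total."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4), round(r_squared_linear(x, y), 4))  # 0.7746 0.6 0.6
print(round(t_for_r(r, len(x)), 4))                                      # 2.1213
```

If SciPy is available, `scipy.stats.pearsonr(x, y)` returns r together with its p value.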
Summary of this lecture = 10
1 * Median is a non-parametric measure of what is typical
2 * Mean is a parametric measure of what is typical, use for normal data
3 * Standard deviation measures variation from the mean
4 * 68% of events are within 1SD of the mean and 95% within 2SD
5 * SEM gives an error range for the mean
6 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD
7 * Paired t-test used to test if the mean change is zero, assumes normal distribution, paired & correlated.
8 * Pearson’s correlation coefficient r used to test for correlation
9 * Coefficient of determination R² measures closeness of fit
10 * Commonly take p < 0.05 (a 5% chance) as significant