Lecture 2: Statistics and Data Analysis Flashcards

Question 1

Q

Overview:

Measures of Central Tendency and Dispersion
Testing for Difference between MEANS = 3
CORRELATION = 3

Answer

A

Measures of central tendency and dispersion
Testing for difference between means
– Z-test
– Student’s t-test
– Paired t-test
Correlation
– Correlation Coefficient r
– Coefficient of Determination R²
– Pearson’s r-test

Question 2

Q

What are the Measures of Central Tendency? Used for What? NON-PARAMETRIC

Answer

A

Lab reports often use CLASS DATA and that means stats

Need a measure of what is typical: a measure of central tendency
– mean (arithmetic mean, AKA average)
– median

Mean = sum of all values / number of values
Median is the 50% point: half the values are above it and half are below
MEDIAN IS NON-PARAMETRIC: ONLY THE ORDER OF THE VALUES MATTERS, NOT HOW BIG THEY ARE
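
As a rough sketch of the two definitions, the Python below computes both measures by hand and with the standard `statistics` module. The numbers are made up for illustration and are not from the lecture.

```python
from statistics import mean, median

values = [2, 3, 5, 8, 100]   # made-up sample, not lecture data

# Mean: sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

# Median: the middle value once the data are sorted
# (this shortcut only works for an odd number of values).
manual_median = sorted(values)[len(values) // 2]

# Only the order matters for the median: making the largest value
# far larger changes the mean but leaves the median untouched.
stretched = [2, 3, 5, 8, 100000]

print("mean:", manual_mean, "library mean:", mean(values))
print("median:", manual_median, "library median:", median(values))
print("median after stretching the top value:", median(stretched))
```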

Question 3

Q

MEAN vs MEDIAN - when to use?

Answer

A

For a NORMAL distribution the mean and median are the SAME
For a NON-NORMAL (skewed) distribution they can be very DIFFERENT
Mean income in Australia is ~$72,000 but the median is only ~$48,000 (of taxpayers)
For NORMAL DATA USE MEAN
NON-NORMAL DATA USE MEDIAN

1 * Median is a non-parametric measure of what is typical

2 * Mean is a parametric measure of typical, use for normal data
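
To see why skew matters, the sketch below compares a symmetric sample with a skewed one. Both samples are invented for illustration; they are not the Australian income figures quoted above.

```python
from statistics import mean, median

symmetric = [46, 48, 50, 52, 54]          # hypothetical, evenly spread
skewed = [30, 35, 40, 45, 50, 60, 300]    # hypothetical, one very high value

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    # In the symmetric sample the two measures agree; in the skewed
    # sample the single large value drags the mean well above the median.
    print(name, "mean =", round(mean(data), 1), "median =", median(data))
```

In the skewed sample most values sit below the mean, which is why the median is the better guide to what is typical there.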

Question 4

Q

What are MEASURES OF DISPERSION?

Answer

A

1 * Mean measures what is typical, but what about the range/spread in the data?

2 * STANDARD DEVIATION measures the average variability in the data

For 1,2,3,4,5 mean is 3 and the deviations are:
-2, -1, 0, 1, 2 that is each value minus the mean
We want the average deviation, but the negative and positive deviations cancel, so the mean of the deviations is always 0
For the standard deviation (SD, sigma): square the deviations, average them and take the square root
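
The 1,2,3,4,5 example can be worked through as a short sketch. Dividing by n matches "average them" on the card (the population SD). The n - 1 version shown alongside is the usual sample SD; including it here is an addition, not something the card states.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [1, 2, 3, 4, 5]
m = mean(data)                                   # 3

deviations = [x - m for x in data]               # -2, -1, 0, 1, 2
mean_deviation = sum(deviations) / len(data)     # cancels out to 0

squared = [d ** 2 for d in deviations]           # 4, 1, 0, 1, 4
sd_population = sqrt(sum(squared) / len(data))        # divide by n, about 1.414
sd_sample = sqrt(sum(squared) / (len(data) - 1))      # divide by n - 1, about 1.581

print(mean_deviation, sd_population, sd_sample)
# The library functions give the same two numbers.
print(pstdev(data), stdev(data))
```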

Question 5

Q

Range about the Mean = 4

Answer

A

For a normal distribution the points 1 SD either side of the mean are the inflection points of the curve
~2/3 (68%) of all cases are within 1 SD of the mean
~95% are within 2 SD of the mean
~99.7% are within 3 SD of the mean
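
These fractions can be checked with a minimal sketch using `statistics.NormalDist` from the Python standard library. The helper name `fraction_within` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```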

Question 6

Q

SUMMARY OF STATISTICS = Measures of Central Tendency and Dispersion (4)

Answer

A

1 * In this unit assume all the data is normally distributed

2 * Use mean and standard deviation

3 * Estimating the mean and SD from a sample has an error, as only some of the population was measured.

If we tested mysterious drug X on only 6 people we might, by chance, have tested 6 good responders

4 * Mean and SD of a sample have an error range

Question 7

Q

Understanding STANDARD ERROR -5

Answer

A

1 * Standard Error of the mean (SEM) is the error measure of a sample mean.

2 * Average blood pressure of any 6 people will be different to the next 6 you test.

3 * The standard deviation of the means of many repeated experiments gives the standard error.

4 * The mean of a repeated experiment has a ~2/3 chance of being within one SEM of the true (population) mean

For normal data SEM = SD / sqrt(N)

5 * Standard deviation measures variation from the mean
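
The link between "SD of many experiment means" and SD / sqrt(N) can be illustrated with a small simulation. This is only a sketch: the population values (mean 120, SD 12), the group size of 6 and the number of repeats are all assumptions chosen for the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sem(sample):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    return stdev(sample) / sqrt(len(sample))

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: blood pressure with mean 120 and SD 12.
POP_MEAN, POP_SD, N = 120, 12, 6

# Repeat the "experiment" many times, measuring 6 people each time,
# and keep the mean of each group of 6.
means = [
    mean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(20000)
]

sd_of_means = stdev(means)            # spread of the experiment means
theoretical_sem = POP_SD / sqrt(N)    # SD / sqrt(N)

print("SD of the experiment means:", round(sd_of_means, 2))
print("SD / sqrt(N):              ", round(theoretical_sem, 2))
```

In practice there is only one experiment, so `sem(sample)` estimates the same quantity from the SD of that single sample.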

Question 8

Q

SEM VS SD (4)

Answer

A

1 * SEM for error bars on a mean. Shows reliability of the mean

2 * SD for range bars showing the spread of the data

3 * SD also has a standard error but not commonly used

4 * SEM gives an error range for the mean

Question 9

Q

What is HYPOTHESIS TESTING? USED? MOST COMMON? 7

Answer

A

1 * Hypothesis testing is what stats is really about

2 * Used to test whether a difference seen in variable data is real or just due to chance

3 * Most common test: are two means different.

4 * In most labs you will test class data for difference between means

5 * If two means are several SD apart then we have evidence to say they are different

6 * The mean of every experiment is a bit different, but a 3 SD difference between means has only a 3 in 1000 chance of happening at random

7 * That is p<0.003 or 0.3%

Question 10

Q

Hypothesis testing: The Z test = 3

Answer

A

The z-test LOOK FOR THE FORMULA
If z > 2 then p < 0.05, as there is a 95% chance the means would be closer than 2 SD if the difference were just random
But the z-test assumes:
– Normal distribution
– Infinite sample size
– SD is the same for both groups
– No correlation between groups
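
The card does not record the lecture's formula. As a sketch only, the code below assumes the common textbook form, z = (mean A - mean B) / sqrt(SD_A^2/n_A + SD_B^2/n_B), which may differ from the one given in the lecture. The class data are invented, and the function name `z_test` is just an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_test(a, b):
    """Two-sample z statistic and two-sided p value (assumed textbook form)."""
    # Standard error of the difference between the two means.
    se_diff = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se_diff
    # Two-sided p: chance of a difference at least this large in either direction.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical class data, invented for illustration.
group_a = [72, 75, 78, 80, 74, 77, 79, 76]
group_b = [70, 71, 74, 73, 69, 72, 75, 72]

z, p = z_test(group_a, group_b)
print("z =", round(z, 2), "p =", round(p, 4))
```

With only 8 values per group the infinite-sample assumption clearly does not hold, so the p value here is optimistic.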

Question 11

Q

Understanding Student’s t-TEST: assumptions? use? 8

Answer

A

1 * Assumption of infinite sample size (n) is obviously a problem

2 * Student solved the infinite n problem with his t-test

3 * Calculates “t” and corrects the “p” value for sample size (use a computer)

4 * Called Student’s t-test after his publishing name

5 * AKA equal-variance t-test; the type 2 t-test in Excel

6 * Still assumes:
– the same SD
– no correlation (not paired)
– normal distribution

7 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD

8 * Paired t-test used to test if the change is zero, assumes normal distribution, paired & correlated.
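
A minimal sketch of what the computer does for the equal-variance test: pool the two variances, compute t, then correct p for sample size through the degrees of freedom. The data and helper names are hypothetical. The p value is found here by numerically integrating the t distribution, a choice made only to keep the example free of extra libraries.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p value: area in both tails beyond |t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_zero_to_t)

def student_t_test(a, b):
    """Equal-variance (pooled) two-sample t-test: returns t, df and p."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df, two_sided_p(t, df)

# Hypothetical measurements for two unpaired groups of 6.
control = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
treated = [6.2, 6.8, 5.9, 7.1, 6.5, 6.9]

t, df, p = student_t_test(control, treated)
print("t =", round(t, 2), "df =", df, "p =", round(p, 4))
```

The smaller the samples, the fewer the degrees of freedom and the larger t must be to reach the same p. That is the sample-size correction the z-test lacks.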

Question 12

Q

Understanding the Paired T-test = 6

Answer

A

1 * Paired t-test is an extension of Student’s t-test

2 * Paired t-test assumes the groups are correlated

3 * Type 1 t-test in Excel

4 * Consider students' marks in exams vs assignments.

5 * Expect good students will do well in both
– Data is correlated
– Data is paired

6 * Pairing is often repeated measures on the same person/animal
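
As a sketch of why pairing matters, the code below takes invented before/after readings from the same 6 people and computes t two ways: on the per-person changes (paired), and as if the two columns were unrelated groups (unpaired, equal variance). The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Paired t-test: is the mean change different from zero?"""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))   # mean change / its SEM
    return t, n - 1                              # t and degrees of freedom

def unpaired_t(a, b):
    """Equal-variance t statistic, ignoring the pairing (for contrast)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(b) - mean(a)) / sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical repeated measures on the same 6 people.
before = [120, 135, 110, 142, 128, 150]
after = [116, 130, 108, 137, 125, 144]

t_paired, df = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print("paired t =", round(t_paired, 2), "df =", df)
print("unpaired t =", round(t_unpaired, 2))
```

Every person drops by a few units, so the paired t is large. The person-to-person spread swamps that small change when the pairing is ignored.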

Question 13

Q

Understanding CORRELATION: 9

Answer

A

1 * Pearson correlation coefficient (r) is a measure of correlation

2 * Does increasing x make y also increase? (positive r)

3 * Inverse correlation has a negative r

4 * A t-test calculates t; look up t and the sample size to get p. Excel just tells you p.

5 * The same goes for correlation: report p, not just r.

6 * Excel will report a p value for r using the “Regression” tool in the Data Analysis add-in (Analysis ToolPak).

7 * Coefficient of Determination (R²) measures how well the line of best fit fits the data.

8 * Data with a lot of spread could have a significant r but a poor fit (small R²)

9 * For a linear fit R² = r², but not for non-linear curve fits

10 * Pearson’s correlation coefficient r used to test for correlation

11 * Coefficient of determination R² measures closeness of fit

12 * Commonly take p < 0.05 (a 5% chance) as significant
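
A short sketch of the calculation behind r, using invented exam and assignment marks. For a straight-line fit R² is taken as r squared. The conversion of r to a t value with n - 2 degrees of freedom is the standard route to a p value; the final look-up of p is left to software, as on the card. Function names are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_to_t(r, n):
    """t statistic and degrees of freedom for testing r against zero."""
    return r * sqrt((n - 2) / (1 - r * r)), n - 2

# Hypothetical marks for the same 6 students.
assignment = [55, 62, 70, 74, 81, 90]
exam = [50, 65, 66, 78, 80, 88]

r = pearson_r(assignment, exam)
r_squared = r ** 2                    # R² for a straight-line fit
t, df = r_to_t(r, len(exam))

print("r =", round(r, 3), "R^2 =", round(r_squared, 3))
print("t =", round(t, 2), "df =", df)

# Inverse correlation: reversing one list flips the sign of r.
print("inverse r =", round(pearson_r(assignment, exam[::-1]), 3))
```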

Question 14

Q

Summary of this lecture = 10

Answer

A

1 * Median is a non-parametric measure of what is typical

2 * Mean is a parametric measure of typical, use for normal data

3 * Standard deviation measures variation from the mean

4 * 68% of events are within 1SD of the mean and 95% within 2SD

5 * SEM gives an error range for the mean

6 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD

7 * Paired t-test used to test if the change is zero, assumes normal distribution, paired & correlated.

8 * Pearson’s correlation coefficient r used to test for correlation

9 * Coefficient of determination R² measures closeness of fit

10 * Commonly take p < 0.05 (a 5% chance) as significant

(14 cards)