Lecture 2: Statistics and Data Analysis Flashcards

1
Q

Overview:

  1. Measures of Central Tendency and Dispersion
  2. Testing for Difference between MEANS = 3
  3. CORRELATION = 3
A
  • Measures of central tendency and dispersion
  • Testing for difference between means
    – Z-test
    – Student’s t-test
    – Paired t-test
  • Correlation
    – Correlation Coefficient r
    – Coefficient of Determination R²
    – Pearson’s r-test
2
Q

What are the Measures of Central Tendency? Used for What? NON-PARAMETRIC

A
  1. Lab reports often use CLASS DATA, and that means stats
  2. Need a measure of what is typical, a measure of central tendency
    – mean (arithmetic mean, AKA average)
    – median
  3. Mean = sum of all values / total number of values
  4. Median is the 50% point –> 1/2 of the values are more and 1/2 are less
  5. MEDIAN IS NON-PARAMETRIC, ONLY ORDER MATTERS NOT HOW BIG
3
Q

MEAN vs MEDIAN - when to use?

A
  • For NORMAL distribution mean and median are the SAME
  • For NON-NORMAL (skewed) distribution they can be very DIFFERENT
  • Mean income in Australia ~$72,000 but median is only ~$48,000 (of taxpayers)
  • For NORMAL DATA USE MEAN
    NON-NORMAL DATA USE MEDIAN

1 * Median is a non-parametric measure of what is typical

2 * Mean is a parametric measure of typical, use for normal data
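
The mean vs median rule can be checked with a small sketch. The income figures here are made up for illustration (they are not the Australian data quoted on this card), and Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical skewed incomes in $1000s: one very high earner drags the mean up.
incomes = [30, 35, 40, 45, 50, 60, 400]

mean_income = statistics.mean(incomes)      # sensitive to the size of every value
median_income = statistics.median(incomes)  # depends only on the order of the values

print(f"mean   = {mean_income:.1f}")
print(f"median = {median_income}")
```

With skewed data like this the mean lands well above the median, which is why the median is the safer "typical" value for non-normal data.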

4
Q

What are MEASURES OF DISPERSION?

A

1 * Mean measures what is typical, but what about the range/spread in the data?

2 * STANDARD DEVIATION measures the average variability in the data

  • For 1,2,3,4,5 mean is 3 and the deviations are:
  • -2, -1, 0, 1, 2 that is each value minus the mean
  • Want to know the average deviation, but the negatives cancel the positives so the mean of the deviations is always 0
  • For Standard Deviation (SD, sigma σ): square the deviations, average them and take the square root
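
The 1,2,3,4,5 example can be worked through step by step. This is a minimal sketch that follows the card literally and averages the squared deviations over N (the population SD); many tools, such as Excel's STDEV, divide by N - 1 instead and give a slightly larger value.

```python
import math

values = [1, 2, 3, 4, 5]

mean = sum(values) / len(values)            # 3.0
deviations = [v - mean for v in values]     # each value minus the mean

# The plain average of the deviations is always zero, so it is useless as a spread measure.
mean_deviation = sum(deviations) / len(deviations)

# Square the deviations, average them, then take the square root.
sd = math.sqrt(sum(d ** 2 for d in deviations) / len(deviations))

print(deviations, mean_deviation, round(sd, 3))
```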
5
Q

Range about the Mean = 4

A
  • For normal distribution SD is at the inflection points
  • ~2/3 of all cases are within 1 SD of the mean
  • ~95% are within 2 SD of the mean
  • ~99.7% are within 3 SD of the mean
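
These fractions can be reproduced from the normal distribution itself. The sketch below assumes a perfectly normal distribution and uses the identity that the fraction within k SD of the mean is erf(k / sqrt(2)); the helper name `fraction_within` is made up for this example.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
```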
6
Q

SUMMARY OF STATISTICS = Measures of Central Tendency and Dispersion (4)

A

1 * In this unit assume all the data is normally distributed

2 * Use mean and standard deviation

3 * Estimation of mean and SD from a sample has an error, as only some of the population was measured.

  • If we tested mysterious drug X on only 6 people we might, by chance, have tested 6 good responders

4 * Mean and SD of a sample have an error range

7
Q

Understanding STANDARD ERROR -5

A

1 * Standard Error of the mean (SEM) is the error measure of a sample mean.

2 * Average blood pressure of any 6 people will be different to the next 6 you test.

3 * The standard deviation of the means of many repeated experiments gives the standard error.

4 * Mean of a repeated experiment has about a 2/3 chance of being within one SEM of the true (population) mean

  • For normal data SEM = SD/√N

5 * Standard deviation measures variation from the mean
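
The SEM rule (SD divided by the square root of N) can be sketched as follows. The blood pressure readings are hypothetical, and the sample SD from `statistics.stdev` (which divides by N - 1) is an assumption about which SD is meant.

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    sd = statistics.stdev(sample)          # sample SD, divides by N - 1
    return sd / math.sqrt(len(sample))

# Hypothetical systolic blood pressures for 6 people.
blood_pressures = [118, 125, 130, 122, 135, 120]

print(f"mean = {statistics.mean(blood_pressures):.1f}")
print(f"SD   = {statistics.stdev(blood_pressures):.1f}")
print(f"SEM  = {standard_error(blood_pressures):.1f}")
```

The SEM is always smaller than the SD for N > 1, and it shrinks as more people are measured while the SD does not.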

8
Q

SEM VS SD (4)

A

1 * SEM for error bars on a mean. Shows reliability of the mean

2 * SD for range bars showing the spread of the data

3 * SD also has a standard error but not commonly used

4 * SEM gives an error range for the mean

9
Q

What is HYPOTHESIS TESTING? USED? MOST COMMON? 7

A

1 * Hypothesis testing is what stats is really about

2 * Used to show that a difference in variable data is unlikely to be just chance

3 * Most common test: are two means different.

4 * In most labs you will test class data for difference between means

5 * If two means are several SD apart then we have evidence to say they are different

6 * Mean of every experiment is a bit different, but a 3 SD difference between means has only about a 3 in 1000 chance of arising randomly

7 * That is p<0.003 or 0.3%

10
Q

Hypothesis testing: The Z test = 3

A
  • The z-test LOOK FOR THE FORMULA
  • If z>2 then p<0.05: if the difference were just random, there is a ~95% chance the means would be closer than 2 SD
  • But z-test assumes
    – Normal distribution
    – Infinite sample size
    – SD is the same for both groups
    – No correlation between groups
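
The card leaves the formula as something to look up. A common textbook form of the two-sample z statistic, given here as an assumption rather than as the lecture's own formula, is z = (mean1 - mean2) / sqrt(SD1²/n1 + SD2²/n2), with a two-sided p taken from the normal distribution. The function names and numbers below are made up for illustration.

```python
import math

def z_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means in units of its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff

def two_sided_p(z):
    """Chance of a |z| at least this large if the difference is just random."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical groups: means 10 and 9, SD 2 in both, 100 measurements each.
z = z_statistic(10, 2, 100, 9, 2, 100)
print(f"z = {z:.2f}, p = {two_sided_p(z):.4f}")
```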
11
Q

Understanding Student’s t-TEST: assumptions? use? 8

A

1 * Assumption of infinite sample size (n) is obviously a problem

2 * Student solved the infinite n problem with his t-test

3 * Calculates “t” and corrects the “p” value for sample size (use a computer)

4 * Called Student’s t-test after “Student”, the name he published under

5 * AKA equal variance t-test, Excel type 2 t-test

6 * Still assumes:
– the same SD
– no correlation (not paired)
– normal distribution

7 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD

8 * Paired t-test used to test if the change is zero, assumes normal distribution, paired & correlated.
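
To make the equal-variance idea concrete, here is a minimal sketch of the usual pooled-variance t statistic. It returns only t and the degrees of freedom; turning those into p needs the t distribution, which is the part left to a computer. The function name and the data are made up for illustration.

```python
import math
import statistics

def student_t(group_a, group_b):
    """Equal-variance (pooled) t statistic and degrees of freedom for two unpaired groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its own degrees of freedom.
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se_diff
    return t, df

# Hypothetical class data for two unpaired groups.
t, df = student_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}, df = {df}")
```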

12
Q

Understanding the Paired T-test = 6

A

1 * Paired t-test is an extension of Student’s t-test

2 * Paired t-test assumes the groups are correlated

3 * Type 1 t-test in Excel

4 * Consider students’ marks in exams vs assignments.

5 * Expect good students will do well in both
– Data is correlated
– Data is paired

6 * Pairing is often repeated measures on the same person/animal
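
A paired t-test works on the change within each pair and asks whether the mean change is zero. The sketch below assumes the standard form t = mean(change) / SEM(change) with N - 1 degrees of freedom; the function name and measurements are hypothetical, and p would again come from software.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for the mean within-pair change."""
    if len(before) != len(after):
        raise ValueError("paired data need the same number of measurements")
    changes = [a - b for b, a in zip(before, after)]
    n = len(changes)
    sem_change = statistics.stdev(changes) / math.sqrt(n)
    return statistics.mean(changes) / sem_change, n - 1

# Hypothetical repeated measures on the same three people.
t, df = paired_t([5, 6, 7], [6, 8, 10])
print(f"t = {t:.2f}, df = {df}")
```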

13
Q

Understanding CORRELATION: 12

A

1 * Pearson correlation coefficient (r) is a measure of correlation

2 * Does increasing x make y also increase? (positive r)

3 * Inverse correlation has a negative r

4 * t-test calculates t, look up t and sample size to get p, Excel just tells you p.

5 * Same for correlation, need to report p not just r.

6 * Excel will report the p value for r using the “Regression” tool in the Data Analysis ToolPak.

7 * Coefficient of Determination (R²) is how well the line of best fit fits the data.

8 * Data with a lot of spread could have a significant r but poor fit (small R²)

9 * For a linear fit r² = R², but not for non-linear curve fits

10 * Pearson’s correlation coefficient r used to test for correlation

11 * Coefficient of determination R² measures closeness of fit

12 * Significance is commonly taken as p<0.05, a 5% chance of seeing such a result if there is no real effect
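
The link between r and the fit of a straight line can be checked numerically. This is a minimal sketch using the standard definitions of Pearson's r and of the coefficient of determination for a least-squares line; the function names and data are made up, and the p value for r is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired x and y values."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def r_squared_linear(xs, ys):
    """Coefficient of determination for the least-squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data with some scatter about an upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(f"r = {r:.3f}, r squared = {r ** 2:.3f}, R2 of line = {r_squared_linear(xs, ys):.3f}")
```

For a straight-line fit the square of r and the coefficient of determination agree; reversing the trend flips the sign of r but leaves the coefficient of determination unchanged.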

14
Q

Summary of this lecture = 10

A

1 * Median is a non-parametric measure of what is typical

2 * Mean is a parametric measure of typical, use for normal data

3 * Standard deviation measures variation from the mean

4 * 68% of events are within 1SD of the mean and 95% within 2SD

5 * SEM gives an error range for the mean

6 * Student’s t-test used to test if two means are different, assumes normal distribution, not paired and equal SD

7 * Paired t-test used to test if the change is zero, assumes normal distribution, paired & correlated.

8 * Pearson’s correlation coefficient r used to test for correlation

9 * Coefficient of determination R² measures closeness of fit

10 * Significance is commonly taken as p<0.05, a 5% chance of seeing such a result if there is no real effect
