7. Sept 7 Flashcards

Question 1

Q

Quiz

Answer

A

Difference between a confidence interval and a prediction interval?

The confidence interval gives the range in which the average value is likely to fall, whereas the prediction interval gives the range in which an individual value is likely to fall
Averages vs. individual data

Regression assumption violated
- Non-normality (skewed up)

Assumption violated
- Autocorrelation (up and down)

Assumption violated
- Homoscedasticity

Which term describes making predictions about a response y outside the observed range of x values used in your analysis?
- Extrapolation
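
To make the confidence vs. prediction interval answer on this card concrete, here is a minimal sketch in R. The data are simulated, and the true line (y = 2 + 0.5x) and the variable names are assumptions for illustration only, not from the class data.

```r
# Hypothetical simulated data; the true line y = 2 + 0.5x is an assumption
set.seed(7)
x <- runif(40, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(40, mean = 0, sd = 1)
fit <- lm(y ~ x)

new_x <- data.frame(x = 5)  # a value inside the observed range of x

# Interval for the AVERAGE y at x = 5
conf_int <- predict(fit, newdata = new_x, interval = "confidence")
# Interval for an INDIVIDUAL new y at x = 5
pred_int <- predict(fit, newdata = new_x, interval = "prediction")

conf_width <- conf_int[1, "upr"] - conf_int[1, "lwr"]
pred_width <- pred_int[1, "upr"] - pred_int[1, "lwr"]

# The prediction interval is wider: it adds individual scatter
# on top of the uncertainty in the average
pred_width > conf_width
```

Setting new_x to a value outside 0 to 10 would be extrapolation; R still returns a number, but nothing in the data supports it.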

Question 3

Q

Example in categorical

Answer

A

You want to know if males and females of a particular species are the same size.

Males turn out to be 1 kg
Females turn out to be 0.8 kg
– Those are averages, so there’s individual variation

Binary (two-category) X and normally distributed Y
- Normally done with a t-test

You can ALSO analyze this with a regression
- Yi = B0 + B1*Xi + Ei, where Ei ~ N(0, s)

How do you use that equation if your X variable is sex (male/female), a WORD?

Use “dummy coding”
Process of assigning 0s and 1s to the categories of a categorical variable so it can go into the equation as a number
AKA converting categorical variables to numbers using 0s and 1s

What if I wanna know the average size for females?
- Females are coded X = 0, so the B1 term drops out: Y = B0 + Error

For males
- Males are coded X = 1: Y = B0 + B1 + Error

What is the best estimate of the difference in size between the males and females?

B1 tells us the difference between groups?
– But wait, it’s the slope. How can a slope be helpful?

B1 is both the slope AND the difference between groups

How can it be both?
Run is one (X goes from 0 to 1), rise is the difference between the groups, so slope = rise/run = the difference

EVEN THOUGH this is a t-test, it still works within the regression
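
A minimal sketch of the dummy-coding idea, assuming simulated data with the group averages from this card (0.8 kg females, 1 kg males). The sample size, the spread (sd = 0.1), and the variable names are made up for illustration.

```r
# Hypothetical simulated data: female mean 0.8 kg, male mean 1 kg
set.seed(1)
n <- 50
Sex  <- rep(c("Female", "Male"), each = n)
X    <- ifelse(Sex == "Male", 1, 0)   # dummy code: Male = 1, Female = 0
Mass <- 0.8 + 0.2 * X + rnorm(2 * n, mean = 0, sd = 0.1)

fit <- lm(Mass ~ X)
b0 <- coef(fit)[["(Intercept)"]]   # B0
b1 <- coef(fit)[["X"]]             # B1

mean_f <- mean(Mass[X == 0])
mean_m <- mean(Mass[X == 1])

# B0 is the female average; B1 is the male-minus-female difference
all.equal(b0, mean_f)
all.equal(b1, mean_m - mean_f)
```

Because X only moves from 0 to 1, the fitted line passes through both group averages, which is why the slope and the difference are the same number.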

Question 4

Q

How this works in R

Answer

A

Creating the data in Excel

Have to create dummy-coded variable

1 for all the males
0 for the females

I get the same value whether I use the regression equation, or I use the averages of those groups

plot(Mass~Sex,data=datum)

gives us a box-and-whisker plot
Anytime you give plot() a categorical X (stored as a factor), it draws box and whiskers

Question 5

Q

T Test in R

Answer

A

results=t.test(Mass~Sex, data=datum, var.equal=TRUE)

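Weird thing: summary() doesn’t give anything useful for a t-test
So just call results
We can assume or not assume that the variance is homoscedastic: var.equal=TRUE assumes equal variances, while the default (var.equal=FALSE) runs Welch’s t-test, which does not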
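
A short sketch of the two versions of the test, assuming a simulated datum (30 animals per sex; the numbers are made up, only the column names Mass and Sex follow the notes).

```r
# Hypothetical simulated data frame standing in for datum
set.seed(2)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Males"), each = 30)),
  Mass = c(rnorm(30, mean = 0.8, sd = 0.1), rnorm(30, mean = 1.0, sd = 0.1))
)

# Equal variances assumed (classic two-sample t-test)
pooled <- t.test(Mass ~ Sex, data = datum, var.equal = TRUE)
# Equal variances NOT assumed (Welch's t-test, R's default)
welch  <- t.test(Mass ~ Sex, data = datum)

pooled             # printing the object is how you see the results
pooled$parameter   # degrees of freedom: n1 + n2 - 2
welch$parameter    # Welch degrees of freedom, usually fractional
```

The pieces of the result (p.value, statistic, conf.int, estimate) can be pulled out with $ if you need them for later calculations.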

Question 6

Q

Test as LM

Answer

A

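Exact same results as the equal-variance t-test (the t statistic can come out with the opposite sign, depending on which group is subtracted from which)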

results3=lm(Mass~Sex,data=datum)

We didn’t even have to dummy-code X ourselves; R does it for us

Why did it choose females as a reference group (instead of males)? Females come first alphabetically

There’s also a relevel() function you can call within lm() that lets you choose which group is the reference
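
A sketch checking that the two approaches agree, assuming a simulated datum with levels named "Females" and "Males" (the data values are made up).

```r
# Hypothetical simulated data frame standing in for datum
set.seed(3)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Males"), each = 30)),
  Mass = c(rnorm(30, mean = 0.8, sd = 0.1), rnorm(30, mean = 1.0, sd = 0.1))
)

tt  <- t.test(Mass ~ Sex, data = datum, var.equal = TRUE)
fit <- lm(Mass ~ Sex, data = datum)

levels(datum$Sex)   # "Females" comes first, so it is the reference group

coefs <- summary(fit)$coefficients
p_lm <- coefs["SexMales", "Pr(>|t|)"]
t_lm <- coefs["SexMales", "t value"]

# Same p-value, and the same t statistic apart from its sign
all.equal(p_lm, tt$p.value)
all.equal(abs(t_lm), abs(unname(tt$statistic)))
```

The match only holds with var.equal=TRUE, because lm() assumes a single error variance for both groups.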

Question 7

Q

Key Point

Answer

A

How to run a t-test in R (and that you can get the same results with lm())
It works with this dummy-coding process

Question 8

Q

Making more complicated

Answer

A

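Categorical X with 3 categories instead of two
- Males, Females, Hermaphrodites

We need TWO new dummy-coded variables
Male gets 1 for male, 0 for hermaphrodite, 0 for female
Hermaphrodite gets the equivalent (1 for hermaphrodite, 0 for the other two)
Female could be built the same way, but the model doesn’t need it: a female is the case where both other variables are 0

Only need N-1 dummy-coded variables for N categories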

Yi = B0 + B1*Males + B2*Hermaphrodite + Ei

B0 is the average y when all X’s are zero (in this case, size of females)
B1 here is the difference between males and females (or more generally, between that group and the reference group)
B2 is the difference between hermaphrodites and females
BUT it doesn’t give us the difference between males and hermaphrodites (we’d have to change the reference)
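
A sketch of what R builds for three categories, assuming simulated data with made-up group averages (0.8, 0.9, 1.0 kg) and level names "Females", "Herm", "Males".

```r
# Hypothetical simulated data; the group means are assumptions
set.seed(4)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Herm", "Males"), each = 20)),
  Mass = c(rnorm(20, mean = 0.8, sd = 0.1),
           rnorm(20, mean = 0.9, sd = 0.1),
           rnorm(20, mean = 1.0, sd = 0.1))
)

# The design matrix R uses: intercept plus N-1 = 2 dummy columns
X <- model.matrix(~ Sex, data = datum)
colnames(X)   # "(Intercept)" "SexHerm" "SexMales"

fit <- lm(Mass ~ Sex, data = datum)
b <- coef(fit)
group_means <- sapply(split(datum$Mass, datum$Sex), mean)

# B0 = female average; the other coefficients are differences from females
all.equal(b[["(Intercept)"]], group_means[["Females"]])
all.equal(b[["SexMales"]], group_means[["Males"]] - group_means[["Females"]])
all.equal(b[["SexHerm"]],  group_means[["Herm"]]  - group_means[["Females"]])
```

There is no coefficient for males vs. hermaphrodites in this output, which is the limitation the card points out.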

Question 9

Q

Making it in Excel

Answer

A

Column 1
Contains males, females, & hermaphrodites

Column 2
Mean mass for each

Column 3
Error

Column 4
Mean mass plus error

Columns 5, 6, 7
Dummy-coded variables
For all 3 separately

Column 8
Mass calculated with dummy coding

Question 10

Q

How to analyze this data

Answer

A

Can’t use t-test (only for 2 groups)
We run an ANOVA

results=aov(Mass~Sex,data=datum)

All this does is tell you that at least 2 groups are different from each other
- Doesn’t tell us which, how…

NOTE: You can get the ANOVA information by calling the summary function on the result
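
A sketch of pulling the ANOVA table out of the result, assuming a simulated three-group datum (values made up). It also checks that the overall F from aov() is the same F that summary() reports for the lm() version of the model.

```r
# Hypothetical simulated data frame standing in for datum
set.seed(5)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Herm", "Males"), each = 20)),
  Mass = c(rnorm(20, mean = 0.8, sd = 0.1),
           rnorm(20, mean = 0.9, sd = 0.1),
           rnorm(20, mean = 1.0, sd = 0.1))
)

results <- aov(Mass ~ Sex, data = datum)
aov_table <- summary(results)[[1]]   # the ANOVA table
aov_table

F_aov <- aov_table[1, "F value"]     # row 1 is the Sex term
p_aov <- aov_table[1, "Pr(>F)"]

# Same model fit with lm(): its overall F statistic matches
fit  <- lm(Mass ~ Sex, data = datum)
F_lm <- unname(summary(fit)$fstatistic["value"])
all.equal(F_aov, F_lm)
```

The single F test only says that the groups are not all the same; the coefficient table from lm() is where the group-by-group differences show up.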

Question 11

Q

Change the reference

Answer

A

We need to change the reference to compare males to hermaphrodites (in this example)

results=lm(Mass~Females + Herm, data=datum)

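There is a coding way to change the reference
results=lm(Mass~Herm+Females+Males,data=datum)
- That doesn’t specify a reference group. The three dummy variables are redundant with the intercept, so R automatically drops the last variable (its coefficient comes back NA) and uses it as the reference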

Relevel function (Sex must be a factor)
results=lm(Mass~relevel(Sex, ref="Males"),data=datum)
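
A sketch comparing the two routes to a male reference group, assuming a simulated datum (values made up). The extra column SexRef and the object names are hypothetical, added only to keep the coefficient names readable.

```r
# Hypothetical simulated data frame standing in for datum
set.seed(6)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Herm", "Males"), each = 20)),
  Mass = c(rnorm(20, mean = 0.8, sd = 0.1),
           rnorm(20, mean = 0.9, sd = 0.1),
           rnorm(20, mean = 1.0, sd = 0.1))
)
group_means <- sapply(split(datum$Mass, datum$Sex), mean)

# Route 1: relevel() so Males becomes the reference
datum$SexRef <- relevel(datum$Sex, ref = "Males")
levels(datum$SexRef)                  # "Males" "Females" "Herm"
fit_relevel <- lm(Mass ~ SexRef, data = datum)
herm_vs_males <- coef(fit_relevel)[["SexRefHerm"]]

# Route 2: hand-made dummies, all three in the formula
datum$Females <- as.numeric(datum$Sex == "Females")
datum$Herm    <- as.numeric(datum$Sex == "Herm")
datum$Males   <- as.numeric(datum$Sex == "Males")
fit_all <- lm(Mass ~ Herm + Females + Males, data = datum)
coef(fit_all)                         # the Males coefficient is NA (dropped)

# Both routes give hermaphrodite minus male
all.equal(herm_vs_males, group_means[["Herm"]] - group_means[["Males"]])
all.equal(coef(fit_all)[["Herm"]], herm_vs_males)
```

relevel() is the less error-prone route, since it does not depend on the order in which the dummy variables are typed.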

Question 12

Q

Main takeaways

Answer

A

Ways to run t-test and ANOVAs in R
- You should be able to do it easily
- Understand how the dummy coding works (even though R does it for you)
– You REALLY need to know how it works

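The t-tests on the individual coefficients are not technically valid here. They are liable to make a Type I error.
- Because we did TWO tests, so the overall chance of a false positive is higher than for a single test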
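
A quick worked number for why two tests are a problem. This is a rough sketch that assumes a 0.05 cutoff per test and treats the two tests as independent; the two coefficient tests share a reference group, so the real figure is only approximately this.

```r
alpha <- 0.05   # assumed cutoff for each individual test
k     <- 2      # number of tests we ran

# Chance of at least one false positive across k independent tests
familywise_error <- 1 - (1 - alpha)^k
familywise_error   # 0.0975, nearly double the 0.05 we thought we had
```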