7. Sept 7 Flashcards

1
Q

Quiz

A

Difference between a confidence interval and a prediction interval?

  • The confidence interval gives the range in which the average (mean) response is likely to fall, whereas the prediction interval gives the range in which an individual value is likely to fall
  • Averages vs individual data
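
A minimal sketch of the difference in R, using made-up data (the variables x and y and the point x = 15 are assumptions for illustration). At the same x, the prediction interval comes out wider than the confidence interval, because it also has to cover the scatter of individual values around the mean.

```r
# Hypothetical data: a straight line plus random noise
set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30, sd = 1)
fit <- lm(y ~ x)

new_x <- data.frame(x = 15)
conf_int <- predict(fit, newdata = new_x, interval = "confidence")  # range for the MEAN y at x = 15
pred_int <- predict(fit, newdata = new_x, interval = "prediction")  # range for ONE new y at x = 15

conf_width <- unname(conf_int[, "upr"] - conf_int[, "lwr"])
pred_width <- unname(pred_int[, "upr"] - pred_int[, "lwr"])
print(c(confidence = conf_width, prediction = pred_width))
```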

Regression assumption violated
- Non-normality (skewed up)

Assumption violated
- Autocorrelation (up and down)

Assumption violated
- Homoscedasticity (the spread of the residuals is not constant)

Which term describes making predictions about a response y outside the observed range of x values used in your analysis?
- Extrapolation

2
Q

Example in categorical

A

You want to know if males and females of particular species are of the same size.

  • Males turn out to be 1kg
  • Females turn out to be 0.8kg
  • – Those are averages, so there’s individual variation

Binary (two-category) X and normally distributed Y
- Normally done with a t-test

You can ALSO analyze this with a regression
- Yi = B0 + B1*Xi + Ei, where Ei ~ N(0, s)

How do you use that equation if your X variable is sex (male/female), a WORD?

  • Use “dummy coding”
  • Process of assigning 0’s and 1’s to categorical variables in order to convert them to math
  • AKA converting categorical variables to numbers using 0s and 1s
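
A small sketch of dummy coding in R. The vector names Sex and Male and the four labels are assumptions made up for illustration.

```r
# Hypothetical labels for four animals
Sex <- c("Males", "Females", "Females", "Males")

# Dummy code: 1 for males, 0 for females
Male <- ifelse(Sex == "Males", 1, 0)
print(data.frame(Sex, Male))
```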

What if I want to know the average size for females?
- Females are coded X = 0, so the B1 term drops out: Y = B0 + Error

3
Q

Example in categorical

A

You want to know if males and females of particular species are of the same size.

  • Males turn out to be 1kg
  • Females turn out to be 0.8kg
  • – Those are averages, so there’s individual variation

Binary (two-category) X and normally distributed Y
- Normally done with a t-test

You can ALSO analyze this with a regression
- Yi = B0 + B1*Xi + Ei, where Ei ~ N(0, s)

How do you use that equation if your X variable is sex (male/female), a WORD?

  • Use “dummy coding”
  • Process of assigning 0’s and 1’s to categorical variables in order to convert them to math
  • AKA converting categorical variables to numbers using 0s and 1s

What if I want to know the average size for females?
- Females are coded X = 0, so the B1 term drops out: Y = B0 + Error

For males (X = 1)
- Y = B0 + B1 + Error

What is the best estimate of the difference in size between the males and females?

  • B1 tells us the difference between groups
  • – But wait, it’s the slope. How can a slope be helpful?

B1 is both the slope AND the difference between groups

  • How can it be both?
  • Run is one (X goes from 0 to 1), rise is the difference between the groups, so slope = rise/run = the difference

EVEN THOUGH this is a t-test, it still works within the regression
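
A sketch that checks this numerically. The group means (1 kg and 0.8 kg) come from the card; the sample size, the error SD of 0.1, and the variable names are assumptions.

```r
# Hypothetical data: 20 males and 20 females
set.seed(42)
n <- 20
Male <- rep(c(1, 0), each = n)                     # dummy code: 1 = male, 0 = female
Mass <- 0.8 + 0.2 * Male + rnorm(2 * n, sd = 0.1)  # Yi = B0 + B1*Xi + Ei

fit <- lm(Mass ~ Male)
b0 <- unname(coef(fit)[1])   # intercept
b1 <- unname(coef(fit)[2])   # slope

mean_f <- mean(Mass[Male == 0])
mean_m <- mean(Mass[Male == 1])
print(c(B0 = b0, female_mean = mean_f))              # the two values match
print(c(B1 = b1, difference = mean_m - mean_f))      # the two values match
```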

4
Q

How this works in R

A

Creating the data in Excel

Have to create a dummy-coded variable

  • 1 for all the males
  • 0 for the females

I get the same value whether I use the regression equation, or I use the averages of those groups

plot(Mass~Sex,data=datum)

  • Gives us a box-and-whisker plot
  • Any time you give plot() a categorical (factor) X, it draws box and whiskers
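
A sketch that builds a comparable data set directly in R instead of Excel. The names datum, Mass, and Sex follow the card; the sample sizes and the SD of 0.1 are assumptions. If the data come from a CSV file in R 4.0 or later, Sex is read as plain text, so it is converted to a factor here before plotting.

```r
# Hypothetical data: 15 females (mean 0.8 kg) and 15 males (mean 1 kg)
set.seed(7)
datum <- data.frame(
  Sex  = rep(c("Females", "Males"), each = 15),
  Mass = c(rnorm(15, mean = 0.8, sd = 0.1), rnorm(15, mean = 1.0, sd = 0.1))
)
datum$Sex  <- factor(datum$Sex)                     # make X categorical
datum$Male <- ifelse(datum$Sex == "Males", 1, 0)    # dummy-coded variable

pdf(NULL)                        # null graphics device; drop this line to see the plot on screen
plot(Mass ~ Sex, data = datum)   # factor X, so R draws box and whiskers
invisible(dev.off())
```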
5
Q

T Test in R

A

results=t.test(Mass~Sex, data=datum, var.equal=TRUE)

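  • Weird thing: there is no useful summary() for a t-test result
  • So just call results (printing the object shows the test output)
  • We can assume or not assume that the variance is homoscedastic: var.equal=TRUE assumes equal variances, while the default var.equal=FALSE runs Welch's t-test, which does not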
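
A sketch of both versions of the test on made-up data (the data values are assumptions; the names datum, Mass, and Sex follow the card). Individual pieces of the result can be pulled out with $.

```r
# Hypothetical data: 15 females and 15 males
set.seed(7)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Males"), each = 15)),
  Mass = c(rnorm(15, mean = 0.8, sd = 0.1), rnorm(15, mean = 1.0, sd = 0.1))
)

results <- t.test(Mass ~ Sex, data = datum, var.equal = TRUE)  # assumes equal variances
welch   <- t.test(Mass ~ Sex, data = datum)                    # default: Welch, no equal-variance assumption

print(results)             # printing the object is the "summary"
print(results$estimate)    # the two group means
print(results$p.value)
print(welch$parameter)     # Welch degrees of freedom, at most n1 + n2 - 2
```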
6
Q

Test as LM

A

Exact same results as the t-test (the equal-variance version, var.equal=TRUE)

results3=lm(Mass~Sex,data=datum)

We didn’t even have to dummy-code X ourselves; lm() does it when X is a factor

Why did it choose females as a reference group (instead of males)? Females come first alphabetically, and R uses the first factor level as the reference

There’s also a relevel() function you can call within lm() that lets you choose which group is the reference
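
A sketch comparing the two approaches on made-up data (the data values are assumptions). The sign of the t statistic can flip between the two outputs, since t.test() reports Females minus Males while the lm() slope is Males minus Females, so absolute values are compared.

```r
# Hypothetical data: 15 females and 15 males
set.seed(7)
datum <- data.frame(
  Sex  = factor(rep(c("Females", "Males"), each = 15)),
  Mass = c(rnorm(15, mean = 0.8, sd = 0.1), rnorm(15, mean = 1.0, sd = 0.1))
)

tt       <- t.test(Mass ~ Sex, data = datum, var.equal = TRUE)
results3 <- lm(Mass ~ Sex, data = datum)
coefs    <- summary(results3)$coefficients
print(coefs)   # row "SexMales" is the slope: males minus females

t_ttest <- abs(unname(tt$statistic))
t_lm    <- abs(coefs["SexMales", "t value"])
p_ttest <- tt$p.value
p_lm    <- coefs["SexMales", "Pr(>|t|)"]
print(c(t_ttest = t_ttest, t_lm = t_lm, p_ttest = p_ttest, p_lm = p_lm))
```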

7
Q

Key Point

A
  1. How to run a t-test in R (and that you can get the same results with lm())
  2. It works with this dummy-coding process
8
Q

Making more complicated

A

Categorical X with 3 categories instead of two
- Males, Females, Hermaphrodites

  • We need TWO new dummy-coded variables
    Males gets 1 for male, 0 for hermaphrodite, 0 for female
    Hermaphrodite gets 1 for hermaphrodite, 0 for male, 0 for female
    A Female variable could be coded the same way, but it is not needed when females are the reference

Only need N-1 dummy-coded variables for N categories

Yi = B0 + B1*Males + B2*Hermaphrodite + Ei

  • B0 is the average y when all X’s are zero (in this case, size of females)
  • B1 here is difference between males and females (or more generally, the reference group)
  • B2 is difference between hermaphrodites and females
  • BUT it doesn’t give us the difference between males and hermaphrodites (we’d have to change the reference)
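
A sketch of the three-group model on made-up data. The hermaphrodite mean of 0.9 kg, the sample sizes, and the SD are assumptions; the dummy names Males and Herm follow the cards.

```r
# Hypothetical data: 10 animals per group
set.seed(3)
Sex   <- rep(c("Females", "Hermaphrodites", "Males"), each = 10)
Males <- ifelse(Sex == "Males", 1, 0)
Herm  <- ifelse(Sex == "Hermaphrodites", 1, 0)
Mass  <- 0.8 + 0.2 * Males + 0.1 * Herm + rnorm(30, sd = 0.1)

fit <- lm(Mass ~ Males + Herm)   # females are the reference (both dummies are 0)
b <- unname(coef(fit))

mean_f <- mean(Mass[Sex == "Females"])
mean_m <- mean(Mass[Sex == "Males"])
mean_h <- mean(Mass[Sex == "Hermaphrodites"])

print(c(B0 = b[1], female_mean = mean_f))
print(c(B1 = b[2], males_minus_females = mean_m - mean_f))
print(c(B2 = b[3], herm_minus_females = mean_h - mean_f))
```

The point estimate for males versus hermaphrodites is B1 - B2, but summary(fit) reports no test for that comparison, which is why the reference has to be changed.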
9
Q

Making the data in Excel

A

Column 1
Contains males, females, & hermaphrodites

Column 2
Mean mass for each

Column 3
Error

Column 4
Mean mass plus error

Columns 5, 6, 7
Dummy-coded variables
For all 3 groups separately

Column 8
Mass calculated with dummy coding
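
A sketch of the same column layout in R rather than Excel. The column names, the hermaphrodite mean of 0.9 kg, the five animals per group, and the error SD are all assumptions. Columns 4 and 8 should agree, since both are the group mean plus the same error.

```r
set.seed(11)
sheet <- data.frame(Sex = rep(c("Males", "Females", "Hermaphrodites"), each = 5))  # column 1

group_means <- c(Males = 1.0, Females = 0.8, Hermaphrodites = 0.9)
sheet$MeanMass <- unname(group_means[as.character(sheet$Sex)])   # column 2: mean mass for each
sheet$Error    <- rnorm(nrow(sheet), mean = 0, sd = 0.1)         # column 3: error
sheet$Mass     <- sheet$MeanMass + sheet$Error                   # column 4: mean mass plus error

sheet$Males   <- as.numeric(sheet$Sex == "Males")                # columns 5-7: dummy codes
sheet$Females <- as.numeric(sheet$Sex == "Females")
sheet$Herm    <- as.numeric(sheet$Sex == "Hermaphrodites")

# column 8: B0 + B1*Males + B2*Herm + error, with females as the reference
sheet$MassDummy <- 0.8 + 0.2 * sheet$Males + 0.1 * sheet$Herm + sheet$Error
print(head(sheet))
```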

10
Q

How to analyze this data

A

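Can’t use a t-test (only for 2 groups)
We run an ANOVA

results=aov(Mass~Sex,data=datum)

All this does is tell you that at least 2 of the groups are different from each other
- Doesn’t tell us which, how…

NOTE: You can get the ANOVA information from your summary function: summary(results) prints the ANOVA table, and the F-statistic line of summary() on the lm() fit carries the same test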
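
A sketch showing that aov() and lm() give the same F value on made-up three-group data (the group means, sample sizes, and SD are assumptions).

```r
# Hypothetical data: 10 animals per group
set.seed(3)
datum <- data.frame(Sex = factor(rep(c("Females", "Hermaphrodites", "Males"), each = 10)))
datum$Mass <- c(0.8, 0.9, 1.0)[as.integer(datum$Sex)] + rnorm(30, sd = 0.1)

results <- aov(Mass ~ Sex, data = datum)
print(summary(results))                        # the ANOVA table

fit   <- lm(Mass ~ Sex, data = datum)
f_aov <- summary(results)[[1]][["F value"]][1]
f_lm  <- unname(summary(fit)$fstatistic["value"])
print(c(F_from_aov = f_aov, F_from_lm = f_lm))
```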

11
Q

Change the reference

A

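We need to change the reference to compare males to hermaphrodites (in this example)

results=lm(Mass~Females + Herm, data=datum)
- Uses the dummy-coded columns; Males is left out of the formula, so males become the reference

There is a coding way to change the reference
results=lm(Mass~Herm+Females+Males,data=datum)
- That doesn’t give a reference point (all three dummies plus the intercept are redundant). R automatically drops the last variable (its coefficient is NA), so that group becomes the reference

Relevel function
results=lm(Mass~relevel(Sex, ref="Males"),data=datum)
- Sex must be a factor for relevel() to work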
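
A sketch of both routes on made-up data (the data values are assumptions; the column names follow the cards). With males as the reference, the intercept is the male mean in both fits.

```r
# Hypothetical data: 10 animals per group
set.seed(3)
datum <- data.frame(Sex = factor(rep(c("Females", "Hermaphrodites", "Males"), each = 10)))
datum$Mass    <- c(0.8, 0.9, 1.0)[as.integer(datum$Sex)] + rnorm(30, sd = 0.1)
datum$Females <- as.numeric(datum$Sex == "Females")
datum$Herm    <- as.numeric(datum$Sex == "Hermaphrodites")
datum$Males   <- as.numeric(datum$Sex == "Males")

# Route 1: relevel() inside the formula
fit_relevel <- lm(Mass ~ relevel(Sex, ref = "Males"), data = datum)
print(coef(fit_relevel))   # intercept = male mean; slopes = differences from males

# Route 2: list all three dummies; the last one is dropped (NA)
fit_all <- lm(Mass ~ Herm + Females + Males, data = datum)
print(coef(fit_all))

mean_m <- mean(datum$Mass[datum$Sex == "Males"])
```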

12
Q

Main takeaways

A
  1. Ways to run t-tests and ANOVAs in R
    - You should be able to do it easily
    - Understand how the dummy coding works (even though R does it for you)
    - – You REALLY need to know how it works

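These t-tests (the pairwise group comparisons) are not technically valid as run here. They are liable to make a Type I error (a false positive).
- Because we did TWO tests, the chance that at least one of them is a false positive is higher than the stated alpha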
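
A quick arithmetic sketch of why two tests inflate the error rate. It assumes alpha = 0.05 per test and that the two tests are independent, neither of which is stated on the card. The Bonferroni line is one common correction, shown only as an illustration and not taken from the card.

```r
alpha <- 0.05   # assumed per-test Type I error rate
k     <- 2      # number of tests, as on the card

# Chance of at least one false positive across k independent tests
familywise <- 1 - (1 - alpha)^k
print(familywise)   # 0.0975

# Bonferroni: test each comparison at alpha / k instead
bonferroni_alpha <- alpha / k
print(bonferroni_alpha)
```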
