3. Aug 24th Flashcards
Can the null hypothesis be true?
Yes, in a controlled manipulative study.
What are the many types of regression tests all examples of?
The general linear model
- The most important topic of this entire class
What is the simplest form of the general linear model?
Simple linear regression
- Continuous x, continuous y
What are the two purposes of regression?
- Fit a line to data
- Test if the slope of that line is significant
- – Significant if the p-value < 0.05 (see the R sketch below)
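A minimal R sketch of both purposes, assuming a data frame datum with columns X and Y (hypothetical names):
results = lm(Y ~ X, data = datum)   # purpose 1: fit a line to the data
summary(results)                    # purpose 2: the coefficient table reports the slope's p-value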
What are the three possible causes of a large p-value?
If the p-value > 0.05, we can't tell which of these is the cause:
A) The sample size is too small
B) The effect size is too small
C) There is too much noise
The traditional equation for a line
y = mx + b
b is the y-intercept
m is the slope (change in y / change in x)
The stats equation for a line
General
Y = \beta_0 + \beta_1 X
\beta_0 is a constant (the y-intercept)
\beta_1 is the regression coefficient (the slope)
Specific
\hat{y} = \beta_0 + \beta_1 X
\hat{y} = the predicted value of the dependent variable
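A quick worked example with made-up numbers: if \beta_0 = 2 and \beta_1 = 0.5, then at X = 10 the prediction is
\hat{y} = 2 + 0.5 \times 10 = 7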
What two elements make up the relationship between every x and y (i.e. the line)?
Our equation PLUS epsilon: Y = \beta_0 + \beta_1 X + \varepsilon
\varepsilon = error (normally distributed)
What two requirements must be met to be a “best fit line”?
- Average error = 0
- – (the signed distances from each point to the line, added up)/n = 0
- The sum of squared errors (SSE) is minimized (see the R check below)
- – Squaring gets rid of negatives
- – Complicated models are fit iteratively
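A small R check of both requirements on a fitted model (sketch; datum, X, and Y are hypothetical names):
results = lm(Y ~ X, data = datum)
mean(residuals(results))    # average error: effectively 0 (up to rounding)
sum(residuals(results)^2)   # the SSE that lm() has minimized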
What is the more technical name for “total variance in y”?
Total Sum of Squares
Total Sum of Squares (SST): the summed squared distance from each data point to the null-hypothesis line (slope = 0, i.e. the mean of y)
Represents ALL the information we have
SST = SSR + SSE
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2
Y_i = any individual observed value of the dependent variable
\bar{Y} = the average/mean of y
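As a sketch, SST can be computed straight from this definition in R (assuming the dependent variable is datum$Y, a hypothetical name):
SST = sum((datum$Y - mean(datum$Y))^2)   # squared distance from each observation to the mean of y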
What 2 things does the total sum of squares partition variation into?
https://365datascience.com/sum-squares/
- SSR: sum of squares due to regression
— The variation in y due to variation in x
— SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2
— \hat{Y}_i = your predicted value of y
— \bar{Y} = the average/mean of y (the mean of the dependent variable)
- SSE: sum of squares due to error
— Noise
— SSE = \sum_{i=1}^{n} e_i^2
— e_i = the difference between the observed and predicted value (Y_i - \hat{Y}_i)
(Both pieces are verified in the R sketch below.)
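A sketch verifying the partition in R, assuming a fitted model results = lm(Y ~ X, data = datum) (hypothetical names):
SSR = sum((fitted(results) - mean(datum$Y))^2)   # variation in y explained by x
SSE = sum(residuals(results)^2)                  # leftover noise
SSR + SSE                                        # should equal SST from above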
What type of p-value will you get if sum of square error (SSE) is larger than sum of squares regression (SSR)?
A large p-value (greater than 0.05)
What type of p-value will you get if sum of squares regression (SSR) is greater than sum of squares error (SSE)?
A smaller p-value (smaller than 0.05)
Also means that movement in y is mostly due to x
Important takeaways of regression
1) Best fit lines mean:
- – a) Average error = 0
- – b) The sum of squared errors (SSE) is minimized
2) P-values are calculated by partitioning variation in y into:
- – Sum of squares regression (SSR)
- – Sum of squares error (SSE)
What does regression display?
Correlation
NOT causation
What DO you have to do to determine causation?
A MANIPULATIVE experiment.
Observational studies will not prove causation.
This is the VERY reason that debates about global warming even exist.
— We don’t have 3 Earths to manipulate. We cannot do manipulative studies.
How can you test to be absolutely sure your R data results are correct?
Make the data yourself.
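A sketch of that idea in R: simulate data with a known intercept, slope, and normal error, then check that lm() recovers them (all numbers are made up):
X = 1:100
Y = 3 + 0.5*X + rnorm(100, mean = 0, sd = 2)   # true intercept 3, true slope 0.5, normal error
fake = data.frame(X, Y)
results = lm(Y ~ X, data = fake)
summary(results)   # the estimated coefficients should land close to 3 and 0.5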
What does the professor not believe in?
“I don’t believe in randomness. Randomness is just other measurable things we haven’t measured.”
How to load data into R
1) datum=read.csv(file.choose())
2) head(datum)
How to plot in R
plot(Y~X,data=datum)
How to run almost any regression in R
lm(Y~X, data=datum) (the same formula and data arguments as plot)
What does he recommend you save your results as?
results
results=lm(Biomass~Rainfall, data=datum)
summary(results)
What single function gives you most of the data you’re looking for in your results?
summary()
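Putting the whole workflow together (sketch; the Biomass/Rainfall column names come from the example above):
datum = read.csv(file.choose())                  # 1) load the data
head(datum)                                      # check that it read in correctly
plot(Biomass ~ Rainfall, data = datum)           # 2) plot y against x
results = lm(Biomass ~ Rainfall, data = datum)   # 3) fit the regression
summary(results)                                 # 4) slope, p-value, R^2
abline(results)                                  # draw the best fit line on the plot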
What is R^2?
The proportion of variation in y explained by x
If:
R^2 = 1, all points are on the regression line. Perfect fit.
R^2 = 0, the regression line is flat (no slope); x explains none of the variation in y
R^2 always goes up when you add more x (predictor) variables (see the check below)
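As a check, R^2 can be computed by hand from the sums of squares (sketch, reusing SSE, SST, and results from the partition sketch above):
1 - SSE/SST                  # proportion of variation in y explained by x
summary(results)$r.squared   # the R^2 that summary() reports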