Exam 1 Flashcards

1
Q

Why use models?

A

To understand the relationships between variables

To predict future outcomes

To quantify differences between groups or treatments

2
Q

Response variable

A

the variable that you want to understand/model/predict. aka - y, dependent variable

3
Q

explanatory variables

A

the variables you know (or suspect) are related to the response variable, used to find a pattern/model/relationship. aka - x, independent variables, predictor variables, covariates

4
Q

model

A

a function that combines explanatory variables mathematically into estimates of the response variable

5
Q

error

A

what’s left over; the variability in the response that your model doesn’t capture (error
is somewhat of a misnomer – maybe noise is a better term)

6
Q

Categorical Data

A

Non-numerical data that falls into groups or categories (a binary categorical variable has two outcomes)

7
Q

Quantitative variables

A

Numerical

8
Q

Parameter

A

Describes entire population

9
Q

Statistic

A

Describes sample

10
Q

The four-step process

A
  1. Choose
  2. Fit
  3. Assess
  4. Use
11
Q

Model Notation

A

Y = f(X) + e

12
Q

ybar or xbar

A

averages

13
Q

yhat

A

estimate

14
Q

Y = ? (Simple Linear Regression)

A

Beta0 + Beta1*X + e

15
Q

Yhat = ? (Simple Linear Regression)

A

Beta0 + Beta1*X

16
Q

Naive Model

A

Mean + Error

Age = Agebar + e

17
Q

Residuals

A

How far points are from the prediction line

residual = y - yhat

18
Q

Least Squares

A

Technique to minimize SSE
The sum of all squared residuals is at a minimum
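
Least squares has a closed form for simple linear regression; a minimal pure-Python sketch with made-up data (all numbers below are illustrative):

```python
# Fit yhat = B0hat + B1hat*x by least squares on a tiny made-up dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

# B1hat = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar  # the least-squares line passes through (xbar, ybar)

print(b0, b1)
```

Any other choice of slope/intercept would give a larger sum of squared residuals for this data.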

19
Q

SSE

A

SSE = ∑(yi - yhati)^2

20
Q

Regression Standard Error

A

σhat = sqrt(SSE / (n - 2))
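
A short sketch computing SSE and the regression standard error, continuing the made-up example data (the fitted coefficients are assumed, for illustration only):

```python
import math

# Residuals, SSE, and the regression standard error for a fitted line
# yhat = b0 + b1*x (coefficients are illustrative, not from real data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]
b0, b1 = 0.10, 1.98

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # e_i = y_i - yhat_i
sse = sum(e ** 2 for e in residuals)                     # SSE = sum of e_i^2
sigma_hat = math.sqrt(sse / (len(xs) - 2))               # sqrt(SSE / (n - 2))
print(sse, sigma_hat)
```

Dividing by n - 2 (not n) accounts for the two estimated coefficients.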

21
Q

Linearity

A

The relationship between x and y resembles a straight line; the residual plot shows no curved pattern

22
Q

Independence

A

Residuals do not depend on one another (e.g., on time or collection order); they don't systematically grow or shrink as the plot goes on

23
Q

Normality of Residuals:

A

The residuals are distributed symmetrically around zero, with no skewness or kurtosis.

24
Q
Equal Variance of Residuals (homoskedasticity)

A

Residuals have roughly equal variance across all values of x (or the fitted values)

25
Q

Standardized Residual

A

ei / σhat = (yi - yhati) / σhat

If its absolute value is greater than 3, the point is considered an outlier

26
Q

Leverage

A

Points that have extreme x values can have a disproportionate influence on the slope of the regression line

27
Q

Hypothesis Testing

A

H0: B1 = 0
HA: B1 ≠ 0

28
Q

Test Statistic

A

t = B1hat / SE(B1hat)
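
A sketch of the test statistic, using the standard result SE(B1hat) = σhat / sqrt(∑(xi - xbar)^2) and the made-up example values (σhat here is an assumed, rounded value):

```python
import math

# t-statistic for H0: B1 = 0 on the illustrative example data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
b1_hat = 1.98
sigma_hat = 0.1506  # regression standard error (assumed/rounded)

xbar = sum(xs) / len(xs)
sxx = sum((x - xbar) ** 2 for x in xs)
se_b1 = sigma_hat / math.sqrt(sxx)  # SE(B1hat)
t = b1_hat / se_b1
print(t)  # compare to a t distribution with n - 2 degrees of freedom
```

A |t| this large would be far in the tail of a t distribution with 3 df, so H0 would be rejected.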

29
Q

Confidence Interval for Slope

A

Beta1hat +/- t* * SE(Beta1hat)

30
Q

Coefficient of determination

A

R^2, How much of the variability is explained by the model

31
Q

Partitioning variability

A

ANOVA
(yi - ybar) = (yhati - ybar) + (yi - yhati)

32
Q

SST

A

∑(yi - ybar)^2

33
Q

SSM

A

∑(yhat-ybar)^2

34
Q

SST, SSM, SSE Relationship

A

SST = SSM + SSE

35
Q

R^2 =

A

SSM/SST
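
The partition SST = SSM + SSE and R^2 = SSM/SST can be checked numerically; a sketch on the same made-up example (coefficients assumed for illustration):

```python
# Partition variability: SST = SSM + SSE, and R^2 = SSM / SST.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]
b0, b1 = 0.10, 1.98  # illustrative least-squares fit

ybar = sum(ys) / len(ys)
yhats = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                 # total variability
ssm = sum((yh - ybar) ** 2 for yh in yhats)            # explained by the model
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))   # left over (noise)
r2 = ssm / sst
print(sst, ssm + sse, r2)
```

The identity holds exactly for a least-squares fit; here nearly all variability is explained, so R^2 is close to 1.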

36
Q

Confidence Interval for Mean Response

A

yhat +/- t* * σhat * sqrt(1/n + (x* - xbar)^2 / ∑(xi - xbar)^2)

37
Q

Prediction Interval

A

yhat +/- t* * σhat * sqrt(1 + 1/n + (x* - xbar)^2 / ∑(xi - xbar)^2)
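
Both intervals can be sketched side by side; this continues the made-up example, with σhat and t* assumed (t* ≈ 3.182 is the 95% critical value for 3 df):

```python
import math

# Confidence interval (mean response) vs. prediction interval (new observation)
# at a chosen x*, on the illustrative example data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
b0, b1 = 0.10, 1.98
sigma_hat = 0.1506   # assumed regression standard error
t_star = 3.182       # t critical value, 95%, n - 2 = 3 df
x_star = 3.5

n = len(xs)
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)
yhat = b0 + b1 * x_star

ci_half = t_star * sigma_hat * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)
pi_half = t_star * sigma_hat * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)
print(yhat - ci_half, yhat + ci_half)  # CI for the mean response
print(yhat - pi_half, yhat + pi_half)  # PI for a new observation (always wider)
```

The prediction interval is wider because of the extra "1 +" term, which accounts for the noise in a single new observation.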

38
Q

MLR

A

Y = B0 + B1*X1 + B2*X2 + … + Bp*Xp + e

39
Q

MLR with categorical data

A

Parallel slopes model: a 0/1 indicator variable shifts the intercept for each category while the slope stays the same
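
A tiny sketch of the parallel slopes idea with made-up coefficients (b0, b1, b2 are illustrative, not fitted):

```python
# Parallel slopes: one quantitative x plus a 0/1 indicator for the category.
# The indicator shifts the intercept only; the slope on x is shared.
b0, b1, b2 = 2.0, 0.5, 3.0  # illustrative coefficients


def yhat(x, group):
    """Predicted response; group is the 0/1 indicator."""
    return b0 + b1 * x + b2 * group


# Two parallel lines: intercepts b0 and b0 + b2, same slope b1.
print(yhat(4, 0))  # 2.0 + 0.5*4 = 4.0
print(yhat(4, 1))  # shifted up by b2: 7.0
```

The vertical gap between the two lines is b2 at every x, which is what "parallel slopes" means.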

40
Q

When is a p-value statistically significant?

A

p-value < .05 (reject H0)