Simple linear regression Flashcards

1
Q

what is linear regression?

A

A simple approach to supervised learning, used to model the relationship between a single input variable (X) and a continuous response variable (Y).

2
Q

assumed model?

A

Y = β0 + β1X + e, where e is the error term
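
A minimal sketch of what this model generates, on simulated data (the values β0 = 2, β1 = 3 and all names are illustrative, not from the deck):

set.seed(1)
x = rnorm(100)            # predictor X
e = rnorm(100, sd = 0.5)  # error term
y = 2 + 3 * x + e         # response from the assumed model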

3
Q

distance between observed and predicted values?

A

residual ei = Yi - Ŷi = Yi - (β̂0 + β̂1Xi)

4
Q

residual sum of squares (RSS)?

A

The sum of the squared residuals over all data points: RSS = Σei² = Σ(Yi - β̂0 - β̂1Xi)². It measures the total magnitude of the deviations; residuals may be positive or negative, so they are squared before summing.

5
Q

how to find β̂0 and β̂1 using least squares estimation?

A

Take the first-order partial derivatives of RSS with respect to β0 and β1 separately, set each to 0, and solve:

∂RSS/∂β0 = -2 Σ(Yi - β0 - β1Xi) = 0
∂RSS/∂β1 = -2 ΣXi(Yi - β0 - β1Xi) = 0

6
Q

estimated β̂1?

A

β̂1 = cov(X, Y) / var(X)

7
Q

estimated β̂0?

A

β̂0 = mean(Y) - β̂1 * mean(X)
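
A quick check, on simulated data, that the two closed-form estimates match what lm() returns (x, y, and the coefficient values are illustrative):

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
b1 = cov(x, y) / var(x)      # estimated beta1
b0 = mean(y) - b1 * mean(x)  # estimated beta0
c(b0, b1)
coef(lm(y ~ x))              # same two values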

8
Q

what is the standard error (SE) an estimate of?

A

how the estimates vary under repeated sampling

9
Q

hypothesis testing for relationship between x and y?

A

H0: β1 = 0 vs. H1: β1 ≠ 0

10
Q

t-statistic (to test the null hypothesis)?

A

t = (β̂1 - 0) / SE(β̂1), with n - 2 degrees of freedom
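
A sketch, on simulated data, computing the t-statistic by hand and comparing it with the value summary(lm()) reports:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
co = summary(lm(y ~ x))$coefficients
co["x", "Estimate"] / co["x", "Std. Error"]  # matches co["x", "t value"]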

11
Q

critical value and confidence interval when n is large?

A

Critical value ≈ 1.96 for a 95% confidence interval: as n increases, the t-distribution gets closer to the standard normal distribution, so the two-sided 95% critical value tends to 1.96.

12
Q

p-value definition?

A

The probability, assuming H0 is true, of observing a test statistic at least as extreme as |t|.

13
Q

how to calculate the confidence interval?

A

β̂1 ± 1.96 * SE(β̂1)
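
A sketch, on simulated data, of this large-sample interval by hand next to R's exact t-based interval from confint():

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
co = summary(fit)$coefficients
co["x", "Estimate"] + c(-1.96, 1.96) * co["x", "Std. Error"]  # approximate 95% CI
confint(fit, "x", level = 0.95)  # exact interval; nearly identical for large n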

14
Q

when to reject null hypothesis?

A

When |t| is larger than 1.96, we can reject H0 with 95% confidence.

15
Q

what does the residual standard error (RSE) mean?

A

RSE measures lack of fit: if RSE = 3.259, then on average the observed Y deviates from the regression line by 3.259 units.

16
Q

what is R squared for?

A

Measures how well the regression model describes the data, e.g. if R squared is 0.6119, then X explains 61.19% of the variation in Y.

17
Q

how to compute RSE?

A

RSE = sqrt(RSS / (n - 2))
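
A sketch, on simulated data, computing RSE by hand and checking it against summary(fit)$sigma:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
rss = sum(residuals(fit)^2)
sqrt(rss / (length(y) - 2))  # equals summary(fit)$sigma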

18
Q

measure R square?

A

R² = 1 - (RSS / TSS), where TSS = Σ(Yi - Ȳ)² is the total sum of squares (the total variation in the response variable Y). R² ranges from 0 to 1.
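
A sketch, on simulated data, computing R squared by hand and checking it against summary(fit)$r.squared:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
rss = sum(residuals(fit)^2)
tss = sum((y - mean(y))^2)  # total sum of squares
1 - rss / tss               # equals summary(fit)$r.squared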

19
Q

for the 95% CI, use β1 or β0?

A

β1. β0 is the intercept and has nothing to do with the relationship between X and Y.

20
Q

how to install package MASS

A

install.packages('MASS')

21
Q

load MASS?

A

library(MASS)

22
Q

load data Boston in MASS?

A

data(Boston)

23
Q

documentation for the data set?

A

?Boston

24
Q

number of missing values?

A

sum(is.na(Boston))

25
Q

number of duplicated rows?

A

sum(duplicated(Boston))

26
Q

find outliers for both variables?

A

boxplot.stats(Boston$var1)$out

boxplot.stats(Boston$var2)$out
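
A concrete instance, assuming the two variables are rm and medv (chosen here only for illustration; any two Boston columns work):

library(MASS)
data(Boston)
boxplot.stats(Boston$rm)$out    # outliers in rm
boxplot.stats(Boston$medv)$out  # outliers in medv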

27
Q

reduce dataset to subset of the 2 variables?

A

name=subset(Boston, select=c(var1,var2))

28
Q

scatterplot, with var1 as y and var2 as x?

A
plot(var1 ~ var2, data = name,
     main = 'Scatterplot of var1 vs var2',
     xlab = 'var2', ylab = 'var1',
     pch = 20,
     col = 'gray50')
29
Q

simple linear regression?

A

lmfit = lm(var1 ~ var2, data = name)

30
Q

summary of lm?

A

summary(lmfit)

31
Q

upper and lower range of CI 95%?

A

confint(lmfit, level = 0.95)

32
Q

Regression when X is binary

Create a dummy variable that equals one if rm is above the sample median

A

mydata$dummy = ifelse(mydata$rm > median(mydata$rm), 1, 0)
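
A concrete instance on the Boston data, using the rm column named in the prompt (the table() check is an addition for inspecting the split):

library(MASS)
data(Boston)
Boston$dummy = ifelse(Boston$rm > median(Boston$rm), 1, 0)  # 1 if rm above median
table(Boston$dummy)  # inspect the 0/1 split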

33
Q

plot scatterplot with fitted line?

A

lmfit1 = lm(var1 ~ dummy, data = mydata)

plot(var1 ~ dummy, data = mydata, main = 'Scatterplot of var1 vs dummy',
     xlab = 'dummy', ylab = 'var1', pch = 20, col = 'gray50')

abline(lmfit1, lwd = 2, col = 'deeppink3')