Simple linear regression Flashcards

1
Q

what is linear regression?

A

A simple approach to supervised learning, used to model the relationship between a single input variable (X) and a continuous response variable (Y).

2
Q

assumed model?

A

Y = β0 + β1X + e, where e is the error term
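
A minimal sketch of what this model generates, on simulated data (the values β0 = 2, β1 = 3 and all names are illustrative, not from the deck):

set.seed(1)
x = rnorm(100)            # predictor X
e = rnorm(100, sd = 0.5)  # error term
y = 2 + 3 * x + e         # response from the assumed model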

3
Q

distance between observed and predicted values?

A

residual ei = Yi - Ŷi = Yi - (β̂0 + β̂1Xi)

4
Q

residual sum of squares (RSS)?

A

The sum of the squared residuals over all data points: RSS = Σei² = Σ(Yi - β̂0 - β̂1Xi)². It measures the total magnitude of the deviations; residuals may be positive or negative, so they are squared before summing.

5
Q

how to find β̂0 and β̂1 using least squares estimation?

A

Take the first-order partial derivatives of RSS with respect to β0 and β1 separately, set each to 0, and solve:

∂RSS/∂β0 = -2 Σ(Yi - β0 - β1Xi) = 0
∂RSS/∂β1 = -2 ΣXi(Yi - β0 - β1Xi) = 0

6
Q

estimated β̂1?

A

β̂1 = cov(X, Y) / var(X)

7
Q

estimated β̂0?

A

β̂0 = mean(Y) - β̂1 * mean(X)
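
A quick check, on simulated data, that the two closed-form estimates match what lm() returns (x, y, and the coefficient values are illustrative):

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
b1 = cov(x, y) / var(x)      # estimated beta1
b0 = mean(y) - b1 * mean(x)  # estimated beta0
c(b0, b1)
coef(lm(y ~ x))              # same two values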

8
Q

what is the standard error (SE) an estimate of?

A

how the estimates vary under repeated sampling

9
Q

hypothesis testing for relationship between x and y?

A

H0: β1 = 0 vs. H1: β1 ≠ 0

10
Q

t-statistic (to test the null hypothesis)?

A

t = (β̂1 - 0) / SE(β̂1), with n - 2 degrees of freedom
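
A sketch, on simulated data, computing the t-statistic by hand and comparing it with the value summary(lm()) reports:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
co = summary(lm(y ~ x))$coefficients
co["x", "Estimate"] / co["x", "Std. Error"]  # matches co["x", "t value"]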

11
Q

critical value and confidence interval when n is large?

A

Critical value ≈ 1.96 for a 95% confidence interval: as n increases, the t-distribution gets closer to the standard normal distribution, so the two-sided 95% critical value tends to 1.96.

12
Q

p-value definition?

A

The probability, assuming H0 is true, of observing a test statistic at least as extreme as |t|.

13
Q

how to calculate the confidence interval?

A

β̂1 ± 1.96 * SE(β̂1)
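
A sketch, on simulated data, of this large-sample interval by hand next to R's exact t-based interval from confint():

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
co = summary(fit)$coefficients
co["x", "Estimate"] + c(-1.96, 1.96) * co["x", "Std. Error"]  # approximate 95% CI
confint(fit, "x", level = 0.95)  # exact interval; nearly identical for large n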

14
Q

when to reject null hypothesis?

A

When |t| is larger than 1.96, we can reject H0 with 95% confidence.

15
Q

what does the residual standard error (RSE) mean?

A

RSE measures lack of fit: if RSE = 3.259, then on average the observed Y deviates from the regression line by 3.259 units.

16
Q

what is R squared for?

A

Measures how well the regression model describes the data, e.g. if R squared is 0.6119, then X explains 61.19% of the variation in Y.

17
Q

how to compute RSE?

A

RSE = sqrt(RSS / (n - 2))
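
A sketch, on simulated data, computing RSE by hand and checking it against summary(fit)$sigma:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
rss = sum(residuals(fit)^2)
sqrt(rss / (length(y) - 2))  # equals summary(fit)$sigma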

18
Q

measure R square?

A

R² = 1 - (RSS / TSS), where TSS = Σ(Yi - Ȳ)² is the total sum of squares (the total variation in the response variable Y). R² ranges from 0 to 1.
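
A sketch, on simulated data, computing R squared by hand and checking it against summary(fit)$r.squared:

set.seed(1)
x = rnorm(100); y = 2 + 3 * x + rnorm(100)
fit = lm(y ~ x)
rss = sum(residuals(fit)^2)
tss = sum((y - mean(y))^2)  # total sum of squares
1 - rss / tss               # equals summary(fit)$r.squared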

19
Q

for the 95% CI, use β1 or β0?

A

β1. β0 is the intercept and has nothing to do with the relationship between X and Y.

20
Q

how to install package MASS

A

install.packages('MASS')

21
Q

load MASS?

A

library(MASS)

22
Q

load data Boston in MASS?

A

data(Boston)

23
Q

documentation for the data set?

A

?Boston

24
Q

number of missing values?

A

sum(is.na(Boston))

25
Q

number of duplicated rows?

A

sum(duplicated(Boston))

26
Q

find outliers for both variables?

A

boxplot.stats(Boston$var1)$out

boxplot.stats(Boston$var2)$out
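
A concrete instance, assuming the two variables are rm and medv (chosen here only for illustration; any two Boston columns work):

library(MASS)
data(Boston)
boxplot.stats(Boston$rm)$out    # outliers in rm
boxplot.stats(Boston$medv)$out  # outliers in medv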

27
Q

reduce dataset to subset of the 2 variables?

A

name=subset(Boston, select=c(var1,var2))

28
Q

scatterplot, with var1 as y and var2 as x?

A
plot(var1 ~ var2, data = name,
     main = 'Scatterplot of var1 vs var2',
     xlab = 'var2', ylab = 'var1',
     pch = 20,
     col = 'gray50')
29
Q

simple linear regression?

A

lmfit = lm(var1 ~ var2, data = name)

30
Q

summary of lm?

A

summary(lmfit)

31
Q

upper and lower range of CI 95%?

A

confint(lmfit, level = 0.95)

32
Q

Regression when X is binary

Create a dummy variable that equals one if rm is above the sample median

A

mydata$dummy = ifelse(mydata$rm > median(mydata$rm), 1, 0)
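
A concrete instance on the Boston data, using the rm column named in the prompt (the table() check is an addition for inspecting the split):

library(MASS)
data(Boston)
Boston$dummy = ifelse(Boston$rm > median(Boston$rm), 1, 0)  # 1 if rm above median
table(Boston$dummy)  # inspect the 0/1 split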

33
Q

plot scatterplot with fitted line?

A

lmfit1 = lm(var1 ~ dummy, data = mydata)

plot(var1 ~ dummy, data = mydata, main = 'Scatterplot of var1 vs dummy',
     xlab = 'dummy', ylab = 'var1', pch = 20, col = 'gray50')

abline(lmfit1, lwd = 2, col = 'deeppink3')