STAT 2 - LAB 1 Flashcards

1
Q

Load the Boston data set from the MASS library

A

library(MASS)
data("Boston")
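
Later cards refer to a data frame called census_tracts, which the cards themselves never define. A minimal sketch, assuming it is simply a copy of Boston (that assignment is an assumption, not something shown in the cards):

```r
library(MASS)  # provides the Boston data frame

# Assumption: census_tracts is a plain copy of Boston.
census_tracts <- Boston

dim(census_tracts)      # number of rows and columns
head(census_tracts, 3)  # first three rows
```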

2
Q

Retrieve the description of the Boston data set

A

help(Boston)
or
?Boston

3
Q

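Get a glimpse of the structure of the data

A

str(data)       # compact structure: types and first values
glimpse(data)   # dplyr version of str(); requires library(dplyr)
names(data)     # column names only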

4
Q

Convert an integer variable into a factor

A

census_tracts$chas <- as.factor(census_tracts$chas)

5
Q

Check the levels of a factor variable

A

levels(census_tracts$chas)
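
As a quick illustration of the conversion and the level check together, here is a small sketch. It assumes census_tracts is a copy of Boston, which the cards do not state:

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards never show how census_tracts is created

# chas is stored as an integer 0/1 indicator; turn it into a factor
census_tracts$chas <- as.factor(census_tracts$chas)

levels(census_tracts$chas)  # the distinct categories, as character strings
table(census_tracts$chas)   # how many tracts fall in each category
```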

6
Q

Compute basic summary statistics for each variable

A

summary(census_tracts)

7
Q

Select the names of all numeric variables, excluding the categorical ones (chas and rad)

A

var_interest_numeric <- colnames(census_tracts)[!(colnames(census_tracts) %in% c("chas", "rad"))]
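
An equivalent base R spelling uses setdiff, which can be easier to scan. A small sketch, assuming census_tracts is a copy of Boston (the cards never define it):

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards do not show how census_tracts is created

# Drop the two categorical columns by name
var_interest_numeric <- setdiff(colnames(census_tracts), c("chas", "rad"))

# Compare with the %in% version used on the card
original <- colnames(census_tracts)[!(colnames(census_tracts) %in% c("chas", "rad"))]
identical(var_interest_numeric, original)
```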

8
Q

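Plot side-by-side histograms of multiple columns

A

library(tidyverse)  # loads dplyr, tidyr, and ggplot2
census_tracts %>%
  dplyr::select(crim, zn, age) %>%  # dplyr:: avoids the name clash with MASS::select
  gather(cols, value) %>%           # long format: one row per (column, value) pair
  ggplot(aes(x = value)) + geom_histogram(bins = 20) + facet_wrap(. ~ cols, ncol = 3)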

9
Q

Compute the correlations between the quantitative variables

A

corr_matrix <- cor(census_tracts[, var_interest_numeric])

rounded:
round(corr_matrix, 2)

10
Q

Plot the correlation matrix

A

heatmap(corr_matrix)

or, with the corrplot package:
library(corrplot)
corrplot(corr_matrix)

The matrix is symmetric, so you can read only the upper triangle of the plot, or zero out the lower triangle before plotting so that only the upper triangle is shown:
corr_matrix[lower.tri(corr_matrix)] <- 0

11
Q

Draw scatterplots of the relationships between quantitative variables

A

with the function pairs:

pairs(census_tracts[, c("medv", "lstat", "dis")])

12
Q

Fit a simple linear regression model

A

lm.fit.simple <- lm(formula = medv ~ lstat, data = census_tracts)

13
Q

Calculate 95% confidence intervals for β0 and β1 using the function confint.

A

confint(lm.fit.simple)
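
To see what confint does for a linear model, the intervals can be rebuilt by hand as estimate ± t critical value × standard error. A sketch under the assumption that census_tracts is a copy of Boston (not stated in the cards); manual_ci and t_crit are illustrative names:

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards never show how census_tracts is created
lm.fit.simple <- lm(medv ~ lstat, data = census_tracts)

coefs  <- summary(lm.fit.simple)$coefficients           # estimates and standard errors
t_crit <- qt(0.975, df = df.residual(lm.fit.simple))    # two-sided 95% critical value

manual_ci <- cbind(
  lower = coefs[, "Estimate"] - t_crit * coefs[, "Std. Error"],
  upper = coefs[, "Estimate"] + t_crit * coefs[, "Std. Error"]
)
manual_ci

# Compare with the built-in result (dimnames differ, so strip them first)
all.equal(unname(manual_ci), unname(confint(lm.fit.simple)))
```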

14
Q

Calculate 95% confidence intervals for the mean of medv when lstat is 5, 10, and 15, using the function predict.

A

predict(lm.fit.simple,
        newdata = data.frame(lstat = c(5, 10, 15)),
        interval = "confidence")

15
Q

Calculate 95% prediction intervals for medv when lstat is 5, 10, and 15, using the function predict.

A

predict(lm.fit.simple,
        newdata = data.frame(lstat = c(5, 10, 15)),
        interval = "prediction")
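
The two interval types share the same fitted values, but a prediction interval also accounts for the noise in a single new observation, so it is wider. A short sketch comparing them, assuming census_tracts is a copy of Boston (not stated in the cards); new_points, conf_int, and pred_int are illustrative names:

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards never show how census_tracts is created
lm.fit.simple <- lm(medv ~ lstat, data = census_tracts)

new_points <- data.frame(lstat = c(5, 10, 15))
conf_int <- predict(lm.fit.simple, newdata = new_points, interval = "confidence")
pred_int <- predict(lm.fit.simple, newdata = new_points, interval = "prediction")

# Width of each interval at the three lstat values
data.frame(
  lstat            = new_points$lstat,
  confidence_width = conf_int[, "upr"] - conf_int[, "lwr"],
  prediction_width = pred_int[, "upr"] - pred_int[, "lwr"]
)
```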

16
Q

Fit a multiple linear regression model with medv as the response variable and lstat and dis as the predictors

A

lm.fit.multiple <- lm(formula = medv ~ lstat + dis, data = census_tracts)

17
Q

Compare the multiple and simple linear regression models using anova

A

anova(lm.fit.simple,lm.fit.multiple)
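
Because the two models are nested and differ by a single predictor, the F statistic from anova equals the square of the t statistic for dis in the larger model. A sketch that checks this, assuming census_tracts is a copy of Boston (not stated in the cards); comparison, f_stat, and t_stat are illustrative names:

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards never show how census_tracts is created

lm.fit.simple   <- lm(medv ~ lstat, data = census_tracts)
lm.fit.multiple <- lm(medv ~ lstat + dis, data = census_tracts)

comparison <- anova(lm.fit.simple, lm.fit.multiple)
comparison

f_stat <- comparison$F[2]                                          # F test for adding dis
t_stat <- summary(lm.fit.multiple)$coefficients["dis", "t value"]  # t test for dis

all.equal(f_stat, t_stat^2)
```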

18
Q

Fit the multiple linear regression model with medv as the response variable, and all quantitative variables as predictors

A

lm.fit.full <- lm(formula = medv ~ . - chas - rad, data = census_tracts)
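
In the formula, the dot stands for every column other than the response, and the minus terms remove chas and rad again. A quick sketch to confirm which predictors end up in the model, assuming census_tracts is a copy of Boston (not stated in the cards):

```r
library(MASS)
census_tracts <- Boston  # assumption: the cards never show how census_tracts is created

lm.fit.full <- lm(medv ~ . - chas - rad, data = census_tracts)

# Names of the fitted coefficients: the intercept plus the remaining predictors
names(coef(lm.fit.full))

# Neither excluded variable should appear among them
any(c("chas", "rad") %in% names(coef(lm.fit.full)))
```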