STAT 2 - LAB 1 Flashcards

Question 1

Q

Load the Boston data set from the MASS library

Answer

A

library(MASS)
data("Boston")

Question 2

Q

Retrieve the description of the Boston data set

Answer

A

help(Boston)
or
?Boston

Question 3

Q

Get a glimpse of the structure of the data

Answer

A

str(data)      # data stands for your data frame, e.g. Boston
glimpse(data)  # glimpse() comes from the dplyr package
names(data)

Question 4

Q

Change an integer variable into a factor

Answer

A

census_tracts$chas <- as.factor(census_tracts$chas)
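
The deck never shows where census_tracts comes from. A minimal sketch, assuming it is simply a copy of the Boston data frame, makes the conversion reproducible and shows the effect on the column's class:

```r
library(MASS)  # provides the Boston data frame

# Assumption: census_tracts is a plain copy of Boston (not stated in the deck)
census_tracts <- Boston

class(census_tracts$chas)    # class before the conversion
census_tracts$chas <- as.factor(census_tracts$chas)
class(census_tracts$chas)    # class after the conversion
levels(census_tracts$chas)   # distinct values, now stored as labels
```

After the conversion, summary() reports counts per level for chas instead of a mean and quartiles.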

Question 5

Q

Check the levels of a factor variable

Answer

A

levels(census_tracts$chas)

Question 6

Q

Compute basic statistics for each variable

Answer

A

summary(census_tracts)

Question 7

Q

Select all numeric variables, excluding the categorical ones

Answer

A

var_interest_numeric <- colnames(census_tracts)[!(colnames(census_tracts) %in% c("chas", "rad"))]
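
As an alternative to listing the excluded names by hand, the numeric columns can be detected from their type. This is only a sketch: it assumes census_tracts is a copy of Boston and that both chas and rad have already been converted to factors, which the deck shows only for chas.

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston

# Mark the two categorical variables as factors first
census_tracts$chas <- as.factor(census_tracts$chas)
census_tracts$rad  <- as.factor(census_tracts$rad)

# Keep the names of every column that is still numeric
is_num <- sapply(census_tracts, is.numeric)
var_interest_numeric <- names(census_tracts)[is_num]
var_interest_numeric
```

Without the factor conversion, is.numeric() would also keep chas and rad, because they are stored as integers in Boston.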

Question 8

Q

Plot side-by-side histograms of multiple columns

Answer

A

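library(tidyverse)  # loads dplyr, tidyr and ggplot2
census_tracts %>%
  select(crim, zn, age) %>%
  gather(cols, value) %>%   # long format: one row per (column, value) pair
  ggplot(aes(x = value)) + geom_histogram(bins = 20) + facet_wrap(. ~ cols, ncol = 3)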

Question 9

Q

Compute the correlation between quantitative variables

Answer

A

cor(census_tracts[, var_interest_numeric])

Rounded to two decimals:
round(cor(census_tracts[, var_interest_numeric]), 2)

Question 10

Q

Plot the correlation matrix

Answer

A

corr_matrix <- cor(census_tracts[, var_interest_numeric])
heatmap(corr_matrix)

library(corrplot)
corrplot::corrplot(corr_matrix)

You can plot the full matrix and read only the upper triangle, or zero out the lower triangle first so that only the upper triangle is shown:
corr_matrix[lower.tri(corr_matrix)] <- 0
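
The masking step can be checked in base R without any plotting. The sketch below assumes census_tracts is a copy of Boston; upper_only is a hypothetical name used here so that the original matrix stays intact.

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston
var_interest_numeric <- setdiff(colnames(census_tracts), c("chas", "rad"))

# Correlation matrix of the quantitative variables, rounded for readability
corr_matrix <- round(cor(census_tracts[, var_interest_numeric]), 2)

# Work on a copy and zero out everything below the diagonal
upper_only <- corr_matrix
upper_only[lower.tri(upper_only)] <- 0

upper_only[1:3, 1:3]  # top-left corner: zeros below the diagonal
```

Because a correlation matrix is symmetric, the upper triangle carries all the information; the zeros only reduce visual clutter.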

Question 11

Q

Draw scatterplots of the relationships between quantitative variables

Answer

A

With the function pairs:

pairs(census_tracts[, c("medv", "lstat", "dis")])

Question 12

Q

Fit a simple linear regression model

Answer

A

lm.fit.simple <- lm(formula = medv ~ lstat, data = census_tracts)

Question 13

Q

Calculate 95% confidence intervals for β0 and β1 using the function confint.

Answer

A

confint(lm.fit.simple)
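
For a linear model, confint returns estimate ± t-quantile × standard error, with the residual degrees of freedom. A sketch that rebuilds the same intervals by hand, assuming census_tracts is a copy of Boston (manual_ci and t_crit are illustrative names):

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston
lm.fit.simple <- lm(medv ~ lstat, data = census_tracts)

# Estimates and standard errors from the coefficient table
coefs  <- summary(lm.fit.simple)$coefficients
t_crit <- qt(0.975, df = df.residual(lm.fit.simple))  # two-sided 95% level

manual_ci <- cbind(
  lower = coefs[, "Estimate"] - t_crit * coefs[, "Std. Error"],
  upper = coefs[, "Estimate"] + t_crit * coefs[, "Std. Error"]
)

manual_ci               # intervals computed by hand
confint(lm.fit.simple)  # should agree with the rows above
```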

Question 14

Q

Calculate 95% confidence intervals for the mean of medv at lstat values of 5, 10, and 15 using the function predict.

Answer

A

predict(lm.fit.simple,
        newdata = data.frame(lstat = c(5, 10, 15)),
        interval = "confidence")

Question 15

Q

Calculate 95% prediction intervals for medv at lstat values of 5, 10, and 15 using the function predict.

Answer

A

predict(lm.fit.simple,
        newdata = data.frame(lstat = c(5, 10, 15)),
        interval = "prediction")
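
Putting both interval types side by side shows how they differ: the fitted values are identical, but the prediction interval is wider because it adds the residual variance of a single new observation to the uncertainty in the fitted mean. A sketch, assuming census_tracts is a copy of Boston (new_obs, conf_int and pred_int are illustrative names):

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston
lm.fit.simple <- lm(medv ~ lstat, data = census_tracts)

new_obs <- data.frame(lstat = c(5, 10, 15))

conf_int <- predict(lm.fit.simple, newdata = new_obs, interval = "confidence")
pred_int <- predict(lm.fit.simple, newdata = new_obs, interval = "prediction")

# Width of each interval at the three lstat values
data.frame(
  lstat      = new_obs$lstat,
  conf_width = conf_int[, "upr"] - conf_int[, "lwr"],
  pred_width = pred_int[, "upr"] - pred_int[, "lwr"]
)
```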

Question 16

Q

Fit a multiple linear regression model with medv as the response variable and lstat and dis as the predictors

Answer

A

lm.fit.multiple <- lm(formula = medv ~ lstat + dis, data = census_tracts)

Question 17

Q

Compare the multiple and simple linear regression models using anova

Answer

A

anova(lm.fit.simple,lm.fit.multiple)
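
When two nested models differ by a single predictor, as here, the F statistic reported by anova equals the square of that predictor's t statistic in the larger model, so both tests give the same p-value. A sketch, assuming census_tracts is a copy of Boston (anova_table and t_dis are illustrative names):

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston
lm.fit.simple   <- lm(medv ~ lstat, data = census_tracts)
lm.fit.multiple <- lm(medv ~ lstat + dis, data = census_tracts)

# Partial F-test: does adding dis reduce the residual sum of squares?
anova_table <- anova(lm.fit.simple, lm.fit.multiple)
anova_table

# t statistic of dis in the larger model; its square matches the F statistic
t_dis <- summary(lm.fit.multiple)$coefficients["dis", "t value"]
c(F = anova_table$F[2], t_squared = t_dis^2)
```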

Question 18

Q

Fit the multiple linear regression model with medv as the response variable, and all quantitative variables as predictors

Answer

A

lm.fit.full <- lm(formula = medv ~ . - chas - rad, data = census_tracts)
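
In the formula, the dot stands for every column except the response, and each minus sign drops one term. A quick way to confirm which predictors ended up in the model, assuming census_tracts is a copy of Boston:

```r
library(MASS)

# Assumption: census_tracts is a copy of Boston
census_tracts <- Boston
lm.fit.full <- lm(medv ~ . - chas - rad, data = census_tracts)

# Names of the fitted coefficients: intercept plus the remaining predictors
coef_names <- names(coef(lm.fit.full))
coef_names

# Neither excluded variable should appear among the coefficients
any(grepl("^(chas|rad)", coef_names))
```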

Question 19

Q

Answer

A

(19 cards)