Section 1 - Tutorials R and In-Class Questions R Flashcards
Tutorial 1
Q5 This question uses the assignment and exam mark data used in Question 2 above.
This data can be found on Canvas in the file “Exam data.csv”.
i)
a) Load the data and store it in the data frame examdata.
b) Fit a linear regression model, exammodel, of exam mark on assignment mark.
ii)
a) Obtain the slope and intercept parameters.
b) Plot a labelled scattergraph of the data and add a blue dashed regression line to your scatterplot.
iii) Obtain the fitted values:
a) By extracting them from exammodel
b) Using the fitted command
c) Using the predict command
iv) Add red points to the scattergraph to show the fitted values.
v) Obtain the expected exam mark for an assignment mark of 62:
a) From first principles using the coefficients from exammodel
b) Using the predict function.
exammodel <- lm(examdata[,2] ~ examdata[,1])
Q5
### part (i)
#—————————————————————————————-
# (a) load the data and store it in the data frame examdata
examdata <- read.table("Exam data.csv", sep = ",", header = TRUE)
examdata
# (b) fit a linear regression model
attach(examdata)
exammodel <- lm(Exam ~ Assignment)
# If you prefer not to attach the data, you could also use either of the following:
exammodel <- lm(Exam ~ Assignment, data = examdata)
# exammodel <- lm(examdata$Exam ~ examdata$Assignment)
# (with the $ form, predict() with new data in part (v) will not pick up the new value)
### part (ii)
#—————————————————————————————-
# (a) Obtain slope and intercept parameters
# we can obtain the parameters by printing the linear model
exammodel
# or we can use the summary() function
summary(exammodel)
# or using the coef() function
coef(exammodel)
# Answer: slope 1.015, intercept 10.741
# (b) plot scattergraph with blue dashed regression line
plot(examdata, main = "Exam result based on assignment mark",
     xlab = "Assignment mark", ylab = "Exam result", pch = 3)
abline(exammodel, col = "blue", lty = "dashed")
### (iii) fitted values
#—————————————————————————————-
# (a) using exammodel
exammodel$fitted
# can also use exammodel$fitted.values ($fitted works by partial matching)
# (b) using fitted()
fitted(exammodel)
# can also use fitted.values(exammodel)
# (c) using predict()
predict(exammodel)
# Answer: 53.38013 64.54749 69.62356 76.73006 83.83656 86.88220
### (iv) Add predicted values to scatterplot
#—————————————————————————————-
points(examdata$Assignment, fitted(exammodel), col = "red", pch = 16)
# Can also use:
# points(examdata[,1], fitted(exammodel), col = "red", pch = 16)
### (v)
#—————————————————————————————-
# (a) from first principles: intercept + slope * 62
coef(exammodel)[1] + coef(exammodel)[2] * 62
# (b) using predict()
# Wrap the assignment mark in a data frame whose column name matches the model
newdata <- data.frame(Assignment = 62)
# Then use predict()
predict(exammodel, newdata)
# Answer: 73.68
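One caveat on predict() with new data: the new value is matched by variable name, so the model needs to be fitted with bare column names (for example via data=). A model fitted as lm(examdata[,2] ~ examdata[,1]), or with examdata$... terms, does not pick up the new value. The sketch below illustrates this with the ten running-shoe times from In-Class 1 Question 2 in this section rather than the exam file; the names shoes, good, bad and the value 60 are illustrative assumptions, not part of the tutorial.

```r
# Running-shoe times (minutes) from the In-Class 1 Question 2 table;
# `shoes` is an illustrative name, not used in the tutorial scripts
shoes <- data.frame(
  x = c(45, 49, 53, 58, 59, 62, 67, 72, 76, 83),  # without new shoes
  y = c(43, 46, 60, 50, 54, 54, 56, 65, 62, 75)   # with new shoes
)

# Fitted with bare column names: predict() can match `x` in the new data
good <- lm(y ~ x, data = shoes)
# Fitted with $-terms: the model's explanatory variable is literally `shoes$x`
bad <- lm(shoes$y ~ shoes$x)

new_x <- data.frame(x = 60)  # 60 is an arbitrary illustrative value

pred_good <- predict(good, new_x)                  # one prediction, at x = 60
pred_bad <- suppressWarnings(predict(bad, new_x))  # new_x is ignored: one value per original row

length(pred_good)
length(pred_bad)
```

Without suppressWarnings(), the second call warns that the new data had 1 row but the variables found have 10 rows, which is the usual sign of this problem.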
In-Class 1 Question 2
January 2022 written exam Q8 - adapted
Q1 A sportswear manufacturing company has designed a new running shoe that it
believes will help people achieve faster times. The following table shows the
times taken in minutes by ten athletes to run 10km, with and without the new
shoes.
Without new shoes (𝑥)    With new shoes (𝑦)
(minutes)                (minutes)
45                       43
49                       46
53                       60
58                       50
59                       54
62                       54
67                       56
72                       65
76                       62
83                       75
You are told that for these data $S_{xx} = 1324.4$, $S_{yy} = 804.5$ and $S_{xy} = 916$.
(i) Draw a scatterplot of the data. Comment briefly on the relationship between time taken to run 10km with and without the new shoes. [4]
(ii) Show that the equation of the line of best fit is given by $y = 13.344 + 0.6916x$.
(iii) Perform a test of the hypothesis that the slope parameter is 0. [5]
Extension:
Q2 The data are stored on Canvas in the file “Running shoes.csv”. Repeat your analysis
above in RStudio.
###############################################################################
### FIN3026 - Section 1 Tutorial - In Class Question ###
###############################################################################
runners <- read.csv("Running shoes.csv", header = TRUE)
runners
# plot
plot(runners$x, runners$y, xlab = "Old shoes", ylab = "New shoes",
     main = "Time taken to run 10km")
# fit model
shoesimprove <- lm(runners$y ~ runners$x)
summary(shoesimprove)
# test if beta = 0:
# p-value < 0.05, so we can reject the null hypothesis that beta = 0
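The written parts of Q1 can also be checked from first principles without the CSV file. The sketch below types in the ten pairs from the table and recomputes the quoted sums of squares, the line of best fit and the t test of a zero slope; the variable names are illustrative, and the intercept agrees with the quoted 13.344 only up to rounding of the slope.

```r
# Times in minutes, typed in from the Q1 table
x <- c(45, 49, 53, 58, 59, 62, 67, 72, 76, 83)  # without new shoes
y <- c(43, 46, 60, 50, 54, 54, 56, 65, 62, 75)  # with new shoes
n <- length(x)

# Corrected sums of squares and cross-products
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

# Least-squares estimates of the line of best fit
slope <- Sxy / Sxx
intercept <- mean(y) - slope * mean(x)

# t test of H0: slope = 0 on n - 2 degrees of freedom
sigma2 <- (Syy - Sxy^2 / Sxx) / (n - 2)  # estimated error variance
se_slope <- sqrt(sigma2 / Sxx)           # standard error of the slope
t_stat <- slope / se_slope
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

c(Sxx = Sxx, Syy = Syy, Sxy = Sxy)
c(intercept = intercept, slope = slope, t = t_stat, p = p_value)
```

The slope, t statistic and p-value should match the coefficient table printed by summary(shoesimprove).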
Tutorial 2 Questions 6
Q6 This question uses the Assignment and Exam data, you can continue in the same
script as Q5.
i) Obtain the total sum of squares in the exam result model together with its
split between the residual sum of squares and regression sum of squares:
a) Using the ANOVA command
b) From first principles using the functions sum, mean, fitted and residuals.
ii) Obtain $R^2$, the coefficient of determination:
a) Using the linear regression model, exammodel
b) By calculation from the values in the ANOVA table
—————————————————————————————-
Q6
### (i) sum of squares
#—————————————————————————————-
# (a) Using anova()
# the regression and residual sums of squares are given in
anova(exammodel)
# add them up to get the total sum of squares
anova(exammodel)[1,2] + anova(exammodel)[2,2]
# Answer: SSREG=790.34, SSRES=35.16, SSTOT=825.5
# (b) from first principles
# define x and y for ease
x <- examdata$Assignment
y <- examdata$Exam
n <- nrow(examdata)
# SS TOT
SSTOT <- sum((y - mean(y))^2)
SSTOT
# SS RES
SSRES <- sum(residuals(exammodel)^2)
SSRES
# SS REG
SSREG <- sum((fitted(exammodel) - mean(y))^2)
SSREG
### (ii) coefficient of determination
#—————————————————————————————-
# (a) from summary
summary(exammodel)
# we can extract it from the summary as follows
summary(exammodel)$r.squared
# (b) from anova: SSREG / (SSREG + SSRES)
anova(exammodel)[1,2] / (anova(exammodel)[1,2] + anova(exammodel)[2,2])
# Answer: 0.9574
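As a quick arithmetic check, the values quoted in the Q6 answers fit together as follows (the subscripted notation is an assumption chosen to mirror the SSTOT, SSREG and SSRES names in the script):

```latex
\begin{align*}
SS_{TOT} &= SS_{REG} + SS_{RES} = 790.34 + 35.16 = 825.50 \\
R^2 &= \frac{SS_{REG}}{SS_{TOT}} = \frac{790.34}{825.50} \approx 0.9574 \\
    &= 1 - \frac{SS_{RES}}{SS_{TOT}} = 1 - \frac{35.16}{825.50} \approx 0.9574
\end{align*}
```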
Tutorial 2 Questions 7
Q7 This question uses the Assignment and Exam data, you can continue in the same
script as Q5 and Q6.
i) Obtain the statistic and p-value for a test of $H_0: \beta = 0$ vs $H_1: \beta \neq 0$.
ii) Use confint to:
a) Obtain a 99% confidence interval for the slope parameter
b) Test at the 5% level whether $\beta = 0.9$
—————————————————————————————-
Q7
### (i) test beta=0
#—————————————————————————————-
# take the values from the Assignment row of the coefficient table in
summary(exammodel)
# t statistic is 9.483, p-value 0.00069
# p-value < 0.05, so we can reject H0: beta is non-zero
### (ii) confidence intervals
#—————————————————————————————-
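# (a) 99% confidence interval (read off the Assignment row)
confint(exammodel, level = 0.99)
# answer is (0.522, 1.508)
# (b) 95% confidence interval, used for the test at the 5% level
confint(exammodel, level = 0.95)
# answer is (0.718, 1.312): it contains beta = 0.9, so we do not reject the null hypothesis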
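The intervals above can be reproduced by hand from the numbers already quoted. The sketch below recovers the standard error from the slope and its t statistic, then applies the usual estimate plus or minus critical value times standard error. It assumes n = 6 observations (six fitted values are listed in Q5), so 4 degrees of freedom; the variable names are illustrative.

```r
# Values quoted in the Q5 and Q7 answers
beta_hat <- 1.015  # slope estimate
t_stat <- 9.483    # t statistic for H0: beta = 0
dof <- 4           # n - 2, assuming n = 6 observations

# Standard error recovered from t = beta_hat / se
se_beta <- beta_hat / t_stat

# Two-sided confidence intervals: estimate +/- t quantile * standard error
ci99 <- beta_hat + c(-1, 1) * qt(0.995, dof) * se_beta
ci95 <- beta_hat + c(-1, 1) * qt(0.975, dof) * se_beta
round(ci99, 3)
round(ci95, 3)

# Equivalent t test of H0: beta = 0.9 at the 5% level
t_09 <- (beta_hat - 0.9) / se_beta
abs(t_09) < qt(0.975, dof)  # TRUE means H0 is not rejected
```

Because the inputs are rounded, the results agree with confint() only to about three decimal places.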
Tutorial 2 Questions 8
Q8 This question uses the Assignment and Exam data, you can continue in the same
script as Q5, Q6 and Q7.
i) Estimate the mean exam mark following an assignment mark of 55, and
obtain 95% and 99% confidence intervals.
ii) Estimate the exam mark for an individual following an assignment mark of
55, and obtain 95% and 99% confidence intervals.
iii) Find the residuals for the regression model
a) Using the fitted command
b) Using the residuals function
iv) Obtain a plot of residuals against the fitted values, and comment on
whether a linear model is appropriate.
v) Obtain a Q-Q plot and comment on the normality assumption
—————————————————————————————-
Q8
### (i) mean exam mark
#—————————————————————————————-
addldata <- data.frame(Assignment = 55)
predict(exammodel, addldata)
# answer: 66.6
predict(exammodel, addldata, interval = "confidence", level = 0.95)
# (62.8, 70.4)
predict(exammodel, addldata, interval = "confidence", level = 0.99)
# (60.3, 72.8)
### (ii) individual exam mark
#—————————————————————————————-
# answer: 66.6 as above (same point estimate, wider interval)
predict(exammodel, addldata, interval = "prediction", level = 0.95)
# (57.5, 75.6)
predict(exammodel, addldata, interval = "prediction", level = 0.99)
# (51.6, 81.6)
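The interval for an individual is wider than the interval for the mean because it adds the variance of a single new observation: the term under the square root gains an extra 1. The sketch below checks both standard formulas against predict(), using the ten running-shoe times from In-Class 1 Question 2 so that it runs without the exam file; the names shoes, fit and the value x0 = 60 are illustrative assumptions.

```r
# Running-shoe times (minutes) from the In-Class 1 Question 2 table
shoes <- data.frame(
  x = c(45, 49, 53, 58, 59, 62, 67, 72, 76, 83),
  y = c(43, 46, 60, 50, 54, 54, 56, 65, 62, 75)
)
fit <- lm(y ~ x, data = shoes)

x0 <- 60                          # arbitrary illustrative value of x
n <- nrow(shoes)
xbar <- mean(shoes$x)
Sxx <- sum((shoes$x - xbar)^2)
s <- summary(fit)$sigma           # residual standard error
y0 <- sum(coef(fit) * c(1, x0))   # point estimate: intercept + slope * x0
tcrit <- qt(0.975, df = n - 2)    # 95% two-sided critical value

# Half-widths: mean response versus an individual response
half_mean <- tcrit * s * sqrt(1 / n + (x0 - xbar)^2 / Sxx)
half_ind <- tcrit * s * sqrt(1 + 1 / n + (x0 - xbar)^2 / Sxx)

manual <- rbind(confidence = c(y0 - half_mean, y0 + half_mean),
                prediction = c(y0 - half_ind, y0 + half_ind))
manual

# The same intervals from predict()
ci <- predict(fit, data.frame(x = x0), interval = "confidence", level = 0.95)
pi <- predict(fit, data.frame(x = x0), interval = "prediction", level = 0.95)
ci
pi
```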
### (iii) residuals
#—————————————————————————————-
# using fitted: residual = observed - fitted
examdata$Exam - fitted(exammodel)
# using residuals
residuals(exammodel)
# or
exammodel$residuals
### (iv) plot residuals vs fitted values
#—————————————————————————————-
plot(exammodel,1)
### (v) Q-Q plot
#—————————————————————————————-
plot(exammodel,2)
detach(examdata)
In-Class 2 Question 1
December 2021 Computer Based Assessment Question 2
Q1 An insurance company offers a specialised policy for gardeners under which claims can be made for damage to valuable plants. The insurer suspects a
relationship between wind speed and claim numbers and has asked you to
investigate.
This question uses the windspeed.csv data set uploaded to Canvas. The data
set shows the average wind speed in knots for each month in 2018 and 2019, and
the number of claims received in each month.
(i) Construct a scatterplot of the data and comment on the nature of the
relationship between wind speed and the number of claims. [5]
(ii) Perform any necessary transformation of the data, and construct a linear model
of the relationship between wind speed and number of claims. Briefly justify your
choice of data transformation, if any. [5]
(iii) Construct a scatterplot of your transformed data and add the line of regression calculated in (ii). [2]
(iv) Obtain the 95% confidence interval for the expected number of claims in months
with average wind speed 4.123 knots. [4]
[Total 16 Marks]
Compare your answers with the solution code below.
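# (the question text calls the file windspeed.csv; match the case of the actual file name)
insurdata <- read.csv("Windspeed.csv", header = TRUE)
Wind <- insurdata[,2]
Claims <- insurdata[,3]
# (i)
plot(Wind, Claims)
# (ii)
# Take logs of the claim numbers to fit a linear model
plot(Wind, log(Claims))
linearmodel <- lm(log(Claims) ~ Wind)
summary(linearmodel)
# For comparison, we can also fit a linear model to the data without taking logs
wrongmodel <- lm(Claims ~ Wind)
summary(wrongmodel)
# (iii)
plot(Wind, log(Claims))
abline(linearmodel)
# untransformed data with the untransformed fit, for comparison
plot(Wind, Claims)
abline(wrongmodel)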
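# (iv)
newdata <- data.frame(Wind = 4.123)
# point estimate on the log scale
logans <- predict(linearmodel, newdata)
# 95% confidence interval on the log scale
# (stored as logci so that the name of the confint() function is not reused)
logci <- predict(linearmodel, newdata, interval = "confidence", level = 0.95)
# back-transform to the claims scale
answer <- exp(logci)
answer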
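The back-transformation in part (iv) can be written out as follows. The symbols $\alpha$, $\beta$, $e$, $L$ and $U$ are assumed notation for the intercept, slope, error term and the lower and upper interval limits on the log scale; they do not appear in the original script.

```latex
\begin{align*}
\log(\text{Claims}) &= \alpha + \beta \, \text{Wind} + e \\
\text{interval on the log scale at Wind} = 4.123 &: \quad (L,\ U) \\
\text{interval on the claims scale} &: \quad \left(e^{L},\ e^{U}\right)
\end{align*}
```

Since the exponential function is increasing, applying exp() to both limits preserves their order, which is why exp() can be applied to the whole output of predict() in one step.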