Section 1 - Tutorials R and In-Class Questions R Flashcards

Question 1

Q

Tutorial 1
Q5 This question uses the assignment and exam mark data used in Question 2 above.
This data can be found on Canvas in the file “Exam data.csv”.
i)
a) Load the data and store it in the data frame examdata.
b) Fit a linear regression model, exammodel, of exam mark on assignment mark.
ii)
a) Obtain the slope and intercept parameters.
b) Plot a labelled scattergraph of the data and add a blue dashed regression line to your scatterplot.
iii) Obtain the fitted values:
a) By extracting them from exammodel
b) Using the fitted command
c) Using the predict command
iv) Add red points to the scattergraph to show the fitted values.
v) Obtain the expected exam mark for an assignment mark of 62:
a) From first principles using the coefficients from exammodel
b) Using the predict function.

Answer

A

exammodel <- lm(examdata[,2] ~ examdata[,1])

Q5

### part (i)
#—————————————————————————————-

(a) load the data and store it in the data frame examdata

examdata <- read.table("Exam data.csv", sep = ",", header = TRUE)
examdata

(b) fit a linear regression model

attach(examdata)
exammodel <- lm(Exam ~ Assignment)

If you prefer not to attach the data, you could also use any of the following:

exammodel <- lm(Exam ~ Assignment, data=examdata)
# exammodel <- lm(examdata$Exam ~ examdata$Assignment)

### part (ii)
#—————————————————————————————-

(a) Obtain slope and intercept parameters

we can obtain the parameters by printing the linear model

exammodel

or we can use the summary() function

summary(exammodel)

or we can use the coef() function

coef(exammodel)

Answer: slope 1.015, intercept 10.741
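
As a cross-check on the printed coefficients, the least-squares estimates can also be computed directly from the sums Sxx and Sxy. The sketch below uses made-up marks (the vectors assignment and exam are hypothetical and are not the contents of "Exam data.csv"), so the numbers will differ from the answer above; the point is only that the hand calculation agrees with coef().

```r
# Hypothetical marks for illustration only (not the Canvas data)
assignment <- c(40, 52, 58, 66, 71, 77)
exam <- c(50, 66, 68, 78, 84, 88)

# Sums of squares and cross-products about the means
sxx <- sum((assignment - mean(assignment))^2)
sxy <- sum((assignment - mean(assignment)) * (exam - mean(exam)))

# Least-squares estimates
slope <- sxy / sxx
intercept <- mean(exam) - slope * mean(assignment)

# Compare with the coefficients reported by lm()
toymodel <- lm(exam ~ assignment)
c(intercept, slope)
coef(toymodel)
```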

(b) plot scattergraph with blue regression line

plot(examdata, main="Exam result based on assignment mark",
xlab="Assignment mark", ylab="Exam result", pch=3)

abline(exammodel, col="blue", lty="dashed")

### (iii) fitted values
#—————————————————————————————-

(a) using exammodel

exammodel$fitted

can also use exammodel$fitted.values (the full component name; $fitted works through partial matching)

(b) using fitted()

fitted(exammodel)

can also use fitted.values(exammodel)

(c) using predict()

predict(exammodel)

Answer: 53.38013 64.54749 69.62356 76.73006 83.83656 86.88220

### (iv) Add predicted values to scatterplot
#—————————————————————————————-

points(examdata$Assignment, fitted(exammodel), col="red", pch=16)

Can also use:
# points(examdata[,1], fitted(exammodel), col="red", pch=16)

### (v)
#—————————————————————————————-

(a) from first principles

coef(exammodel)[1]+coef(exammodel)[2]*62

(b) using predict()

Wrap the assignment mark parameter in a data frame

newdata <- data.frame(Assignment=62)

Then use predict()

predict(exammodel, newdata)

Answer 73.68
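
The two routes in part (v) should always give the same number, provided the column name in the new data frame matches the predictor name in the model formula. A minimal sketch, again with hypothetical marks (toydata and toymodel are illustrative names, not part of the tutorial script):

```r
# Hypothetical marks for illustration only (not the Canvas data)
toydata <- data.frame(Assignment = c(40, 52, 58, 66, 71, 77),
                      Exam = c(50, 66, 68, 78, 84, 88))
toymodel <- lm(Exam ~ Assignment, data = toydata)

# First principles: intercept + slope * 62
byhand <- unname(coef(toymodel)[1] + coef(toymodel)[2] * 62)

# predict() looks up the column called Assignment in the new data frame
bypredict <- unname(predict(toymodel, data.frame(Assignment = 62)))

byhand
bypredict
all.equal(byhand, bypredict)
```

Fitting with data = toydata keeps the predictor name as plain Assignment, which is what lets predict() find it in the new data frame.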

Question 2

Q

In-Class 1 Question 2
January 2022 written exam Q8 - adapted
Q1 A sportswear manufacturing company has designed a new running shoe that it
believes will help people achieve faster times. The following table shows the
times taken in minutes by ten athletes to run 10km, with and without the new
shoes.
Without new shoes (𝑥)    With new shoes (𝑦)
(minutes)                (minutes)
45                       43
49                       46
53                       60
58                       50
59                       54
62                       54
67                       56
72                       65
76                       62
83                       75
You are told that for these data 𝑆𝑥𝑥 = 1,324.4, 𝑆𝑦𝑦 = 804.5, and 𝑆𝑥𝑦 = 916.
(i) Draw a scatterplot of the data. Comment briefly on the relationship between time taken to run 10km with and without the new shoes.
[4]
(ii) Show that the equation of the line of best fit is given by 𝑦 = 13.344 + 0.6916𝑥.
(iii) Perform a test of the hypothesis that the slope parameter is 0. [5]
Extension:
Q2 The data is stored on Canvas in the file “Running shoes.csv”. Repeat your analysis
above in R Studio.

Answer

A

###############################################################################
### FIN3026 - Section 1 Tutorial - In Class Question ###
###############################################################################

runners <- read.csv("Running shoes.csv", header = TRUE)
runners

plot

plot(runners$x, runners$y, xlab = "Old shoes", ylab = "New shoes",
main = "Time taken to run 10km")

fit model

shoesimprove <- lm(runners$y ~ runners$x)

summary(shoesimprove)

test if beta = 0:

the p-value for the slope in the summary output is below 0.05, so we can reject the null hypothesis that beta = 0 at the 5% level
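
The written parts (ii) and (iii) can be checked from first principles by typing in the ten pairs from the table in the question. This is a sketch of the standard calculation rather than the official solution; the variable names are chosen here for illustration.

```r
# Times from the table in the question
x <- c(45, 49, 53, 58, 59, 62, 67, 72, 76, 83)  # without new shoes
y <- c(43, 46, 60, 50, 54, 54, 56, 65, 62, 75)  # with new shoes
n <- length(x)

sxx <- sum((x - mean(x))^2)                # 1324.4
syy <- sum((y - mean(y))^2)                # 804.5
sxy <- sum((x - mean(x)) * (y - mean(y)))  # 916

# Line of best fit
slope <- sxy / sxx                      # about 0.6916
intercept <- mean(y) - slope * mean(x)  # about 13.34

# Residual variance estimate and t statistic for H0: beta = 0
sigma2 <- (syy - sxy^2 / sxx) / (n - 2)
tstat <- slope / sqrt(sigma2 / sxx)
pvalue <- 2 * pt(-abs(tstat), df = n - 2)

c(slope = slope, intercept = intercept, t = tstat, p = pvalue)
```

The intercept differs slightly from 13.344 in the third decimal place because the question's figure uses the slope rounded to 0.6916.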

Question 3

Q

Tutorial 2 Questions 6
Q6 This question uses the Assignment and Exam data, you can continue in the same
script as Q5.
i) Obtain the total sum of squares in the exam result model together with its
split between the residual sum of squares and regression sum of squares:
a) Using the ANOVA command
b) From first principles using the functions sum, mean, fitted and residuals.
ii) Obtain 𝑅², the coefficient of determination:
a) Using the linear regression model, exammodel
b) By calculation from the values in the ANOVA table

Answer

A

—————————————————————————————-

Q6

### (i) sum of squares
#—————————————————————————————-

(a) Using ANOVA

residual and regression sum of squares given in

anova(exammodel)

add them up to get the total sum of squares

anova(exammodel)[1,2] + anova(exammodel)[2,2]

Answer: SSREG=790.34, SSRES=35.16, SSTOT=825.5

(b) from first principles

define x and y for ease

x <- examdata$Assignment
y <- examdata$Exam

n <- nrow(examdata)

SS TOT

SSTOT <- sum((y-mean(y))^2)
SSTOT

SS RES

SSRES <- sum(residuals(exammodel)^2)
SSRES

SS REG

SSREG <- sum((fitted(exammodel)-mean(y))^2)
SSREG

### (ii) coefficient of determination
#—————————————————————————————-

(a) from summary

summary(exammodel)

we can extract it from the summary as follows

summary(exammodel)$r.squared

(b) from anova

anova(exammodel)[1,2]/(anova(exammodel)[1,2]+anova(exammodel)[2,2])

Answer 0.9574
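
The identity SSTOT = SSREG + SSRES and the ratio R² = SSREG/SSTOT can be verified numerically. The sketch below uses hypothetical marks (toydata is not the Canvas data) and picks the ANOVA entries by row and column name instead of by position, which some readers may find easier to follow than [1,2] and [2,2].

```r
# Hypothetical marks for illustration only (not the Canvas data)
toydata <- data.frame(Assignment = c(40, 52, 58, 66, 71, 77),
                      Exam = c(50, 66, 68, 78, 84, 88))
toymodel <- lm(Exam ~ Assignment, data = toydata)
y <- toydata$Exam

# Sums of squares from first principles
sstot <- sum((y - mean(y))^2)
ssres <- sum(residuals(toymodel)^2)
ssreg <- sum((fitted(toymodel) - mean(y))^2)
all.equal(sstot, ssreg + ssres)

# The same quantities from the ANOVA table, selected by name
tab <- anova(toymodel)
tab["Assignment", "Sum Sq"]   # regression sum of squares
tab["Residuals", "Sum Sq"]    # residual sum of squares

# R squared two ways
ssreg / sstot
summary(toymodel)$r.squared
```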

Question 4

Q

Tutorial 2 Questions 7
Q7 This question uses the Assignment and Exam data, you can continue in the same
script as Q5 and Q6.
i) Obtain the statistic and p-value for a test of 𝐻0: 𝛽 = 0 vs 𝐻1: 𝛽 ≠ 0.
ii) Use confint to:
a) Obtain a 99% confidence interval for the slope parameter
b) Test at the 5% level whether 𝛽 = 0.9

Answer

A

—————————————————————————————-

Q7

### (i) test beta=0
#—————————————————————————————-

take the values from the slope row of the summary(exammodel) output

t statistic is 9.483, p-value 0.00069

the p-value is well below 0.05, so we can reject H0 and conclude that beta is non-zero

### (ii) confidence intervals
#—————————————————————————————-

confint(exammodel, level = 0.99)

answer is (0.522, 1.508)

confint(exammodel, level = 0.95)

answer is (0.718, 1.312) - this interval contains 0.9, so we don’t reject the null hypothesis that beta = 0.9 at the 5% level
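
What confint() does for a single slope can be reproduced as estimate ± t quantile × standard error. The sketch below works only from the rounded figures quoted in this answer (slope 1.015, t statistic 9.483) and assumes 4 residual degrees of freedom, since six fitted values were listed in Q5; it should recover both intervals up to rounding.

```r
# Values reported in the answers above (rounded)
est <- 1.015      # slope estimate
tstat0 <- 9.483   # t statistic for H0: beta = 0
df <- 4           # assumed: n - 2 with six observations

# Standard error implied by the t statistic
se <- est / tstat0

# Confidence intervals: estimate +/- t quantile * standard error
ci99 <- est + c(-1, 1) * qt(0.995, df) * se
ci95 <- est + c(-1, 1) * qt(0.975, df) * se
round(ci99, 3)
round(ci95, 3)

# Equivalent direct t test of H0: beta = 0.9
tstat <- (est - 0.9) / se
pvalue <- 2 * pt(-abs(tstat), df)
c(t = tstat, p = pvalue)
```

The two-sided test at the 5% level rejects exactly when 0.9 falls outside the 95% interval, which is why the interval can be used for part (b).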

Question 5

Q

Tutorial 2 Questions 8
Q8 This question uses the Assignment and Exam data, you can continue in the same
script as Q5, Q6 and Q7.
i) Estimate the mean exam mark following an assignment mark of 55, and
obtain 95% and 99% confidence intervals.
ii) Estimate the exam mark for an individual following an assignment mark of
55, and obtain 95% and 99% confidence intervals.
iii) Find the residuals for the regression model
a) Using the fitted command
b) Using the residuals function
iv) Obtain a plot of residuals against the fitted values, and comment on
whether a linear model is appropriate.
v) Obtain a Q-Q plot and comment on the normality assumption

Answer

A

—————————————————————————————-

Q8

### (i) mean exam mark
#—————————————————————————————-

addldata <- data.frame(Assignment=55)

predict(exammodel, addldata)

answer: 66.6

predict(exammodel, addldata, interval="confidence", level=0.95)

(62.8, 70.4)

predict(exammodel, addldata, interval="confidence", level=0.99)

(60.3, 72.8)

### (ii) individual exam mark
#—————————————————————————————-

answer: 66.6 as above

predict(exammodel, addldata, interval="prediction", level=0.95)

(57.5, 75.6)

predict(exammodel, addldata, interval="prediction", level=0.99)

(51.6, 81.6)
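
The interval for an individual mark is wider than the interval for the mean because its variance carries an extra "1 +" term for the individual's own error. A sketch of both formulas, using hypothetical marks (toydata is not the Canvas data) and checking them against predict():

```r
# Hypothetical marks for illustration only (not the Canvas data)
toydata <- data.frame(Assignment = c(40, 52, 58, 66, 71, 77),
                      Exam = c(50, 66, 68, 78, 84, 88))
toymodel <- lm(Exam ~ Assignment, data = toydata)
new <- data.frame(Assignment = 55)

# Intervals from predict()
meanint <- predict(toymodel, new, interval = "confidence", level = 0.95)
indint <- predict(toymodel, new, interval = "prediction", level = 0.95)

# The same intervals from first principles
x <- toydata$Assignment
n <- length(x)
s2 <- sum(residuals(toymodel)^2) / (n - 2)   # residual variance estimate
sxx <- sum((x - mean(x))^2)
fit <- unname(predict(toymodel, new))
tcrit <- qt(0.975, n - 2)

se_mean <- sqrt(s2 * (1 / n + (55 - mean(x))^2 / sxx))
se_ind <- sqrt(s2 * (1 + 1 / n + (55 - mean(x))^2 / sxx))

byhand_mean <- fit + c(-1, 1) * tcrit * se_mean
byhand_ind <- fit + c(-1, 1) * tcrit * se_ind

meanint
byhand_mean
indint
byhand_ind
```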

### (iii) residuals
#—————————————————————————————-

using fitted

examdata$Exam - fitted(exammodel)

using residuals

residuals(exammodel)

or

exammodel$residuals

### (iv) plot residuals vs fitted values
#—————————————————————————————-

plot(exammodel,1)

### (v) Q-Q plot
#—————————————————————————————-

plot(exammodel,2)

detach(examdata)

Question 6

Q

In-Class 2 Question 1
December 2021 Computer Based Assessment Question 2
Q1 An insurance company offers a specialised policy for gardeners under which claims can be made for damage to valuable plants. The insurer suspects a
relationship between wind speed and claim numbers and has asked you to
investigate.
This question uses the windspeed.csv data set uploaded to Canvas. The data
set shows the average wind speed in knots for each month in 2018 and 2019, and
the number of claims received in each month.
(i) Construct a scatterplot of the data and comment on the nature of the
relationship between wind speed and the number of claims. [5]
(ii) Perform any necessary transformation of the data, and construct a linear model
of the relationship between wind speed and number of claims. Briefly justify your
choice of data transformation, if any. [5]
(iii) Construct a scatterplot of your transformed data and add the line of regression calculated in (ii). [2]
(iv) Obtain the 95% confidence interval for the expected number of claims in months
with average wind speed 4.123 knots. [4]
[Total 16 Marks]

Answer

A

Compare your output with the code below

insurdata <- read.csv("Windspeed.csv", header=TRUE)

Wind <- insurdata[,2]
Claims <- insurdata[,3]

(i)

plot(Wind, Claims)

(ii)

Take logs to fit linear model

plot(Wind, log(Claims))

linearmodel <- lm(log(Claims)~Wind)

summary(linearmodel)

For comparison, we can also fit a linear model to the data without taking logs (named wrongmodel because it is not the recommended fit)

wrongmodel <- lm(Claims~Wind)
summary(wrongmodel)

(iii)

plot(Wind,log(Claims))
abline(linearmodel)

plot(Wind, Claims)
abline(wrongmodel)

(iv)

newdata <- data.frame(Wind=4.123)

logans <- predict(linearmodel, newdata)

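logci <- predict(linearmodel, newdata, interval="confidence", level=0.95)

Back-transform from the log scale to the number of claims (the result is stored as logci so that R's confint() function is not masked)

answer <- exp(logci)
answer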
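
The same fit, predict and back-transform workflow can be tried end to end without the Canvas file. The sketch below uses simulated data (the vectors wind and claims and their generating formula are assumptions made here for illustration, not the contents of windspeed.csv).

```r
# Simulated data for illustration only (not windspeed.csv)
set.seed(1)
wind <- runif(24, min = 2, max = 12)
claims <- round(exp(1 + 0.25 * wind + rnorm(24, sd = 0.2)))

# Fit the linear model on the log scale
logmodel <- lm(log(claims) ~ wind)

# Confidence interval for the mean of log(claims) at the chosen wind speed
new <- data.frame(wind = 4.123)
logint <- predict(logmodel, new, interval = "confidence", level = 0.95)

# exp() is increasing, so the endpoints map back to the claims scale
claimsint <- exp(logint)
claimsint
```

Because the interval is built on the log scale, it is no longer symmetric about the fitted value once it has been exponentiated.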
