Chapter 2 Flashcards
How to check the strength of a linear relationship using R
Use the cor() function and specify method = 'pearson'.
Pearson's correlation coefficient in R: cor(data$xaxis, data$yaxis, method = 'pearson')
State the FOUR assumptions when undertaking linear regression modelling.
Linearity:
The relationship between the dependent (response) variable and the independent (predictor) variable(s) should be linear. The expected change in the response should correspond proportionally to changes in the predictor(s), forming a straight-line relationship.
Independence:
Observations of the response variable should be independent of each other. In other words, the residuals should not be correlated with one another, which ensures that each observation provides unique information.
Normality of residuals:
The residuals (errors) should be normally distributed around zero. This is important for statistical inference, ensuring accurate confidence intervals and hypothesis tests.
Homoscedasticity:
The residuals should exhibit constant variance across all fitted (predicted) values of the response variable. In other words, the scatter of residuals should remain approximately the same across the whole fitted range.
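These assumptions can be checked in R with the standard diagnostic plots. A minimal sketch, using the built-in mtcars data set purely as an illustration (the model mpg ~ wt is an assumed example, not from the flashcards):

```r
# Checking regression assumptions with base-R diagnostics.
# mtcars is a built-in data set, used here only as an illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals vs fitted: look for no pattern (linearity) and an even
# vertical spread (homoscedasticity).
plot(fit, which = 1)

# Normal Q-Q plot: residuals should lie close to the diagonal line
# (normality of residuals).
plot(fit, which = 2)

# A numerical check of residual normality.
shapiro.test(residuals(fit))
```

Independence is usually judged from the study design (e.g., no repeated measurements on the same subject) rather than from a plot.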
How to check the strength of a linear relationship using R and P-value
cor.test
Using cor.test() gives a p-value: cor.test(data$xaxis, data$yaxis, method = 'pearson')
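A worked sketch of both calls, again using the built-in mtcars data as an illustrative example (the variable choice is an assumption):

```r
# cor() returns only the coefficient; cor.test() adds inference.
r  <- cor(mtcars$wt, mtcars$mpg, method = 'pearson')
ct <- cor.test(mtcars$wt, mtcars$mpg, method = 'pearson')

r            # strong negative linear relationship (close to -0.87)
ct$p.value   # very small p-value: the correlation is statistically significant
ct$conf.int  # 95% confidence interval for the correlation coefficient
```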
equation for a straight line
y = mx + c + e, where e is the residual error
m is the gradient (slope) of the line
c is the intercept on the y axis
Define the terms: y = β0 + β1 ∗ x + ε
y = Dependent/response variable
β0 = Intercept
β1 = Slope (gradient) of the line
x = Independent/predictor variable
ε = Error left over (residual error)
Define residual error in the context of linear regression (1 mark)
In the context of linear regression, residual error refers to the difference between the observed value of the dependent variable and the value predicted by the linear regression model.
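In R, the residuals can be pulled straight from a fitted model. A sketch using mtcars as an assumed example data set:

```r
# Residual = observed value minus the value predicted by the model.
fit <- lm(mpg ~ wt, data = mtcars)

res     <- residuals(fit)            # extracted by R
by_hand <- mtcars$mpg - fitted(fit)  # observed - predicted, computed manually

all.equal(res, by_hand)              # the two agree
```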
What is the difference between the cor() and cor.test() functions?
cor(): Directly computes the correlation coefficient(s) without providing statistical significance.
cor.test(): Conducts a hypothesis test to provide statistical significance (p-value), confidence intervals, and other summary information.
To plot the relationship examined by cor()/cor.test() in R (a scatterplot):
plot(data$xaxis, data$yaxis, xlab = 'x-label', ylab = 'y-label')
linear regression line of best fit using ggplot
ggplot(data = , aes(x = , y = )) +
  geom_point() +
  xlab("") + ylab("") +
  stat_smooth(method = "lm", col = "blue", se = FALSE)  # linear regression line of best fit
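The template above, filled in with the built-in mtcars data set (the variables and axis labels are illustrative assumptions):

```r
library(ggplot2)  # assumed to be installed

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlab("Weight (1000 lbs)") + ylab("Miles per gallon") +
  stat_smooth(method = "lm", col = "blue", se = FALSE)  # se = FALSE hides the CI band
```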
Explain the concept of a 95% confidence interval and a 95% prediction interval
A 95% confidence interval (CI) is a range constructed so that, under repeated random sampling, 95% of such intervals would contain the true population parameter (e.g., mean, correlation coefficient, regression slope).
It is calculated based on the standard error and a critical value from the normal distribution (or other appropriate distribution).
A 95% prediction interval (PI) provides a range of values within which we expect an individual future observation to fall, given existing data.
It is wider than the confidence interval because it accounts for the variability in predicting individual data points.
State the difference between a 95% confidence interval and a 95% prediction interval
Differences Between Confidence and Prediction Intervals:
1. Purpose:
o Confidence Interval: Estimates the range where the population parameter lies.
o Prediction Interval: Estimates the range where an individual future observation will fall.
2. Width of Interval:
o Confidence Interval: Narrower because it estimates a single parameter (like a mean).
o Prediction Interval: Wider because it must account for both the uncertainty in the parameter estimate and the natural variation of individual observations.
3. Applicability:
o Confidence Interval: Useful for making inferences about population parameters.
o Prediction Interval: Useful for predicting future outcomes based on existing data.
How to Calculate confidence intervals using R
Use the predict() function on a fitted model, specifying interval = 'confidence':
predict(fit, interval = 'confidence')
How to Calculate prediction intervals using R
We just change the argument from interval = 'confidence' to interval = 'prediction':
predict(fit, interval = 'prediction')
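A side-by-side sketch of the two calls (mtcars and the chosen weight value are illustrative assumptions):

```r
fit <- lm(mpg ~ wt, data = mtcars)
new <- data.frame(wt = 3)   # predict for a car weighing 3000 lbs

predict(fit, newdata = new, interval = 'confidence')  # narrower: mean response
predict(fit, newdata = new, interval = 'prediction')  # wider: one new observation
```

For the same data and confidence level, the prediction interval always contains the corresponding confidence interval.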
Calculate concentration at steady state (equation)
Concentration at steady state: Css = Infusion rate / Clearance
what is Css equivalent to?
Css = Infusion rate / Clearance = (1 / Clearance) × Infusion rate; i.e., Css is directly proportional to the infusion rate, with 1/Clearance as the constant of proportionality (the slope).
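A quick numerical sketch (the infusion rate and clearance values are made-up illustrative numbers, not from the flashcards):

```r
infusion_rate <- 2     # mg/h  (assumed value for illustration)
clearance     <- 0.5   # L/h   (assumed value for illustration)

css     <- infusion_rate / clearance         # steady-state concentration
css_alt <- (1 / clearance) * infusion_rate   # algebraically identical form

css       # 4 mg/L
css_alt   # same result, confirming the two forms are equivalent
```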