Chapter 2 Flashcards
How to check the strength of a linear relationship using R
Use the cor() function and specify method = 'pearson'.
Pearson's correlation coefficient in R: cor(data$xaxis, data$yaxis, method = 'pearson')
State the FOUR assumptions when undertaking linear regression modelling.
Linearity:
The relationship between the dependent (response) variable and the independent (predictor) variable(s) should be linear. The expected change in the response should correspond proportionally to changes in the predictor(s), forming a straight-line relationship.
Independence:
Observations of the response variable should be independent of each other. In other words, the residuals should not be correlated with one another, which ensures that each observation provides unique information.
Normality of residuals:
The residuals (errors) should be normally distributed around zero. This is important for statistical inference, ensuring accurate confidence intervals and hypothesis tests.
Homoscedasticity:
The residuals should exhibit constant variance across all fitted (predicted) values of the response variable. In other words, the scatter of residuals should remain approximately the same across the whole fitted range.
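These assumptions can be checked in R with the standard diagnostic plots. A minimal sketch, using the built-in mtcars data set purely as an illustration (the model mpg ~ wt is an assumed example, not from the flashcards):

```r
# Checking regression assumptions with base-R diagnostics.
# mtcars is a built-in data set, used here only as an illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals vs fitted: look for no pattern (linearity) and an even
# vertical spread (homoscedasticity).
plot(fit, which = 1)

# Normal Q-Q plot: residuals should lie close to the diagonal line
# (normality of residuals).
plot(fit, which = 2)

# A numerical check of residual normality.
shapiro.test(residuals(fit))
```

Independence is usually judged from the study design (e.g., no repeated measurements on the same subject) rather than from a plot.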
How to check the strength of a linear relationship using R and P-value
cor.test
Using cor.test() gives a p-value: cor.test(data$xaxis, data$yaxis, method = 'pearson')
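A worked sketch of both calls, again using the built-in mtcars data as an illustrative example (the variable choice is an assumption):

```r
# cor() returns only the coefficient; cor.test() adds inference.
r  <- cor(mtcars$wt, mtcars$mpg, method = 'pearson')
ct <- cor.test(mtcars$wt, mtcars$mpg, method = 'pearson')

r            # strong negative linear relationship (close to -0.87)
ct$p.value   # very small p-value: the correlation is statistically significant
ct$conf.int  # 95% confidence interval for the correlation coefficient
```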
equation for a straight line
y = mx + c + e, where e is the residual error
m is the gradient (slope) of the line
c is the intercept on the y axis
Define the terms: y = β0 + β1 ∗ x + ε
y = Dependent/response variable
β0 = Intercept
β1 = Slope (gradient) of the line
x = Independent/predictor variable
ε = Error left over (residual error)
Define residual error in the context of linear regression (1 mark)
In the context of linear regression, residual error refers to the difference between the observed value of the dependent variable and the value predicted by the linear regression model.
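In R, the residuals can be pulled straight from a fitted model. A sketch using mtcars as an assumed example data set:

```r
# Residual = observed value minus the value predicted by the model.
fit <- lm(mpg ~ wt, data = mtcars)

res     <- residuals(fit)            # extracted by R
by_hand <- mtcars$mpg - fitted(fit)  # observed - predicted, computed manually

all.equal(res, by_hand)              # the two agree
```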
What is the difference between the cor() and cor.test() functions?
cor(): Directly computes the correlation coefficient(s) without providing statistical significance.
cor.test(): Conducts a hypothesis test to provide statistical significance (p-value), confidence intervals, and other summary information.
To plot the relationship examined by cor()/cor.test() in R (a scatterplot):
plot(data$xaxis, data$yaxis, xlab = 'x-label', ylab = 'y-label')
linear regression line of best fit using ggplot
ggplot(data = , aes(x = , y = )) +
  geom_point() +
  xlab("") + ylab("") +
  stat_smooth(method = "lm", col = "blue", se = FALSE)  # linear regression line of best fit
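The template above, filled in with the built-in mtcars data set (the variables and axis labels are illustrative assumptions):

```r
library(ggplot2)  # assumed to be installed

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  xlab("Weight (1000 lbs)") + ylab("Miles per gallon") +
  stat_smooth(method = "lm", col = "blue", se = FALSE)  # se = FALSE hides the CI band
```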
Explain the concept of a 95% confidence interval and a 95% prediction interval
A 95% confidence interval (CI) is a range constructed so that, under repeated random sampling, 95% of such intervals would contain the true population parameter (e.g., mean, correlation coefficient, regression slope).
It is calculated based on the standard error and a critical value from the normal distribution (or other appropriate distribution).
A 95% prediction interval (PI) provides a range of values within which we expect an individual future observation to fall, given existing data.
It is wider than the confidence interval because it accounts for the variability in predicting individual data points.
State the difference between a 95% confidence interval and a 95% prediction interval
Differences Between Confidence and Prediction Intervals:
1. Purpose:
o Confidence Interval: Estimates the range where the population parameter lies.
o Prediction Interval: Estimates the range where an individual future observation will fall.
2. Width of Interval:
o Confidence Interval: Narrower because it estimates a single parameter (like a mean).
o Prediction Interval: Wider because it must account for both the uncertainty in the parameter estimate and the natural variation of individual observations.
3. Applicability:
o Confidence Interval: Useful for making inferences about population parameters.
o Prediction Interval: Useful for predicting future outcomes based on existing data.
How to Calculate confidence intervals using R
Use the predict() function on a fitted model, specifying interval = 'confidence':
predict(fit, interval = 'confidence')
How to Calculate prediction intervals using R
We just change the argument from interval = 'confidence' to interval = 'prediction':
predict(fit, interval = 'prediction')
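A side-by-side sketch of the two calls (mtcars and the chosen weight value are illustrative assumptions):

```r
fit <- lm(mpg ~ wt, data = mtcars)
new <- data.frame(wt = 3)   # predict for a car weighing 3000 lbs

predict(fit, newdata = new, interval = 'confidence')  # narrower: mean response
predict(fit, newdata = new, interval = 'prediction')  # wider: one new observation
```

For the same data and confidence level, the prediction interval always contains the corresponding confidence interval.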
Calculate concentration at steady state (equation)
Concentration at steady state: Css = Infusion rate / Clearance
what is Css equivalent to?
Css = Infusion rate / Clearance = (1 / Clearance) × Infusion rate; i.e., Css is directly proportional to the infusion rate, with 1/Clearance as the constant of proportionality (the slope).
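A quick numerical sketch (the infusion rate and clearance values are made-up illustrative numbers, not from the flashcards):

```r
infusion_rate <- 2     # mg/h  (assumed value for illustration)
clearance     <- 0.5   # L/h   (assumed value for illustration)

css     <- infusion_rate / clearance         # steady-state concentration
css_alt <- (1 / clearance) * infusion_rate   # algebraically identical form

css       # 4 mg/L
css_alt   # same result, confirming the two forms are equivalent
```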