Chapter 2 Flashcards

1
Q

How do you check the strength of a linear relationship in R?

A

Use the cor() function and specify method = "pearson".

Pearson's correlation coefficient in R: cor(data$xaxis, data$yaxis, method = "pearson")
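As a minimal sketch of the call above, the following uses R's built-in mtcars data set rather than the card's placeholder data$xaxis and data$yaxis; the choice of the wt and mpg columns is an assumption made purely for illustration.

```r
# Built-in mtcars data: car weight (wt) versus fuel economy (mpg)
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")

# r lies between -1 and 1; values close to either end indicate
# a strong linear relationship, values near 0 a weak one
print(r)
```

Here the coefficient is negative, meaning heavier cars tend to have lower mpg.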

2
Q

State the FOUR assumptions when undertaking linear regression modelling.

A

Linearity: the relationship between the dependent (response) variable and the independent (predictor) variable should be linear.

Independence: the observations should be independent of one another.

Normality of residuals: the errors (residuals) should be normally distributed.

Homoscedasticity: the variance of the residuals should be constant across all fitted (predicted) values.

Better:

Linearity:
The relationship between the dependent (response) variable and the independent (predictor) variable(s) should be linear. This means that the expected change in the response variable should correspond proportionally to changes in the predictor(s), forming a straight-line relationship.
Independence:
Observations of the response variable should be independent of each other. In other words, the residuals should not be correlated with each other, which ensures that each observation provides unique information.
Normality of Residuals:
The residuals (errors) should be normally distributed around zero. This is important for statistical inference, ensuring accurate confidence intervals and hypothesis testing.
Homoscedasticity:
The residuals should exhibit constant variance across all predicted values of the response variable. In other words, the scatter of residuals should remain approximately the same for all fitted values.
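One common way to check these assumptions in R is with residual diagnostics. The sketch below uses simulated data (the variable names, sample size, and coefficients are all assumptions for illustration). Independence is usually judged from how the data were collected, so it is not tested here.

```r
set.seed(1)                            # reproducible simulated data
x <- seq(1, 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)   # linear signal plus normal noise

fit <- lm(y ~ x)
res <- resid(fit)

# Linearity and homoscedasticity: residuals vs fitted values should show
# no curved pattern and a roughly constant vertical spread
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: points should lie close to the reference line
qqnorm(res)
qqline(res)

# Formal normality test: a large p-value gives no evidence against normality
sw <- shapiro.test(res)
print(sw$p.value)
```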

3
Q

How do you check the strength of a linear relationship in R and obtain a p-value?

A

cor.test()
Using cor.test() gives a p-value: cor.test(data$xaxis, data$yaxis, method = "pearson")

4
Q

What is the equation for a straight line?

A

y = mx + c + e, where e is the residual error

m is the gradient (slope) of the line

c is the intercept on the y axis

5
Q

Define the terms: y = β0 + β1 ∗ x + ε

A

y = Dependent/response variable
β0 = Intercept
β1 = Slope (gradient) of the line
x = Independent/predictor variable
ε = Error left over (residual error)
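To connect these terms to R output, here is a minimal sketch that fits a simple linear regression with lm() and extracts the estimates of β0 and β1. The built-in mtcars data and the choice of mpg as response and wt as predictor are assumptions for illustration.

```r
# Fit y = β0 + β1 * x + ε with mpg as y and wt as x
fit <- lm(mpg ~ wt, data = mtcars)

b <- coef(fit)
beta0 <- b[["(Intercept)"]]   # estimated intercept
beta1 <- b[["wt"]]            # estimated slope

print(c(beta0 = beta0, beta1 = beta1))
```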

6
Q

Define residual error in the context of linear regression (1 mark)

A

In the context of linear regression, residual error refers to the difference between the observed value of the dependent variable and the value predicted by the linear regression model.
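The definition can be verified directly in R: subtracting the fitted values from the observed values gives the same numbers as resid(). This is a sketch on the built-in mtcars data, chosen only for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Residual = observed value - value predicted by the model
manual_res <- mtcars$mpg - fitted(fit)

# Compare with the residuals R stores in the fitted model
same <- all.equal(unname(manual_res), unname(resid(fit)))
print(same)
```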

7
Q

What is the difference between the cor() and cor.test() functions?

A

cor(): Directly computes the correlation coefficient(s) without providing statistical significance.
cor.test(): Conducts a hypothesis test to provide statistical significance (p-value), confidence intervals, and other summary information.
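A short side-by-side sketch of the two functions, again assuming the built-in mtcars data for illustration:

```r
# cor() returns only the coefficient
r <- cor(mtcars$wt, mtcars$mpg, method = "pearson")

# cor.test() returns a test object with the same estimate plus inference
ct <- cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

print(r)
print(ct$estimate)   # same correlation coefficient as cor()
print(ct$p.value)    # p-value for the null hypothesis of zero correlation
print(ct$conf.int)   # 95% confidence interval for the correlation
```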

8
Q

How do you plot the data behind a cor()/cor.test() result in R?

A

plot(data$xaxis, data$yaxis, xlab = "x-label", ylab = "y-label")

9
Q

How do you add a linear regression line of best fit using ggplot?

A

ggplot(data = , aes(x = , y = )) +
  geom_point() +
  xlab("") + ylab("") +
  stat_smooth(method = "lm", col = "blue", se = FALSE)  # linear regression line of best fit

10
Q

Explain the concept of a 95% confidence interval and a 95% prediction interval

A

A 95% confidence interval (CI) is a range constructed so that, under repeated random sampling, about 95% of such intervals would contain the true population parameter (e.g., mean, correlation coefficient).
It is calculated from the standard error and a critical value from the normal distribution (or another appropriate distribution, such as the t distribution).

A 95% prediction interval (PI) provides a range of values within which we expect an individual future observation to fall, given existing data.
It is wider than the confidence interval because, in addition to the uncertainty in the estimated parameters, it accounts for the variability of individual data points.

11
Q

State the differences between a 95% confidence interval and a 95% prediction interval

A

Differences Between Confidence and Prediction Intervals:
1. Purpose:
   - Confidence Interval: Estimates the range where the population parameter lies.
   - Prediction Interval: Estimates the range where an individual future observation will fall.
2. Width of Interval:
   - Confidence Interval: Narrower because it estimates a single parameter (like a mean).
   - Prediction Interval: Wider because it must account for both the uncertainty in the parameter estimate and the natural variation of individual observations.
3. Applicability:
   - Confidence Interval: Useful for making inferences about population parameters.
   - Prediction Interval: Useful for predicting future outcomes based on existing data.

12
Q

How do you calculate confidence intervals in R?

A

Use the predict() function with interval = "confidence":

predict(fit, interval = "confidence")

13
Q

How do you calculate prediction intervals in R?

A

Change the argument from interval = "confidence" to interval = "prediction":

predict(fit, interval = "prediction")
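The sketch below compares the two interval types for a single new x value, to show that the prediction interval is the wider one. The mtcars model and the hypothetical new car weight of 3 are assumptions for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical new observation: a car with wt = 3
new_car <- data.frame(wt = 3)

conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

# Each result has columns fit (point estimate), lwr and upr (interval limits)
ci_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pi_width <- pred_int[, "upr"] - pred_int[, "lwr"]

print(conf_int)
print(pred_int)
print(pi_width > ci_width)   # the prediction interval is wider
```

Both calls give the same point estimate in the fit column; only the limits differ.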

14
Q

What is the equation for concentration at steady state?

A

Concentration at steady state: Css = Infusion rate / Clearance

15
Q

What is Css equivalent to?

A

Css = Infusion rate / Clearance = (1 / Clearance) × Infusion rate

16
Q

What is 1/clearance equivalent to?

A

When Css is regressed on infusion rate, 1/Clearance is equivalent to the slope β1, so Clearance = 1/β1
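To illustrate, the following sketch simulates steady-state concentrations for a range of infusion rates and recovers clearance from the regression slope. The true clearance of 5 L/h, the infusion rates, and the noise level are all invented values for the simulation, not data from the course.

```r
set.seed(42)                                  # reproducible simulation

true_clearance <- 5                           # L/h (assumed value)
infusion_rate  <- seq(10, 100, by = 10)       # mg/h (assumed values)

# Css = infusion rate / clearance, plus a little measurement noise (mg/L)
css <- infusion_rate / true_clearance +
  rnorm(length(infusion_rate), sd = 0.1)

# Regress Css on infusion rate: the slope estimates 1 / clearance
fit <- lm(css ~ infusion_rate)
beta1 <- coef(fit)[["infusion_rate"]]

est_clearance <- 1 / beta1
print(est_clearance)                          # should be close to true_clearance
```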

17
Q

How do you clear objects from the R global environment?

A

rm(list = ls()) removes all objects from the global environment.
rm() with one or more object names removes only those objects.

18
Q

What is the equation for infusion rate?

A

Infusion rate (amount per unit time, e.g. mg/h) = Css × Clearance

19
Q

How do you examine a distribution in R?

A

Basic R plot:

hist()

20
Q

What is the runif() function?

A

The runif() function in R generates random numbers following a uniform distribution. A uniform distribution means that every value within a specified range has an equal probability of being chosen.

Usage: runif(n = , min = , max = )

* n: the number of random numbers to generate (the sample size).
* min: the minimum value in the range (default is 0).
* max: the maximum value in the range (default is 1).

21
Q

What is the rnorm() function?

A

rnorm(): rnorm(n= , mean= , sd=)
Purpose: Generates random numbers from a normal (Gaussian) distribution.
Distribution Characteristics:
Normal Distribution: Bell-shaped curve characterized by the mean and standard deviation.
Parameters:
n (number of values to generate)
mean (mean of the distribution, default is 0)
sd (standard deviation, default is 1)

22
Q

runif() versus rnorm()

A

runif(): Generates numbers uniformly distributed across the specified range.
rnorm(): Generates numbers normally distributed around a specified mean with a specified standard deviation.

runif() is for the uniform distribution

runif():
Purpose: Generates random numbers from a uniform distribution.
Distribution Characteristics:
Uniform Distribution: Each value within a specified range is equally likely.
Parameters:
n (number of values to generate)
min (lower bound, default is 0)
max (upper bound, default is 1)

rnorm() is for the normal distribution

rnorm():
Purpose: Generates random numbers from a normal (Gaussian) distribution.
Distribution Characteristics:
Normal Distribution: Bell-shaped curve characterized by the mean and standard deviation.
Parameters:
n (number of values to generate)
mean (mean of the distribution, default is 0)
sd (standard deviation, default is 1)
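A minimal sketch contrasting the two functions, with histograms to examine each distribution. The sample size and parameter values are arbitrary choices for illustration.

```r
set.seed(123)                               # reproducible random numbers

u <- runif(n = 1000, min = 0, max = 10)     # uniform on [0, 10]
z <- rnorm(n = 1000, mean = 5, sd = 2)      # normal with mean 5, sd 2

# Histograms: roughly flat for runif(), bell-shaped for rnorm()
hist(u, main = "runif: uniform distribution", xlab = "Value")
hist(z, main = "rnorm: normal distribution", xlab = "Value")

print(range(u))              # stays within min and max
print(c(mean(z), sd(z)))     # close to the requested mean and sd
```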

23
Q

How do you save a data frame as a CSV file?

A

write.csv()

write.csv(data, "data.csv", row.names = FALSE)
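A small round-trip sketch: write a data frame to CSV and read it back to confirm the contents. The example data frame and the use of a temporary directory are assumptions for illustration; in practice you would supply your own path.

```r
# Hypothetical example data
df <- data.frame(id = 1:3, dose = c(10.5, 20, 30.25))

# Write to a temporary location so no files are left in the working directory
path <- file.path(tempdir(), "data.csv")
write.csv(df, path, row.names = FALSE)   # row.names = FALSE omits the row-number column

# Read the file back to check what was saved
df_back <- read.csv(path)
print(df_back)
```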

24
Q

Explain the difference between a confidence interval and a prediction interval in the context of linear regression analysis in R (3 marks)

A

CIs are used to indicate the reliability of an estimate of a parameter. For example, the CI around a regression coefficient provides a range that, with a specified level of confidence, contains the true value.

A prediction interval provides a range within which future observations are expected to fall, with a specified level of confidence. Unlike the CI, which focuses on the mean response, the PI accounts for the additional variability around individual predicted values.
PIs are used when predicting new outcomes from the regression model, reflecting both the uncertainty in estimating the model parameters and the natural random variability of individual outcomes.