Week 8 - Linear models: regression Flashcards

1
Q

linear regression

A

Assumes the relationship between the variables is linear
Assumes Y values are normally distributed around the line
The response variable is numerical

2
Q

method of least squares

A

Finds the line that best fits the data points

Least squares regression line: the line for which the sum of all squared deviations in Y is smallest.
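A minimal Python sketch of the least squares calculation, using the standard closed-form formulas b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄. The function name `least_squares` and the data are hypothetical, chosen only for illustration.

```python
def least_squares(x, y):
    """Return (a, b) for the line Y = a + bX that minimises the sum of squared deviations in Y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of X and Y, and sum of squares of X, about their means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx        # slope
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 2.20, b = 0.60
```

Any other line through these points gives a larger sum of squared deviations in Y.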

3
Q

formula of a line

A

Y = a + bX

Slope (b): the rate of change in Y per unit change in X; the intercept (a) is the value of Y when X = 0

4
Q

populations and samples

A

The regression line calculated from a sample is an estimate of the true population regression line, which passes through the mean of Y at each value of X
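A small simulation can illustrate the idea: draw a sample from a hypothetical population whose true line is known, fit a line to the sample, and compare. The population values (intercept 1.0, slope 0.5, scatter SD 1.0), the sample size and the seed are all assumptions for illustration.

```python
import random

# Hypothetical population: true intercept ALPHA, true slope BETA,
# with normally distributed scatter (SD = SIGMA) around the true line.
ALPHA, BETA, SIGMA = 1.0, 0.5, 1.0


def least_squares(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


random.seed(1)
x = [random.uniform(0, 10) for _ in range(200)]
y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in x]
a, b = least_squares(x, y)
# a and b are sample estimates; they vary from sample to sample around ALPHA and BETA
print(f"sample a = {a:.3f} (true {ALPHA}), sample b = {b:.3f} (true {BETA})")
```

Re-running with a different seed gives a slightly different sample line each time, which is the sampling uncertainty the standard error of the slope measures.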

5
Q

predicted values

A

Predicted values of Y from a regression line estimate the mean of Y for all individuals having a given value of X.
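A short sketch of how predicted values come from the fitted line. The intercept and slope below are assumed values (they match the least squares fit of a small hypothetical data set), and `predict` is a hypothetical helper name.

```python
# Hypothetical fitted line: intercept a and slope b are assumed values.
a, b = 2.2, 0.6


def predict(x_value):
    """Predicted Y: the estimated mean of Y for all individuals with this X."""
    return a + b * x_value


predictions = {x_value: round(predict(x_value), 2) for x_value in (1, 3, 5)}
print(predictions)  # {1: 2.8, 3: 4.0, 5: 5.2}
```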

6
Q

residuals

A

Residual: the residual of a point is the difference between its measured Y value and the Y value predicted by the regression line
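A minimal sketch computing residuals (observed Y minus predicted Y) for a least squares line fitted to hypothetical data. For a line fitted with an intercept, the residuals sum to zero apart from rounding error.

```python
def least_squares(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
predicted = [a + b * xi for xi in x]
# Residual = measured Y minus the Y predicted by the regression line
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]
print([round(r, 2) for r in residuals])  # [-0.8, 0.6, 1.0, -0.6, -0.2]
```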

7
Q

standard error of a slope

A

Uncertainty associated with the sample estimate of the slope (b)
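A sketch of the usual formula SE_b = √(MS_residual / Σ(X - X̄)²), where MS_residual is the residual sum of squares divided by n - 2. The data and the helper name `slope_standard_error` are hypothetical.

```python
import math


def slope_standard_error(x, y):
    """Return (b, SE_b) for the least squares slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)  # mean square residual, n - 2 degrees of freedom
    return b, math.sqrt(ms_res / s_xx)


b, se_b = slope_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"b = {b:.2f}, SE_b = {se_b:.3f}")  # b = 0.60, SE_b = 0.283
```

A larger spread of X values (bigger Σ(X - X̄)²) or less scatter around the line (smaller MS_residual) gives a smaller standard error.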

8
Q

testing hypotheses about a slope

A

Evaluate whether the population slope equals the null hypothesised value, which is typically zero
A t statistic is used, with n - 2 degrees of freedom (n = number of data points)

9
Q

t-test for regression slope

A

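Estimate the slope by least squares and see whether it is positive, negative or zero
Work out the standard error of the slope from the mean square residual
Calculate the t statistic, t = (b - β0) / SE_b, where β0 is the null hypothesised slope, and compare it to the t distribution with n - 2 degrees of freedom
See whether the P-value is below the significance level (e.g. 0.05)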
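The steps above can be sketched in plain Python. To stay dependency-free, this sketch computes the two-sided P-value by integrating the t density numerically with Simpson's rule; in practice a statistics library (e.g. SciPy's `scipy.stats.t.sf`) would supply it. The data and all function names are hypothetical.

```python
import math


def slope_t_test(x, y, null_slope=0.0):
    """Return (t, df) for testing H0: population slope == null_slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ms_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b = math.sqrt(ms_res / s_xx)
    return (b - null_slope) / se_b, n - 2


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=10_000):
    """P(|T| >= |t_stat|), via Simpson's rule on the density from 0 to |t_stat|."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area under the density between 0 and |t_stat|
    return 1 - 2 * central_area


t_stat, df = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
p = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, P = {p:.3f}")
```

For these toy data t is about 2.12 with 3 degrees of freedom and P is roughly 0.12, so at α = 0.05 the null hypothesis of zero slope would not be rejected.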

10
Q

F statistic or anova approach

A

An F test is used instead of a t-test
Null hypothesis: the slope is 0
Can be used only when the test is two-sided and the null hypothesised slope is zero
This does not mean we are doing an ANOVA; we just use the ANOVA table to calculate the F statistic
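A minimal sketch of the ANOVA-table calculation for a simple regression: the regression sum of squares has 1 degree of freedom, the residual sum of squares has n - 2, and F = MS_regression / MS_residual. The data and the helper name `regression_f` are hypothetical.

```python
def regression_f(x, y):
    """Return the F statistic for H0: slope == 0 in a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_reg = ss_total - ss_res  # variation explained by the regression
    ms_reg = ss_reg / 1         # regression: 1 degree of freedom
    ms_res = ss_res / (n - 2)   # residual: n - 2 degrees of freedom
    return ms_reg / ms_res


F = regression_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(F, 4))  # 4.5
```

For a simple regression F equals the square of the t statistic for the slope (here 2.1213² ≈ 4.5), which is why the two approaches give the same two-sided P-value.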

11
Q

using R squared

A

Use R squared to measure the fraction of the variation in Y that is explained by X in the linear regression
If R squared is close to one, then X predicts most of the variation in Y
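A sketch of R squared as the regression sum of squares divided by the total sum of squares (equivalently 1 - SS_residual / SS_total). The data and the helper name `r_squared_of` are hypothetical.

```python
def r_squared_of(x, y):
    """Fraction of the variation in Y explained by the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                   # total variation in Y
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained part
    return 1 - ss_res / ss_total


r_squared = r_squared_of([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r_squared, 4))  # 0.6
```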

12
Q

assumptions of regression

A

At each value of X there is a population of possible Y values whose mean lies on the true regression line
At each of the X values the distribution of Y values is normal
The variance of Y values is the same at all values of X
At each value of X the Y measurements represent a random sample from the population of possible Y values
The relationship between X and Y is linear
Residual plots and normal quantile (QQ) plots are used to check these assumptions
Extreme residuals (outliers) can violate the equal-variance assumption

13
Q

what do we do about violations?

A

Ignore violations if they are not too drastic
Transform variables if necessary (e.g. log)
Transformations can linearise a nonlinear relationship
If there are 0s in the data set, add 1 to all data points before a log transformation, since log(0) is undefined
Square-root transformation is good for Y values that are counts
Arcsine square-root transformation is good for Y values that are proportions
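A sketch of the transformations listed above, applied to small hypothetical data sets. The arcsine square-root result is in radians here; some textbooks report it in degrees, which changes the scale but not the shape of the transformation.

```python
import math

counts = [0, 3, 8, 24]  # hypothetical count data, including a zero
log_counts = [math.log(c + 1) for c in counts]  # add 1 first so the zero is not lost
sqrt_counts = [math.sqrt(c) for c in counts]    # square-root transformation for counts

proportions = [0.0, 0.25, 0.5, 1.0]  # hypothetical proportions
arcsine_sqrt = [math.asin(math.sqrt(p)) for p in proportions]  # in radians

print([round(v, 3) for v in log_counts])
print([round(v, 3) for v in sqrt_counts])
print([round(v, 3) for v in arcsine_sqrt])
```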

14
Q

transformations

A

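The log transformation is the easiest to apply
Power (Y = aX^b) and exponential (Y = ae^(bX)) relationships are common; taking logs linearises both (log Y against log X for a power relationship, log Y against X for an exponential one)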
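A sketch showing how log transformations turn power and exponential curves into straight lines that least squares can fit. The data are hypothetical and noise-free (Y = 3X² and Y = 2e^(0.5X)), so the fitted values recover the generating constants exactly, up to rounding.

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) for the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


x = [1, 2, 3, 4, 5]
power_y = [3 * xi ** 2 for xi in x]           # hypothetical Y = 3 X^2
exp_y = [2 * math.exp(0.5 * xi) for xi in x]  # hypothetical Y = 2 e^(0.5 X)

# Power: log Y = log a + b log X, so regress log Y on log X
log_a, b_power = least_squares([math.log(xi) for xi in x],
                               [math.log(yi) for yi in power_y])
# Exponential: log Y = log a + b X, so regress log Y on X
log_a2, b_exp = least_squares(x, [math.log(yi) for yi in exp_y])

print(round(math.exp(log_a), 3), round(b_power, 3))  # 3.0 2.0
print(round(math.exp(log_a2), 3), round(b_exp, 3))   # 2.0 0.5
```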

15
Q

precision of predictions

A

We can use X to predict either the mean of Y or a single value of Y; both are predictions
Predictions of the mean Y are more precise than predictions of a single Y value
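A sketch of the standard-error formulas behind this difference: SE(mean Y) = √(MS_res (1/n + (X0 - X̄)² / Σ(X - X̄)²)), and the SE for a single Y adds 1 inside the brackets for the scatter of individual points around the mean. The data and the helper name `prediction_standard_errors` are hypothetical.

```python
import math


def prediction_standard_errors(x, y, x0):
    """Return (SE of predicted mean Y, SE of a predicted single Y) at X = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ms_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    core = 1 / n + (x0 - x_bar) ** 2 / s_xx
    se_mean = math.sqrt(ms_res * core)          # basis of confidence bands
    se_single = math.sqrt(ms_res * (1 + core))  # basis of prediction intervals
    return se_mean, se_single


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
results = {x0: prediction_standard_errors(x, y, x0) for x0 in (1, 3, 5)}
for x0, (se_mean, se_single) in results.items():
    print(f"X = {x0}: SE(mean Y) = {se_mean:.3f}, SE(single Y) = {se_single:.3f}")
```

Both standard errors are smallest at the mean of X (here 3), so confidence bands and prediction intervals widen away from the centre of the data; multiplying each SE by the t critical value with n - 2 degrees of freedom gives the half-width of the band or interval.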

16
Q

confidence bands

A

Measure the precision of the predicted mean Y at each value of X.

17
Q

prediction intervals

A

Measure the precision of a predicted single Y value at each value of X.

18
Q

extrapolation

A

A prediction of the value of a response variable outside the range of X values in the data.