GLM 1 - Simple linear regression Flashcards
In a linear relationship, how is the value of an outcome variable Y approximated?
Y ≈ β0 + β1X.
Y = dependent variable
β0 = intercept
β1 = slope coefficient of X
What is the intercept/B0 (often labelled the constant)?
The expected value of Y when X = 0.
What is β1?
The slope: how Y changes per unit increase in X. When X is increased by one unit, Y increases by β1.
What is the terminology of a linear regression?
- We say that Y is regressed on X .
- We are expressing Y in terms of X .
- The dependent variable, Y , depends on X .
- The independent variable, X , doesn’t depend on anything.
How are the coefficients or parameters B0 and B1 estimated?
Using the available data:
(x1, y1), (x2, y2), . . . , (xn, yn ) - We have here a sample size of n data points.
How are the estimates of parameters written?
The estimates of the parameters are written with a circumflex or hat: ^
We then write our linear equation with these estimated coefficients: y^i = β^0 + β^1 xi.
Only the dependent variable carries a hat.
The independent variable (xi) does not have a hat, as it is treated as fixed.
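A minimal sketch of this prediction step, assuming hypothetical estimated coefficients and example x values (none of these numbers come from the flashcards):

```python
# Minimal sketch: predictions y^i = β^0 + β^1 * xi from estimated coefficients.
# The coefficient values and x data below are made-up examples.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # independent variable, treated as fixed (no hat)
beta0_hat, beta1_hat = 0.5, 2.0      # hypothetical estimated coefficients

y_hat = beta0_hat + beta1_hat * x    # predicted values (these carry the hat)
print(y_hat)                         # [2.5 4.5 6.5 8.5]
```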
B0 and B1 are independent of each other
True or false
True
What does the circumflex allow us to differentiate between?
True value and estimated value
What happens if we add a value to β0?
This would only shift y, not the β1xi term; β0 can change independently of β1.
What is y^i?
The predictions, or predicted values, of the outcomes y, given the independent variables, the xi’s.
What are the differences between the predicted values, y^ i’s, and the observed values, yi ’s?
The residuals:
e^i := yi − y^i.
That is, these are the values that remain after we have removed the
predictions from the observations.
Why are the residuals, e^i ’s, also equipped with a hat?
Because these are also estimated values.
Why are the black error bars vertical, and not perpendicular to the line in blue?
Because residuals are measured in the y direction: each residual is the vertical difference between the observed yi and the predicted value y^i, so yi = y^i + e^i.
How can the optimal value of the parameters, β0 and β1 be found?
By considering the sum of the squares of the residuals:
RSS := (e^1)² + (e^2)² + . . . + (e^n)².
Why do we square residuals?
Residuals are defined by subtracting the predicted values from the observed values, so we can rewrite the RSS as RSS = (y1 − y^1)² + . . . + (yn − y^n)². Some residuals are negative and some are positive; squaring ensures that each one makes a positive contribution to the RSS, so they cannot cancel out.
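A minimal sketch of this point, using made-up observations and a hypothetical fitted line: residuals of mixed sign all contribute positively to the RSS once squared.

```python
# Minimal sketch with made-up data: squaring residuals of mixed sign.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.3, 4.1, 6.2, 7.9])   # hypothetical observed values
y_hat = 0.4 + 1.9 * x                # hypothetical fitted line

residuals = y - y_hat                # e^i = yi - y^i, some negative, some positive
rss = np.sum(residuals ** 2)         # every squared term contributes positively

print(residuals)
print(residuals.sum(), rss)          # raw residuals can cancel; squares cannot
```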
What is the goal when choosing optimal values for β0 and β1?
To minimise the distance of the fitted line from all the data points.
What is RSS a function of and why?
β0 and β1, because every residual depends on the values of β0 and β1. Thus, we may write the RSS as a function of these quantities:
RSS(β^0, β^1) = (e^1)² + (e^2)² + . . . + (e^n)².
The value taken by the RSS can therefore be minimized for some values of β^0 and β^1.
How do we write this?
(β^0, β^1) := argmin RSS(β0, β1),
argmin RSS means the argument that minimizes the RSS,
where the hats on the right-hand side of the RSS have been suppressed.
The RSS is a function of the parameters β0 and β1 therefore…
it can take a range of values across a two dimensional landscape
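A minimal sketch of this idea, with made-up data: the RSS is a function over the two-dimensional (β0, β1) landscape, and the standard least-squares closed form gives the minimizing values.

```python
# Minimal sketch with made-up data: RSS as a function of (beta0, beta1),
# and the least-squares closed form for the values that minimize it.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.3, 4.1, 6.2, 7.9])

def rss(beta0, beta1):
    residuals = y - (beta0 + beta1 * x)
    return np.sum(residuals ** 2)

print(rss(0.0, 2.0), rss(1.0, 1.5))   # two points in the 2-D parameter landscape

# Closed-form least-squares estimates (the argmin of the RSS):
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
print(beta0_hat, beta1_hat, rss(beta0_hat, beta1_hat))
```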
How can we assess the accuracy/goodness of fit of our model?
Using the previously minimized value of the RSS
What is one way of quantifying the accuracy of the model?
Compare the RSS with the total sum of squares, TSS (which can be viewed as the RSS of the null model, since the null model is the model with only the y-intercept).
What is R2 also known as?
Coefficient of determination
What does R2 measure?
Proportion of variance in the dependent variable explained by the independent variable.
For simple regression, the R2 can be shown to be equivalent to what?
The squared correlation of the IV with the DV. That is, R² and the square of Cor(Y, X) are equal: R² = Cor(Y, X)².
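A minimal sketch of this equivalence, reusing the same made-up data: R² computed as 1 − RSS/TSS matches the squared correlation Cor(Y, X)².

```python
# Minimal sketch with made-up data: R^2 = 1 - RSS/TSS equals Cor(X, Y)^2
# in simple regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.3, 4.1, 6.2, 7.9])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares (null model)
r_squared = 1 - rss / tss

print(r_squared, np.corrcoef(x, y)[0, 1] ** 2)   # the two values agree
```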
What is a random variable?
A function, from a sample space Ω to the real numbers, R, such that
X : Ω → R.
Uppercase X denotes the random variable.
For every point in the sample space, ω ∈ Ω, the random variable X may (or may not) take what?
A different value, such that we have:
X (ω) = x.
We call x the realization of X (the random variable) at ω (the point in the sample space).
What does the probability of obtaining x count?
The number of ω producing x , written as
P[X = x ] := P[{ω ∈ Ω : X (ω) = x }].
What is the random variable for the toss of a fair (even) coin, with head, H, and tail, T?
X : {H, T} → {0, 1},
with X(H) = 0 and X(T) = 1, producing the probabilities P[X = 0] = P[X = 1] = 1/2.
In other words, X has {H, T} as its sample space, and X assigns 0 to heads and 1 to tails.
If we have a single-faced coin (one that always lands heads), Y : {H, T} → {0, 1}, such that Y(H) = 0, and Y(T) = 1, what are the probabilities?
P[Y = 0] = 1, and P[Y = 1] = 0
The measure P is used to give probability mass to each element in Ω.
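A minimal sketch of the two coins as explicit mappings from the sample space {H, T} to {0, 1}, with P assigning probability mass to each ω (the masses shown are the example ones above):

```python
# Minimal sketch: random variables as mappings omega -> x, with P giving
# probability mass to each element of the sample space {H, T}.
X = {"H": 0, "T": 1}          # fair coin: X(H) = 0, X(T) = 1
P_X = {"H": 0.5, "T": 0.5}    # P[X = 0] = P[X = 1] = 1/2

Y = {"H": 0, "T": 1}          # single-faced coin: Y(H) = 0, Y(T) = 1
P_Y = {"H": 1.0, "T": 0.0}    # P[Y = 0] = 1, P[Y = 1] = 0

def prob(rv, P, value):
    # P[X = x] := P[{omega : X(omega) = x}] -- sum the mass of the omegas producing x
    return sum(P[omega] for omega in rv if rv[omega] == value)

print(prob(X, P_X, 0), prob(X, P_X, 1))   # 0.5 0.5
print(prob(Y, P_Y, 0), prob(Y, P_Y, 1))   # 1.0 0.0
```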
What is the discrete expectation?
For a discrete random variable Y, the expectation E[Y] is the sum over all the possible values y (the realizations, i.e. all the values taken by Y over the sample space), each weighted by the probability of obtaining that value: E[Y] = Σ y · P[Y = y]. A discrete random variable takes a finite number of values here. This is almost identical to the arithmetic mean.
What is the Arithmetic mean ?
A special case of the expectation in which the probabilities are uniform across all possible values of y (each equal to 1/n).
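A minimal sketch with made-up values and probabilities, showing the discrete expectation as a probability-weighted sum and the arithmetic mean as the uniform-weight (1/n) special case:

```python
# Minimal sketch: discrete expectation E[Y] = sum_y y * P[Y = y], and the
# arithmetic mean as the special case with uniform weights 1/n (values made up).
import numpy as np

values = np.array([0.0, 1.0, 2.0])    # the possible realizations of Y
probs = np.array([0.5, 0.3, 0.2])     # P[Y = y] for each value; sums to 1

expectation = np.sum(values * probs)  # weight each value by its probability
arithmetic_mean = np.mean(values)     # equivalent to weights of 1/n each

print(expectation, arithmetic_mean)   # 0.7 1.0
```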
In simple regression what are we given?
Two sequences of data points.
Each pair of observations is a case, (yi , xi ), with i = 1, . . . , n.
What is the deterministic and stochastic part of a statistical model?
yi = β0 + β1xi + εi
β0 + β1xi = deterministic part
εi = stochastic part
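A minimal sketch of this decomposition, simulating data under hypothetical parameter values (β0, β1, σ chosen arbitrarily):

```python
# Minimal sketch: yi = beta0 + beta1*xi + eps_i, split into its deterministic
# and stochastic parts; all parameter values here are made up.
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5            # hypothetical true parameters

x = np.linspace(0.0, 5.0, 20)
deterministic = beta0 + beta1 * x              # E[Y | X = xi]
epsilon = rng.normal(0.0, sigma, size=x.size)  # additive noise with variance sigma^2
y = deterministic + epsilon                    # observed outcomes

print(y[:5])
```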
What is one difference between regression and correlation?
In regression one of the variables is treated as the outcome variable or dependent variable, generally denoted by the yi ’s
We will then use the other variables for predicting that outcome. As a result, the other variables are referred to as predictors, or independent variables, and are denoted by xi ’s, or features in the machine learning literature.
What is the deterministic part of a univariate simple linear regression made up of?
- Mean expressed as a conditional expectation:
E[Y |X = xi ] = β0 + β1xi ,
- Variance function, expressed as a conditional variance operator:
Var[Y |X = xi ] = σ2, ∀i = 1, . . . , n.
The (unknown) parameters in this model are therefore (β0, β1, σ2).
What are the unknown parameters in the deterministic part of a simple linear regression?
- β0 is the y-intercept of E[Y |X], when X = 0. Thus, we have E[Y |X = 0] = β0.
- β1 is the rate of change of E[Y |X], such that E[Y |X = x + 1] − E[Y |X = x] = β1.
- σ2 is the (conditional) variance of Y, given X. It is strictly positive, σ2 > 0.
What does the stochastic part of a simple linear regression consist of?
Random noise- In general, the observables or observed data, denoted by yi ’s, differ from the expected values of Y , given xi , such that
yi = E[Y |X = xi ] + εi , i = 1, . . . , n,
where the εi ’s are the statistical errors, collectively referred to as additive noise.
The εi ’s are defined as the difference between the observables and the conditional expected values –that is,
εi = yi − E[Y |X = xi ].
Geometrically, the errors correspond to the vertical distances between each yi and its conditional expectation. Note that the error terms are not observable, since they depend on the unknown parameters (β0, β1).