Chapter 3 - Linear Models Flashcards
what is a decision stump
a classifier on a single feature
its parameter is the threshold t at which the decision switches from 0 to 1
what is the decision boundary in a decision stump
the point at which the decision switches,
the threshold, t
what is the learning algorithm for a decision stump
for t varied between min(x) and max(x):
    count the classification errors at threshold t
    if errors < minErr: set minErr = errors and keep this t
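the stump-learning loop above can be sketched in Python (toy data, function name, and the midpoint choice of candidate thresholds are illustrative assumptions, not from the cards):

```python
# Minimal decision-stump training sketch: try candidate thresholds t,
# keep the one with the fewest misclassifications under yhat = 1 if x > t.

def fit_stump(x, y):
    best_t, min_err = None, float("inf")
    # Candidate thresholds: midpoints between consecutive sorted values.
    xs = sorted(x)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for t in candidates:
        errors = sum(1 for xi, yi in zip(x, y) if (1 if xi > t else 0) != yi)
        if errors < min_err:
            min_err, best_t = errors, t
    return best_t, min_err

x = [1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1]
t, err = fit_stump(x, y)   # any t between 2 and 3 separates this data perfectly
```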
what does linearly separable mean?
we can fit a linear model (i.e. draw a linear decision boundary) and perfectly separate the classes
what is the limitation of a decision stump?
it works only on a single feature
what is the discriminant function f(x)=?
(sum for all features: wjxj) - t
or in matrix notation:
wTx - t
what does the discriminant function describe, geometrically?
the equation of a plane
what is the gradient and y intercept of the decision boundary from the discriminant function in two dimensions?
set equal to zero
m = -(w1/w2)
c = t/w2
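a quick numeric check of this rearrangement (the values of w1, w2, t are made up for illustration): any point on the line x2 = m*x1 + c should satisfy the boundary equation w1*x1 + w2*x2 - t = 0.

```python
# Check the 2-D decision-boundary rearrangement numerically
# (w1, w2, t are illustrative values, not from the cards).
w1, w2, t = 2.0, 4.0, 8.0
m = -(w1 / w2)       # gradient  = -w1/w2
c = t / w2           # intercept = t/w2

x1 = 3.0
x2 = m * x1 + c              # a point on the decision boundary
f = w1 * x1 + w2 * x2 - t    # should be zero on the boundary
```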
what is the perceptron decision rule?
if f(x) > 0 then yhat=1 else 0
what is the perceptron parameter update rule, with sigmoid error?
wj = wj - (lrate)(yhat - y)(xj)
what is the perceptron learning algorithm?
for each training sample:
    update each weight: wj = wj - (lrate)(yhat - y)(xj)
    update the threshold: t = t + lrate(yhat - y)
repeat until no parameters change over a full pass
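the algorithm above, as a runnable sketch (the learning rate, epoch cap, and toy AND-style dataset are assumptions for illustration):

```python
# Perceptron training sketch using the update rule from the card.

def step(z):
    return 1 if z > 0 else 0

def train_perceptron(X, y, lrate=0.1, max_epochs=100):
    w = [0.0] * len(X[0])
    t = 0.0
    for _ in range(max_epochs):
        changed = False
        for xi, yi in zip(X, y):
            yhat = step(sum(wj * xj for wj, xj in zip(w, xi)) - t)
            if yhat != yi:
                # wj = wj - lrate*(yhat - y)*xj ;  t = t + lrate*(yhat - y)
                w = [wj - lrate * (yhat - yi) * xj for wj, xj in zip(w, xi)]
                t = t + lrate * (yhat - yi)
                changed = True
        if not changed:          # converged: no updates in a full pass
            break
    return w, t

# Linearly separable AND-style data, so convergence is guaranteed.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]
w, t = train_perceptron(X, y)
preds = [step(sum(wj * xj for wj, xj in zip(w, xi)) - t) for xi in X]
```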
what is learning rate?
the step size of the update
what is the limitation of the perceptron algorithm?
can only solve linearly separable problems
if …. the perceptron algorithm is guaranteed to solve the problem
the data is linearly separable
what is the perceptron convergence theorem?
If a dataset is linearly separable, the perceptron learning algorithm will converge to a perfect classification within a finite number of training steps
a logistic regression model has the output f(x) = ?
1 / (1 + e^-z)
where z = wTx - t
what is the name of the function that logistic regression uses?
sigmoid
what is the decision rule for logistic regression?
if f(x) > 0.5 then yhat=1 else 0
what is loss?
the cost incurred by a model for a prediction it makes
what loss function does logistic regression use?
log loss, or cross-entropy
what is the equation for log loss (cross entropy), L(f(x),y) = ?
L(f(x),y) = -{ylogf(x) + (1-y)log(1-f(x))}
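the formula above behaves as the card suggests: confident correct predictions incur a small loss, confident wrong ones a large loss. A small sketch (the probability values 0.9 and 0.1 are illustrative):

```python
import math

# Log loss (cross-entropy) for one prediction:
# L(f(x), y) = -(y*log f(x) + (1-y)*log(1 - f(x)))
def log_loss(fx, y):
    return -(y * math.log(fx) + (1 - y) * math.log(1 - fx))

low = log_loss(0.9, 1)    # confident and correct: small loss
high = log_loss(0.1, 1)   # confident and wrong: large loss
```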
what is an error function?
when the loss function is summed or averaged over all data points
what is the error function (summed log loss) for logistic regression E=?
- (sum for each i) {yilog(f(xi)) + (1-yi)log(1-f(xi))}
what are the names the error function for logistic regression is known by?
cross entropy error
negative log likelihood
what is the rule of gradient descent, in words?
in order to decrease error, we should update parameters in the direction of the negative gradient
what is the partial derivative of the cross entropy error function with the logistic regression model, with respect to parameter wj, dE / dwj = ?
dE/df(x) x df(x)/dz x dz/dwj
= sum of i: (f(xi) - yi)xij
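the analytic gradient above can be sanity-checked against a finite-difference estimate of the error function (toy data and weights below are made up for the check):

```python
import math

# Compare sum_i (f(xi) - yi) * xij with a central-difference estimate
# of dE/dwj for the cross-entropy error.

def f(x, w, t):                       # logistic model
    z = sum(wj * xj for wj, xj in zip(w, x)) - t
    return 1 / (1 + math.exp(-z))

def error(X, y, w, t):                # summed log loss
    return -sum(yi * math.log(f(xi, w, t))
                + (1 - yi) * math.log(1 - f(xi, w, t))
                for xi, yi in zip(X, y))

X = [[1.0, 2.0], [2.0, 0.5]]
y = [1, 0]
w, t = [0.3, -0.2], 0.1

j = 0
analytic = sum((f(xi, w, t) - yi) * xi[j] for xi, yi in zip(X, y))

eps = 1e-6
numeric = (error(X, y, [w[0] + eps, w[1]], t)
           - error(X, y, [w[0] - eps, w[1]], t)) / (2 * eps)
```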
what is the algorithm for gradient descent?
repeat:
for each parameter j do
wj = wj - lrate x dE/dwj
until termination criteria met
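the generic loop above, applied to a toy 1-D error function with a known minimum (the function E(w) = (w - 3)^2, learning rate, and iteration cap are assumptions for illustration):

```python
# Gradient descent on E(w) = (w - 3)^2, whose gradient is dE/dw = 2(w - 3)
# and whose minimum sits at w = 3.

def dE_dw(w):
    return 2 * (w - 3)

w = 0.0
lrate = 0.1
for _ in range(100):              # termination criterion: fixed iteration cap
    w = w - lrate * dE_dw(w)      # step in the direction of the negative gradient
```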
what is stochastic gradient descent?
compute the gradient for each example one by one and modify the parameters for each
why is stochastic gradient descent often applied?
it is more computationally efficient on very large datasets, where computing the full-batch gradient for every update is expensive
what is the algorithm for logistic regression?
t = random
w = random vector
set max epochs
lrate = 0.1
for each epoch:
    for each training example x:
        for each parameter j:
            wj = wj - lrate(f(x)-y)(xj)
        t = t + lrate(f(x)-y)
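the full logistic-regression algorithm can be sketched end to end (the 1-D toy dataset, epoch count, and seed are illustrative assumptions):

```python
import math
import random

# SGD training of logistic regression, following the card's updates:
# wj = wj - lrate*(f(x) - y)*xj  and  t = t + lrate*(f(x) - y).

def f(x, w, t):
    z = sum(wj * xj for wj, xj in zip(w, x)) - t
    return 1 / (1 + math.exp(-z))

random.seed(0)
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

w = [random.random()]          # random initial weight
t = random.random()            # random initial threshold
lrate, max_epochs = 0.1, 2000

for _ in range(max_epochs):
    for xi, yi in zip(X, y):
        err = f(xi, w, t) - yi
        w = [wj - lrate * err * xj for wj, xj in zip(w, xi)]
        t = t + lrate * err

preds = [1 if f(xi, w, t) > 0.5 else 0 for xi in X]
```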
what loss function does the perceptron use?
hinge loss
give the equation for hinge loss
sum over misclassified samples: -y(wx + b)
= sum: -y(yhat)
i.e. sum -y(wx + b) over ONLY the misclassified samples (those where y(wx + b) is negative, so each term contributes a positive loss)
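the loss above, as a small function (labels in {-1, +1} and the toy weights/data are illustrative assumptions):

```python
# Perceptron criterion from the card: sum -y*(w.x + b) over ONLY the
# misclassified samples, with labels y in {-1, +1}.

def perceptron_loss(X, y, w, b):
    total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        if yi * score <= 0:          # misclassified (or on the boundary)
            total += -yi * score     # positive contribution to the loss
    return total

X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, -1]
w, b = [0.5, 0.5], 0.0
loss = perceptron_loss(X, y, w, b)   # only the second sample contributes
```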
stochastic gradient descent is also known as
mini-batch gradient descent (strictly, SGD updates on one sample and mini-batch on a small subset, but the terms are often used interchangeably)
gradient based optimisation is possible when the loss function is
differentiable
what are the 4 steps of gradient based minimisation?
- test for convergence
- compute search direction
- compute step length
- update the variables
when we perform mini-batch SGD, what do we multiply sum:dL/dW by to scale it?
n / |S|
(number of samples n divided by batch size |S|)
modern machine learning has given rise to what kind of programming
differentiable programming
what is differentiable programming
If the performance of a computer program can be represented by a loss function, we could seek to optimise that program via its parameters using a gradient based approach
the perceptron algorithm is a … … classification algorithm
deterministic
binary
what is a generative process
describes the way in which data is generated
what is the perceptron weight update, with hinge loss?
wj = wj - (lrate)( - yhat x y)(xj)
or if just for the misclassified
wj = wj - (lrate)( -y)(xj)
= wj + (lrate)(y)(xj)
we make the iid assumption for logistic regression, this is that
our data are independent and identically distributed (iid).
the iid assumption means that the outputs …
The outputs do not depend on multiple inputs nor on other outputs.
the iid assumption we make for logistic regression means
we can perform maximum likelihood estimation
i.e. we can work out the best parameters from the data by maximising the likelihood
p(D | w) = product over i: p(yi | xi, w)
what is the loss function (negative log-likelihood) for SGD for logistic regression
- (1/n) sum for i=1..n: [yi log f(xi) + (1-yi) log(1-f(xi))]
the same summed log loss, rescaled by 1/n for the sample size
we can use logistic regression to work out p(y=1 | x, w) =
f(x) = 1 / (1 + e^-z)
the decision boundary for logistic regression is given by
the point where d = f(x) = 0.5
inverting d = 1 / (1+e^-z) gives the logit: wx + b = log(d / (1-d))
so at d = 0.5 the boundary is wx + b = 0
what are the 3 data properties that will cause practical challenges for a logistic regression model
imbalanced data - anything using MLE will try to fit the dominant class
multicollinearity - two or more predictor variables are highly linearly related.
completely separated training data
what step can we take to minimise the impact of multicollinearity in logistic regression
feature selection
benefits of logistic regression (5)
- Efficient and straightforward,
- Doesn’t require large computation,
- Easy to implement, easily interpretable
- Used widely by data analysts and scientists.
- Provides a probability for predictions and observations.
limitations of logistic regression (2 general, 3 data properties)
- Linear decision boundaries
- Inability to handle complex inputs (e.g. an image)
- Multicollinearity (correlated inputs)
- Sparseness (lots of zero or identical inputs)
- Complete separation (it is not a probabilistic problem!)
limitations of perceptron (4)
Challenges with high dimensional multiple correlated input features
linear
Convergence can be tricky depending on the variant of perceptron used
deterministic
which algorithm: perceptron or logistic regression, doesn't converge
logistic regression
why does logistic regression never converge
on completely separated data we can never reach the true decision boundary: we are trying to fit an S-shaped sigmoid to a sharp, step-like boundary. The weights grow without bound (e.g. w1 -> infinity) as the sigmoid approaches a step function, so that limit is the closest we will get.
what property of logarithm means we can take the log of the likelihood
logarithm is a monotonically increasing function.
it doesn't affect where our max/min is
what is a monotonically increasing function, what does it mean?
if the value on the x-axis increases, the value on the y-axis also increases