Chapter 3 - Linear Models Flashcards
what is a decision stump
a classifier that uses a single feature
its parameter is the threshold t at which the decision switches from 0 to 1
what is the decision boundary in a decision stump
the point at which the decision switches,
the threshold, t
what is the learning algorithm for a decision stump
for t varied between min(x) and max(x):
count the classification errors at that t
if errors < minErr, update minErr and store this t as the best threshold
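The search above can be sketched in Python (a minimal illustration with NumPy; the function name and the choice to test a threshold at each unique feature value are assumptions, not from the cards):

```python
import numpy as np

def fit_stump(x, y):
    """Brute-force search for the threshold t on a single feature x
    that minimises classification errors against 0/1 labels y."""
    best_t, min_err = None, len(y) + 1
    # try each candidate threshold between min(x) and max(x)
    for t in np.unique(x):
        yhat = (x > t).astype(int)   # predict 1 when the feature exceeds t
        errors = int(np.sum(yhat != y))
        if errors < min_err:         # keep the best threshold seen so far
            min_err, best_t = errors, t
    return best_t, min_err
```

For example, with x = [1, 2, 3, 10, 11, 12] and y = [0, 0, 0, 1, 1, 1], the search finds t = 3 with zero errors.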
what does linearly separable mean?
we can fit a linear model (i.e. draw a linear decision boundary) and perfectly separate the classes
what is the limitation of a decision stump?
it works only on a single feature
what is the discriminant function f(x)=?
(sum for all features: wjxj) - t
or in matrix notation:
wTx - t
what does the discriminant function describe, geometrically?
the equation of a hyperplane (a line in two dimensions, a plane in three)
what is the gradient and y intercept of the decision boundary from the discriminant function in two dimensions?
set f(x) = 0 and rearrange for x2: x2 = -(w1/w2)x1 + t/w2, so
m = -(w1/w2)
c = t/w2
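A quick numeric check of the rearrangement (the weights w = (2, 1) and t = 4 are an arbitrary example, not from the cards): every point on the line x2 = m*x1 + c should make the discriminant zero.

```python
# hypothetical example parameters
w1, w2, t = 2.0, 1.0, 4.0
m = -(w1 / w2)      # gradient of the decision boundary
c = t / w2          # y-intercept of the decision boundary

# any point on the boundary line should give f(x) = w1*x1 + w2*x2 - t = 0
for x1 in [-1.0, 0.0, 3.0]:
    x2 = m * x1 + c
    f = w1 * x1 + w2 * x2 - t
    assert abs(f) < 1e-9
```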
what is the perceptron decision rule?
if f(x) > 0 then yhat=1 else 0
what is the perceptron parameter update rule, with sigmoid error?
wj = wj - (lrate)(yhat - y)(xj)
what is the perceptron learning algorithm?
repeat:
for each training sample:
update weights: wj = wj - (lrate)(yhat - y)(xj)
update threshold: t = t + (lrate)(yhat - y)
until the changes to the parameters are zero
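The loop above can be sketched as follows (a minimal NumPy illustration; the function name, zero initialisation, and the max_epochs safety cap are assumptions):

```python
import numpy as np

def train_perceptron(X, y, lrate=0.1, max_epochs=100):
    """Perceptron training: X is (n_samples, n_features), y holds 0/1 labels."""
    w = np.zeros(X.shape[1])
    t = 0.0
    for _ in range(max_epochs):
        changed = False
        for xi, yi in zip(X, y):
            yhat = 1 if xi @ w - t > 0 else 0     # decision rule
            if yhat != yi:
                w = w - lrate * (yhat - yi) * xi  # weight update
                t = t + lrate * (yhat - yi)       # threshold update
                changed = True
        if not changed:   # no parameter changes in a full pass: converged
            break
    return w, t
```

On a linearly separable problem such as logical AND (X = [[0,0],[0,1],[1,0],[1,1]], y = [0,0,0,1]) the loop converges to parameters that classify every sample correctly, as the convergence theorem below guarantees.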
what is learning rate?
the step size of the update
what is the limitation of the perceptron algorithm?
can only solve linearly separable problems
if …. the perceptron algorithm is guaranteed to solve the problem
the data is linearly separable
what is the perceptron convergence theorem?
If a dataset is linearly separable, the perceptron learning algorithm will converge to a perfect classification within a finite number of training steps
a logistic regression model has the output f(x) = ?
1 / (1 + e^-z)
where z = wTx - t
what is the name of the function that logistic regression uses?
sigmoid
what is the decision rule for logistic regression?
if f(x) > 0.5 then yhat = 1 else 0
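A minimal sketch of the model output and decision rule (NumPy assumed; function names are illustrative). Note that sigmoid(0) = 0.5, so f(x) > 0.5 is equivalent to wTx - t > 0, the same boundary as the perceptron:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_predict(x, w, t):
    """Logistic regression output f(x) = sigmoid(w.x - t),
    with decision rule: predict 1 when f(x) > 0.5."""
    fx = sigmoid(x @ w - t)
    return fx, 1 if fx > 0.5 else 0
```

For example, with w = [1, 1], t = 1 and x = [1, 1], z = 1 and f(x) ≈ 0.731, so the prediction is 1.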
what is loss?
the cost incurred by a model for a prediction it makes
what loss function does logistic regression use?
log loss, or cross-entropy
what is the equation for log loss (cross entropy), L(f(x),y) = ?
L(f(x),y) = -{y log f(x) + (1-y) log(1 - f(x))}
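The formula translates directly to code (a minimal sketch; the eps clipping to avoid log(0) is an added safeguard, not part of the card):

```python
import numpy as np

def log_loss(fx, y, eps=1e-12):
    """Cross-entropy loss for one prediction fx in (0, 1) and label y in {0, 1}.
    eps guards against taking log(0)."""
    fx = np.clip(fx, eps, 1 - eps)
    return -(y * np.log(fx) + (1 - y) * np.log(1 - fx))
```

A confident correct prediction is cheap, log_loss(0.9, 1) ≈ 0.105, while the same confidence on the wrong label is heavily punished, log_loss(0.9, 0) ≈ 2.303.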
what is an error function?
when the loss function is summed or averaged over all data points
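As a sketch of that definition, averaging the log loss over a whole dataset gives the error function for logistic regression (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def error(fxs, ys, eps=1e-12):
    """Error function: log loss averaged over all data points.
    fxs are predictions in (0, 1); ys are 0/1 labels."""
    fxs = np.clip(np.asarray(fxs, dtype=float), eps, 1 - eps)
    ys = np.asarray(ys, dtype=float)
    losses = -(ys * np.log(fxs) + (1 - ys) * np.log(1 - fxs))
    return float(np.mean(losses))
```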