Lecture 4 Flashcards

1
Q

Parameters

A

Variables learned (found) during training, e.g. weights

2
Q

Hyperparameters

A

Variables whose values are set before the training process begins (e.g. the learning rate alpha, or k from k-NN)

3
Q

Loss Function (or error)

A

Measures the error for a single training example and should be minimized to achieve the objective; part of a cost function

4
Q

Cost Function

A

The average of the loss function over the entire training set (e.g. mean squared error); a type of objective function

5
Q

Objective Function

A

Any function that you optimize during training (e.g. maximum likelihood, divergence between classes)

6
Q

Gradient Descent

A

Finds optimal values for the weights w;

Is an iterative optimization algorithm that operates over a loss landscape (the cost function);

Follows the slope of the gradient with respect to the weights W to reach the minimum cost
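A minimal sketch of the idea in Python. The cost J(w) = (w − 3)² is a hypothetical example (not from the lecture), chosen so the minimum is known to be at w = 3:

```python
# Minimal gradient-descent sketch on a hypothetical cost J(w) = (w - 3)**2.
def grad(w):
    # dJ/dw: the gradient points uphill, so we step against it.
    return 2 * (w - 3)

w = 0.0              # initial weight
learning_rate = 0.1  # step size (a hyperparameter)
for _ in range(100):
    w -= learning_rate * grad(w)  # follow the negative gradient downhill
```

After the loop, w has converged close to the minimum at 3; shrinking the learning rate slows convergence, while too large a rate can overshoot.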

7
Q

Gradient of Cost Function

A

The direction of the steps taken toward the minimum cost

8
Q

Learning Rate

A

The size of the steps taken in any direction

9
Q

Sigmoid Function (Logistic Function)

A

Assumes a particular functional form (a sigmoid) applied to the linear function of the data; the output is a smooth, differentiable function of the inputs and the weights
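A short sketch of the function and its derivative (function names here are illustrative, not from the lecture):

```python
import math

def sigmoid(z):
    # Squashes a linear score z = w.x + b smoothly into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    # The derivative has the convenient closed form s * (1 - s),
    # which is what makes gradient-based training tractable.
    s = sigmoid(z)
    return s * (1.0 - s)
```

Note that sigmoid(0) = 0.5, and the output saturates toward 0 or 1 for large negative or positive inputs.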

10
Q

Cross Entropy (Logarithmic Loss)

A

Compares the predicted class probability to the actual class (0 or 1);
The score penalizes the probability based on how far it is from the actual value;
The penalty is logarithmic in nature

11
Q

Regularization

A

Any modification to a learning algorithm that is intended to reduce its generalization error but not its training error;
Mitigates overfitting;
Used when we don't have enough samples to create a good logistic regression classification model.
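One common form is an L2 (weight-squared) penalty added to the data loss; this sketch is a generic illustration, with `lam` standing in for the regularization-strength hyperparameter:

```python
def regularized_cost(data_loss, weights, lam):
    # Add lam * sum(w^2): large weights are penalized, which discourages
    # fitting noise in the training set (overfitting) and so reduces
    # generalization error at a small cost in training error.
    return data_loss + lam * sum(w * w for w in weights)
```

With `lam = 0` the penalty vanishes and the cost reduces to the plain data loss.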

12
Q

Activation Functions

A

Applied to the hidden units;
achieve nonlinearity;
popular examples include sigmoid, tanh, and ReLU

13
Q

Feedforward

A

Calculates the predicted output (y); inference

14
Q

Backpropagation

A

Updating the weights and biases; learning

15
Q

Artificial Neural Networks

A
  • Highly expressive non-linear functions
  • Highly parallel network of logistic function units
  • Minimizes the sum of squared training errors plus squared weights (regularization)
  • Uses gradient descent as training procedure
16
Q

Is logistic regression used for classification or regression?

A

Logistic Regression is used for classification only.

17
Q

Does regularization work better with a small set of features (low dimensions) or a large set of features (high dimensions)?

A

Regularization works well with a large number of features.

18
Q

Is logistic regression used for binary or multiclass classification?

A

It can be used for both.

19
Q

What are the steps for a gradient descent algorithm?

A
  1. Initialize random weights and bias
  2. Pass an input through the network and compute the predicted values from the output layer
  3. Calculate the error between the actual value and the predicted value
  4. Go to each weight that contributes to the error and change its respective value to reduce the error
  5. Reiterate until you find the best weights of the network
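The steps above can be sketched for the simplest case, a single logistic unit; the toy data and all variable names here are hypothetical, chosen only to make the loop runnable:

```python
import math

# Toy data: one feature, label 1 when x > 0 (hypothetical example).
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0  # step 1: initialize weights and bias
lr = 0.5         # learning rate (step size)

for _ in range(200):                              # step 5: reiterate
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # step 2: predicted value
        err = p - y                               # step 3: error vs. actual
        w -= lr * err * x                         # step 4: adjust each weight
        b -= lr * err                             #         that contributed
```

After training, inputs with x > 0 get predicted probabilities near 1 and inputs with x < 0 near 0. (Updating after every example, as here, is stochastic gradient descent; batch gradient descent would average the error over all examples first.)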