Chapter 6 Flashcards

Question 1

Q

Logistic Sigmoid Function

Answer

A

f(x) = 1 / (1 + e^(-x))

where x is the input to the function. The output of the sigmoid function always lies strictly between 0 and 1, so it can be interpreted as a probability.
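
A minimal Python sketch of the formula above, assuming scalar inputs and only the standard library. The function name `sigmoid` and the branch on the sign of x are illustrative choices, not part of the card; the branch just keeps `math.exp` from overflowing for large negative x.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, f(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        # e^(-x) is at most 1 here, so this cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0.0))  # 0.5
```

Large positive inputs push the output toward 1 and large negative inputs push it toward 0.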

Question 2

Q

Logit

Answer

A

The weighted sum z of a neuron's inputs, which is the value passed as input to the logistic function.

Question 3

Q

Softplus Function

Answer

A

f(x) = log(1 + e^x)
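
A small Python sketch of softplus, assuming log is the natural logarithm and scalar inputs. The rewritten form max(x, 0) + log(1 + e^(-|x|)) is algebraically the same as log(1 + e^x); it is used here only as an assumption-free way to avoid overflow, and the name `softplus` is illustrative.

```python
import math

def softplus(x: float) -> float:
    """Softplus, f(x) = log(1 + e^x), computed without overflow."""
    # log(1 + e^x) == max(x, 0) + log(1 + e^(-|x|)) for every real x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

print(softplus(0.0))  # log(2), about 0.693
```

For large positive x the result is close to x, and for large negative x it is close to 0, which is why softplus is often described as a smooth version of max(0, x).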

Question 4

Q

Maxout Function

Answer

A

f(x) = max(w_1 * x + b_1, w_2 * x + b_2)

where x is the input to the function, w_1 and w_2 are learnable weights, b_1 and b_2 are learnable biases, and max is the element-wise maximum operator. The maxout function takes the maximum of two linear combinations of the input, which allows the model to learn convex piecewise linear functions.
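
As an illustration, a single maxout unit might be sketched in Python as below. The function name `maxout`, the list-based layout of `weights` and `biases`, and the example values are assumptions made for this sketch; the code also accepts more than two linear pieces, which goes slightly beyond the two-piece formula on the card.

```python
def maxout(x, weights, biases):
    """Return the largest of the linear pieces w_k . x + b_k."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )

# Two pieces with w_1 = 1, w_2 = -1 and zero biases give max(x, -x) = |x|.
weights = [[1.0], [-1.0]]
biases = [0.0, 0.0]
print(maxout([-3.0], weights, biases))  # 3.0
```

The absolute-value example shows how two straight lines combine into a piecewise linear function.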

Question 5

Q

Softmax Function

Answer

A

softmax(x_i) = e^(x_i) / sum_j e^(x_j)

where x_i is the logit for class i, e is the base of the natural logarithm, and the sum is taken over all classes j. The softmax function exponentiates the logits and normalizes them such that they sum to 1, which produces a valid probability distribution.
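
A short Python sketch of softmax over a list of logits, using only the standard library. Subtracting the largest logit before exponentiating is an added numerical-stability step, not something the card states; it leaves the result unchanged because the common factor cancels in the ratio. The example logits are made up.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)                           # shift so the largest exponent is 0
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The class with the largest logit receives the largest probability, and the outputs always sum to 1 up to floating-point rounding.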

Question 6

Q

Linear Output Unit

Answer

A

An output unit used to predict a real value that is not limited to the range 0 to 1.
The linear neuron uses the identity function as its activation function; that is, its output is simply the weighted sum of its inputs.
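
A minimal sketch of a linear output unit in Python. The name `linear_unit`, the separate `bias` argument (defaulting to 0), and the example numbers are assumptions for illustration; the card itself only mentions the weighted sum.

```python
def linear_unit(inputs, weights, bias=0.0):
    """Identity activation: the output is the weighted sum itself."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# 1.5 * 2.0 + (-4.0) * 3.0 + 0.5 = -8.5, well outside the range 0 to 1.
print(linear_unit([2.0, 3.0], [1.5, -4.0], bias=0.5))  # -8.5
```

Because no squashing function is applied, the output can take any real value, which is what makes this unit suitable for regression targets.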

Question 7

Q

Boston Housing Dataset

Answer

A

a well-known dataset in machine learning and statistics used for regression analysis. It contains information about housing in the suburbs of Boston, Massachusetts, including features such as crime rate, number of rooms, and pupil-teacher ratio. The goal of the dataset is to predict the median value of owner-occupied homes in each suburb based on the other features. The dataset is often used as a benchmark for regression models, and it is widely available for use in research and education.

Question 8

Q

Weight Decay

Answer

A

a regularization technique used in machine learning and neural networks to prevent overfitting. It works by adding a penalty term to the loss function of the model that encourages the weights of the model to be small. The penalty term is proportional to the sum of the squares of the weights of the model, and it is controlled by a hyperparameter called the weight decay coefficient or regularization strength.
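
The penalty described above could be sketched in Python as follows. The function names, the `decay` argument (standing in for the weight decay coefficient), and the example numbers are hypothetical; some formulations also scale the penalty by 1/2, which this sketch does not do.

```python
def l2_penalty(weights, decay):
    """Penalty proportional to the sum of squared weights."""
    return decay * sum(w * w for w in weights)

def loss_with_weight_decay(data_loss, weights, decay):
    """Total loss = data loss + weight decay penalty."""
    return data_loss + l2_penalty(weights, decay)

weights = [0.5, -2.0, 1.0]   # sum of squares = 5.25
print(loss_with_weight_decay(0.3, weights, decay=0.01))
```

A larger `decay` makes large weights more expensive, so training is pushed toward smaller weights; setting it to 0 recovers the unregularized loss.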

(8 cards)