Chapter 6 Flashcards

1
Q

Logistic Sigmoid Function

A

f(x) = 1 / (1 + e^(-x))

where x is the input to the function. The output of the sigmoid function always lies strictly between 0 and 1, so it can be interpreted as a probability.
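
As a study aid, the formula can be sketched in plain Python. The function name `sigmoid` and the two-branch layout are illustrative choices, not something taken from the chapter; the second branch is an algebraically equivalent form used only to avoid overflow for very negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        # e^(-x) is at most 1 here, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^(-x) could overflow, so use e^x / (1 + e^x) instead.
    z = math.exp(x)
    return z / (1.0 + z)
```

Mathematically the output never reaches 0 or 1, although floating-point rounding can return exactly 0.0 or 1.0 for inputs of very large magnitude.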

2
Q

Logit

A

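the weighted sum z = w · x + b of a unit's inputs, which is passed as the input to the logistic function. Equivalently, if p is the resulting probability, the logit is the log-odds log(p / (1 - p)).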
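
The sketch below shows one way to check this definition numerically. It assumes a single unit with hypothetical weights, inputs, and bias chosen only for illustration, and it relies on the standard fact that the log-odds of the logistic output recovers z.

```python
import math


def weighted_sum(weights, inputs, bias):
    """Logit z: the weighted sum of the inputs plus a bias term."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def logistic(z):
    """Logistic function applied to the logit."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the logistic function: maps a probability back to its logit."""
    return math.log(p / (1.0 - p))


# Hypothetical example values, not taken from the chapter.
z = weighted_sum([0.5, -1.0], [2.0, 1.0], 0.25)  # 1.0 - 1.0 + 0.25
p = logistic(z)
```

Here `log_odds(p)` returns z again, up to floating-point rounding.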

3
Q

Softplus Function

A

f(x) = log(1 + e^x)
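
A minimal Python sketch of this formula follows. The rearrangement max(x, 0) + log(1 + e^(-|x|)) is an equivalent form chosen here so that large inputs do not overflow; the function name is an illustrative assumption.

```python
import math


def softplus(x: float) -> float:
    """Softplus log(1 + e^x), rearranged so that e^x is never computed for large x."""
    # Equivalent identity: log(1 + e^x) = max(x, 0) + log(1 + e^(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For large positive x the result is approximately x, and for large negative x it approaches 0, which is why softplus is often described as a smooth version of max(0, x).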

4
Q

Maxout Function

A

f(x) = max(w_1 * x + b_1, w_2 * x + b_2)

where x is the input to the function, w_1 and w_2 are learnable weights, b_1 and b_2 are learnable biases, and max is the element-wise maximum operator. The maxout function takes the maximum of two affine functions of the input (more generally, of k such functions), which allows the model to learn a piecewise linear activation function.
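
The following sketch illustrates the definition for a single output unit. The helper names and the `pieces` representation (a list of weight-vector and bias pairs) are assumptions made for this example; the two parameter sets show how fixed choices of weights reproduce familiar piecewise linear functions.

```python
def affine(weights, inputs, bias):
    """One linear piece: w * x + b, with w * x taken as a dot product."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def maxout(inputs, pieces):
    """Maximum over the linear pieces; each piece is a (weights, bias) pair."""
    return max(affine(w, inputs, b) for w, b in pieces)


# Hypothetical parameter choices for a one-dimensional input.
relu_pieces = [([1.0], 0.0), ([0.0], 0.0)]   # max(x, 0)
abs_pieces = [([1.0], 0.0), ([-1.0], 0.0)]   # max(x, -x) = |x|
```

In a trained network the weights and biases of each piece are learned rather than fixed as they are here.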

5
Q

Softmax Function

A

softmax(x)_i = e^(x_i) / sum_j e^(x_j)

where x_i is the logit for class i, e is the base of the natural logarithm, and the sum is taken over all classes j. The softmax function exponentiates the logits and normalizes them such that they sum to 1, which produces a valid probability distribution.
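
A short Python sketch of this computation is given below. Subtracting the largest logit before exponentiating is a common stabilization trick added here as an assumption; it leaves the result unchanged because the common factor cancels in the ratio.

```python
import math


def softmax(logits):
    """Exponentiate the logits and normalize them so they sum to 1."""
    # Shifting by the maximum avoids overflow and does not change the output.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three classes.
probs = softmax([1.0, 2.0, 3.0])
```

The largest logit receives the largest probability, and equal logits receive equal probabilities.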

6
Q

Linear Output Unit

A

used to predict a real value that is not limited to the range 0 to 1, as in regression.
The linear neuron uses the identity function as its activation function; that is, its output is the weighted sum of its inputs.
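
As a rough illustration, a linear output unit can be sketched as below. The bias term and the example numbers are assumptions for this sketch; the point is only that no squashing function is applied, so the output can be any real number.

```python
def linear_unit(weights, inputs, bias=0.0):
    """Linear output unit: identity activation applied to the weighted sum."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z  # identity activation: the output is z itself


# Hypothetical values: 2.0 * 1.0 + (-3.0) * 4.0 + 0.5
prediction = linear_unit([2.0, -3.0], [1.0, 4.0], bias=0.5)
```

Unlike a sigmoid unit, the prediction here falls outside the range 0 to 1.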

7
Q

Boston Housing Dataset

A

a well-known dataset in machine learning and statistics used for regression analysis. It contains information about housing in the suburbs of Boston, Massachusetts, including features such as crime rate, number of rooms, and pupil-teacher ratio. The usual task is to predict the median value of owner-occupied homes in each suburb from the other features. The dataset is often used as a benchmark for regression models in research and education.

8
Q

Weight Decay

A

a regularization technique used in machine learning and neural networks to prevent overfitting. It works by adding a penalty term to the loss function of the model that encourages the weights of the model to be small. The penalty term is proportional to the sum of the squares of the weights of the model, and it is controlled by a hyperparameter called the weight decay coefficient or regularization strength.
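
The penalty described above can be sketched in a few lines of Python. The function names, the plain gradient-descent update, and the choice to write the penalty as `decay * sum(w^2)` (some texts include an extra factor of 1/2) are assumptions made for this example.

```python
def l2_penalty(weights, decay):
    """Penalty term: decay coefficient times the sum of squared weights."""
    return decay * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, decay):
    """Total loss: the original loss plus the weight decay penalty."""
    return data_loss + l2_penalty(weights, decay)


def sgd_step(weights, grads, lr, decay):
    """One gradient step on the regularized loss.

    The gradient of decay * w^2 with respect to w is 2 * decay * w.
    """
    return [w - lr * (g + 2.0 * decay * w) for w, g in zip(weights, grads)]


# Hypothetical values: with zero data gradient, each weight shrinks toward 0.
shrunk = sgd_step([1.0, -2.0], [0.0, 0.0], lr=0.5, decay=0.5)
```

With a zero data gradient, the update only shrinks each weight toward zero, which is where the name weight decay comes from.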
