Chapter 3 Getting started with neural networks Flashcards
relu function vs. sigmoid
relu (rectified linear unit) zeroes out negative values; sigmoid “squashes” arbitrary values into the [0, 1] interval
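A minimal numpy sketch of both activations (the helper names `relu` and `sigmoid` are illustrative, not from the deck):

```python
import numpy as np

def relu(x):
    # Zeroes out negative values, passes positive values through.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes any real value into the (0, 1) interval.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))     # [0.  0.  0.  0.5 2. ]
print(sigmoid(x))  # values between 0 and 1, e.g. ~0.12 for -2.0
```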
Advantages of larger layers?
smaller layers can act as information bottlenecks, permanently dropping important information that later layers won’t have access to, as in the sketch below.
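A hedged Keras sketch of the problem: the 2-unit middle layer below is an information bottleneck for a 46-class output (all layer sizes are illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

bottlenecked = keras.Sequential([
    layers.Dense(64, activation="relu"),
    # Too small: whatever information these 2 units cannot carry is
    # permanently lost to every layer after this one.
    layers.Dense(2, activation="relu"),
    layers.Dense(46, activation="softmax"),
])
```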
softmax activation
the network will output a probability distribution over the different output classes; the scores sum to 1
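A numerically stable numpy sketch of softmax (the helper name is illustrative):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max doesn't change the result but avoids overflow.
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # ~[0.659 0.242 0.099]
print(probs.sum())  # 1.0
```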
categorical_crossentropy
loss function that measures the distance between two probability distributions: here, the distribution output by the network and the true distribution of the labels
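A hedged numpy sketch for a single sample (the helper name and epsilon are illustrative):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Distance between the true (one-hot) distribution and the
    # predicted distribution: -sum(y_true * log(y_pred)).
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])  # one-hot true label
y_pred = np.array([0.1, 0.7, 0.2])  # softmax output
print(categorical_crossentropy(y_true, y_pred))  # -log(0.7) ~= 0.357
```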
network overfitting
when the network starts learning patterns specific to the training data rather than trends that generalize to new data
what loss function should you use for single-label, multiclass classification problems?
categorical crossentropy
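A hedged Keras sketch of wiring that loss into a single-label, multiclass model (the 46-class output and layer sizes are illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(46, activation="softmax"),  # one probability per class
])
model.compile(optimizer="rmsprop",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```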
feature-wise normalization
a best practice for data preprocessing: for each feature in the input data, subtract the mean of the feature and divide by its standard deviation, so the feature is centered around 0 and has a unit standard deviation
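A hedged numpy sketch (the dummy data is illustrative); note the test split is normalized with statistics computed on the training split only:

```python
import numpy as np

rng = np.random.default_rng(0)
train_data = rng.normal(5.0, 2.0, size=(100, 3))  # dummy: 100 samples, 3 features
test_data = rng.normal(5.0, 2.0, size=(20, 3))

# Per-feature statistics from the training data only,
# so no information leaks from the test set.
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

train_data = (train_data - mean) / std
test_data = (test_data - mean) / std
```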
amount of data and overfitting
the less data you have, the worse overfitting is
what is one way to mitigate overfitting?
use a smaller network (fewer layers, fewer units per layer); a network with less capacity is less able to memorize the training data
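A hedged Keras sketch of a deliberately small binary classifier (the 4-unit layers are an illustrative choice, not a prescription):

```python
from tensorflow import keras
from tensorflow.keras import layers

smaller_model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
smaller_model.compile(optimizer="rmsprop",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
```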
widely used loss function for regression problems?
Mean Squared Error
Mean Squared Error
loss function: the average of the squared differences between the predictions and the targets
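A minimal numpy sketch (the helper name is illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences between predictions and targets.
    return np.mean((y_pred - y_true) ** 2)

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
print(mse(y_true, y_pred))  # ~0.167
```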
Mean Absolute Error (MAE)
metric for monitoring model performance in regression problems: the average of the absolute differences between the predictions and the targets
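A minimal numpy sketch (the helper name is illustrative); unlike MSE, MAE stays in the units of the target:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of the absolute differences between predictions and targets.
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
print(mae(y_true, y_pred))  # ~0.333: off by 0.33 on average
```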