Impact of depth and design choice Flashcards
What does the Universal approximation theorem say?
A neural network with a single hidden layer of “squashing” activation units (e.g., sigmoid) and a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units.
In the worst case, an exponential number of hidden units may be required, possibly with one hidden unit for each input configuration that needs to be distinguished (on the order of 2^n units for n binary inputs).
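A minimal sketch (assuming NumPy, with a sigmoid as the squashing activation) of the function family the theorem talks about: one hidden layer feeding a single linear output unit. The width and weights below are arbitrary placeholders; the theorem only asserts that suitable ones exist.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 64                  # width is the knob the theorem lets us grow

W = rng.normal(size=(n_hidden, n_in))   # hidden-layer weights
b = rng.normal(size=n_hidden)           # hidden-layer biases
v = rng.normal(size=n_hidden)           # linear output weights
c = 0.0                                 # output bias

def f(x):
    """f(x) = v . sigmoid(W x + b) + c  (single hidden layer, linear output)."""
    h = 1.0 / (1.0 + np.exp(-(W @ x + b)))   # squashing activation
    return v @ h + c                          # linear output unit

print(f(np.array([0.5, -1.0])))
```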
What is needed for supervised training?
- labeled training set
- vector of model parameters
- loss function L(fθ(x), y)
- Training = find θ that minimizes the total loss on the training set (see the sketch below)
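A minimal sketch of this recipe, assuming PyTorch; the toy labeled data, model, learning rate, and step count are made-up placeholders.

```python
import torch

X = torch.randn(100, 3)                              # labeled training set (inputs)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)   # labels

model = torch.nn.Linear(3, 1)                        # f_θ; θ = model.parameters()
loss_fn = torch.nn.MSELoss()                         # L(f_θ(x), y)
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(200):                              # find θ minimizing the total loss
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()
```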
What is needed for unsupervised training?
- unlabeled training set
- vector of model parameters
- loss function L(fθ(x))
- Training = find θ that minimizes the total loss
What is a detection task? What output activation function is used?
Give an example
Only two possible classes, 0 and 1: the target is either detected or not.
Sigmoid
CAPTCHA
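A minimal sketch of a detection output, assuming PyTorch; the logit value and the 0.5 threshold are illustrative placeholders.

```python
import torch

logit = torch.tensor([1.3])        # raw score produced by the network
p = torch.sigmoid(logit)           # probability that the object is present, in (0, 1)
detected = bool(p.item() > 0.5)    # class 1 ("detected") or class 0 ("not detected")
print(p.item(), detected)
```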
What is a classification task? What output activation function is used?
Give an example
Three or more possible classes; similar to detection, but with more than two outputs.
Softmax activation function
What kind of animal is in the picture: leopard, Egyptian cat, jaguar, ...
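A minimal sketch of a classification output, assuming PyTorch; the class names and logit values are illustrative placeholders.

```python
import torch

classes = ["leopard", "egyptian_cat", "jaguar"]
logits = torch.tensor([2.1, 0.3, 1.7])       # one raw score per class
probs = torch.softmax(logits, dim=0)         # non-negative, sums to 1 over the classes
print(classes[probs.argmax().item()], probs.tolist())
```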
What is a regression task? What output activation function is used?
Give an example
The model is trained to learn the relationship between input variables and a continuous target variable.
Linear activation
estimating a house's price from features such as its location, size, etc.
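A minimal sketch of a regression output, assuming PyTorch; the feature vector and layer size are placeholders.

```python
import torch

features = torch.randn(8)          # e.g. encoded attributes of a house
head = torch.nn.Linear(8, 1)       # linear activation on the output
prediction = head(features)        # continuous, unbounded value (e.g. a price)
print(prediction.item())
```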
Define what an autoencoder is
Neural network trained to predict its input
It is unsupervised learning
How does the autoencoder work?
Consists of two parts:
* an encoder function h = f(x)
* a decoder function x̂ = r(h) such that x̂ ≈ x
The hidden activations h provide a nonlinear representation of the input, called an embedding (see the sketch below).
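A minimal autoencoder sketch, assuming PyTorch; the layer sizes, learning rate, and random data are placeholders.

```python
import torch

encoder = torch.nn.Sequential(torch.nn.Linear(16, 4), torch.nn.ReLU())   # h = f(x)
decoder = torch.nn.Linear(4, 16)                                         # x_hat = r(h)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

X = torch.randn(256, 16)           # unlabeled data: the input is also the target
for step in range(100):
    opt.zero_grad()
    h = encoder(X)                 # embedding (4-dimensional here)
    x_hat = decoder(h)             # reconstruction
    loss = torch.nn.functional.mse_loss(x_hat, X)   # penalize x_hat far from x
    loss.backward()
    opt.step()
```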
Define what an undercomplete autoencoder is
An autoencoder whose embedding h has fewer dimensions than the input x.
What is the perk of a denoising autoencoder?
It forces the autoencoder to learn to undo the corruption, which makes it capture salient features of the data.
This makes it possible to use embeddings h whose dimension is bigger than that of the data x (an overcomplete autoencoder) without the network simply copying its input.
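A minimal sketch of the denoising criterion, assuming PyTorch; `encoder` and `decoder` stand for the modules from the previous sketch, and Gaussian corruption is just one possible choice of noise.

```python
import torch

def denoising_loss(x, encoder, decoder, noise_std=0.3):
    x_noisy = x + noise_std * torch.randn_like(x)    # corrupted input
    x_hat = decoder(encoder(x_noisy))                # reconstruct from the corrupted copy
    return torch.nn.functional.mse_loss(x_hat, x)    # loss against the CLEAN x
```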
Explain Synthetic data generation
Compute p(x) and draw samples from it.
For discrete data, treat generation as a series of classification tasks:
p(x) = p(x1) × p(x2 | x1) × ... × p(xn | x1, ..., xn−1).
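A minimal sketch of sampling under that factorization, assuming PyTorch; `conditional` is a hypothetical stand-in (uniform here) for a trained per-step classifier, and the vocabulary size is a placeholder.

```python
import torch

def conditional(prefix, vocab_size=5):
    """Hypothetical p(x_i | x_1, ..., x_{i-1}); a real model would be learned."""
    return torch.softmax(torch.zeros(vocab_size), dim=0)   # placeholder: uniform

def sample_sequence(n, vocab_size=5):
    x = []
    for i in range(n):
        probs = conditional(x, vocab_size)            # p(x_i | previous symbols)
        x.append(torch.multinomial(probs, 1).item())  # draw x_i, then condition on it
    return x

print(sample_sequence(4))
```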
What can you use Synthetic data generation for?
- Language modeling and text generation
- Augment existing data
- Testing and validation