Kursusgang 9 (Deep learning and transfer learning) Flashcards
What is deep learning?
A class of machine learning techniques that exploit many layers of non-linear information processing for feature extraction and transformation and for pattern analysis and classification.
Deep learning can learn complex features from data and can handle large amounts of data, both labeled and unlabeled. Thanks to ever-increasing computational power, training very large models is now feasible, e.g. large language models with a few trillion parameters.
What are the key models in neural networks?
Deep Neural Network (DNN)
Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM) RNNs
Generative Adversarial Networks (GANs)
Why did deep learning become popular in the early 2010s?
Significantly more processing power, which allows higher model complexity and more adjustments to model structure.
We are in the big-data era, allowing models to interpolate within the training data rather than extrapolate beyond it.
Why use deep learning, when single-hidden-layer neural networks are universal approximators?
Deep machines can represent more complex functions with fewer parameters, and they can learn the same function from less training data by reusing low-level feature detectors. Furthermore, features at higher layers are more invariant (less sensitive to small shifts in the input) and more discriminative than features at lower layers.
True or false: The output layer of a deep neural network is typically a softmax function.
True: the softmax guarantees that the outputs form a probability distribution (non-negative values summing to one).
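A minimal numpy sketch of the softmax function; the logit values below are just an illustrative example:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores from the output layer
probs = softmax(logits)              # non-negative, sums to 1
```

The max-subtraction trick avoids overflow for large logits without changing the resulting distribution.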
What are the drawbacks of deep neural networks?
They do not explicitly exploit known structures (e.g. translational variability) in the input data.
Furthermore, they do not explicitly apply operations that reduce variability (e.g., pooling and aggregation).
Hyperparameters can be hard to determine.
What makes convolutional neural networks unique?
They replace the matrix multiplication in normal neural networks with convolution to
* Explicitly exploit the data structure
* Automatically generalize across spatial translations of inputs
* Be applicable to any input that is laid out on a grid (1-D, 2-D, 3-D, and so on)
Furthermore, they use pooling to reduce variability.
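The convolution idea can be sketched in plain numpy: the same small kernel is reused at every spatial position (weight sharing), which is what makes the operation generalize across translations. This is a "valid" cross-correlation, which is what deep learning libraries typically call convolution; the edge-detector kernel is an illustrative assumption:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position of the input grid and
    # take the elementwise product-sum at each position.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2:] = 1.0                     # dark left half, bright right half
edge_kernel = np.array([[-1.0, 1.0]])  # responds to left-to-right increases
response = conv2d_valid(image, edge_kernel)
```

The response is large only at the vertical edge, no matter where in the image that edge appears, illustrating translation equivariance.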
What is pooling?
Pooling divides the feature map into smaller, typically non-overlapping regions and summarizes each region by applying a pooling function:
* Max pooling: Selects the maximum value within each region.
* Average pooling: Calculates the average value within each region.
How are convolutional neural networks specifically designed for classification?
Early layers learn simple patterns like edges or gradients, while deeper layers capture more abstract patterns like shapes or objects. The network alternates between convolutional and pooling layers. Just before the output, a deep neural network is stacked on top to perform classification based on the features extracted by the convolutional layers.
What are the limitations of convolutional neural networks?
They mainly deal with translational variability. They cannot take advantage of dependencies and correlations between samples (and labels) in a sequence.
What makes recurrent neural networks unique?
They can model sequential data in a natural way, which is important when memory is needed for decision making.
Recurrent neural networks have an internal state, often called a “hidden state,” that stores information about past inputs. This memory allows them to process sequential data effectively.
Recurrent connections loop information from previous steps back into the current step, enabling the network to maintain context and capture dependencies between elements in a sequence.
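The recurrence described above can be sketched as a single numpy update rule; the layer sizes and random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny RNN: 3-dim inputs, 4-dim hidden state.
W_xh = rng.normal(scale=0.1, size=(4, 3))  # input -> hidden
W_hh = rng.normal(scale=0.1, size=(4, 4))  # hidden -> hidden (the recurrent loop)
b_h = np.zeros(4)

def rnn_step(h_prev, x):
    # The new hidden state mixes the current input with the previous state.
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(4)                            # initial hidden state
sequence = [rng.normal(size=3) for _ in range(5)]
for x in sequence:
    h = rnn_step(h, x)                     # h now summarizes all inputs seen so far
```

The same weights are applied at every time step; only the hidden state changes, which is how context is carried through the sequence.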
What are the limitations of recurrent neural networks?
Simple recurrent neural networks are difficult to train due to vanishing and exploding gradients over time, and they have difficulty modeling long-range dependencies.
What is a long short-term memory recurrent neural network?
It is a specific type of recurrent neural network designed to deal with the vanishing/exploding gradient problem of typical recurrent neural networks. It does so with gating mechanisms (forget, input, and output gates) that control what is stored in and read from an internal cell state.
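A single LSTM step with the standard gates can be sketched in numpy as below; the layer sizes, random weights, and block layout of the stacked parameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b each stack four blocks: forget gate, input gate,
    # cell candidate, output gate.
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:n])        # forget gate: what to keep from the old cell state
    i = sigmoid(z[n:2 * n])    # input gate: how much new information to write
    g = np.tanh(z[2 * n:3 * n])  # candidate cell update
    o = sigmoid(z[3 * n:4 * n])  # output gate: what part of the cell to expose
    c = f * c_prev + i * g     # additive cell update helps gradients flow
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in [rng.normal(size=n_in) for _ in range(5)]:
    h, c = lstm_step(x, h, c, W, U, b)
```

The key difference from the plain RNN step is the additive cell update `c = f * c_prev + i * g`, which lets gradients pass through many time steps without repeatedly being squashed.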
What is a generative adversarial network?
Generative Adversarial Networks are a type of deep learning architecture that can generate highly realistic synthetic data. They are based on pitting two neural networks against each other in a competitive game.
A generator network tries to produce outputs that are as realistic as possible, such as images, while a discriminator network tries to differentiate real images from a dataset from the fake images produced by the generator.
How is a generative adversarial network trained?
Both the generator and discriminator networks are initialized with random weights.
Alternate Training Steps:
Train the discriminator:
Feed real data samples to the discriminator and train it to classify them as real.
Feed generated data samples from the generator to the discriminator and train it to classify them as fake.
Train the generator:
Generate new data samples.
Feed these generated samples to the discriminator.
Train the generator to “fool” the discriminator by maximizing the probability that the discriminator classifies the generated samples as real.
These training steps continue iteratively.
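The alternating procedure above can be written as a training-loop skeleton. This is a structural sketch only: the networks and update functions below are hypothetical placeholders standing in for real models and gradient steps:

```python
# Structural sketch: every function here is a hypothetical stand-in.

def sample_real_batch():
    return [1.0, 1.0, 1.0]                  # stand-in for real data samples

def generate_batch(generator):
    return [generator["scale"] * 0.5] * 3   # stand-in for generator outputs

def update_discriminator(discriminator, real, fake):
    # A real gradient step would push D(real) toward 1 and D(fake) toward 0.
    discriminator["steps"] += 1

def update_generator(generator, discriminator, fake):
    # A real gradient step would push D(fake) toward 1, i.e. "fool" D.
    generator["steps"] += 1

generator = {"scale": 1.0, "steps": 0}      # random initialization in practice
discriminator = {"steps": 0}

for iteration in range(100):
    # 1) Train the discriminator on real and generated samples.
    real = sample_real_batch()
    fake = generate_batch(generator)
    update_discriminator(discriminator, real, fake)
    # 2) Train the generator to fool the discriminator.
    fake = generate_batch(generator)
    update_generator(generator, discriminator, fake)
```

The two updates pull in opposite directions, which is the competitive game that drives the generator toward realistic outputs.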