All Flashcards

1
Q

What type of data is there?

A

Unstructured data, such as images, video, audio and text, is often categorized as qualitative data. It cannot be readily processed or analyzed with conventional data tools and methods.

Structured data, e.g. tensors or tables that store n-dimensional data.

Semi-structured data, such as JSON files.

Metadata: data that describes other data, such as variable names.

2
Q

What is data augmentation?

A

A technique for generating additional training samples from existing ones, e.g. by flipping, rescaling, rotating, or applying a thermofilter.
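
As a minimal sketch of the geometric transformations mentioned above, the following uses a tiny hypothetical "image" stored as a list of rows. The function names are made up for illustration; a real pipeline would typically use an image library instead.

```python
# Hypothetical 2x3 grayscale "image" stored as a list of rows.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]

def rotate_90_clockwise(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# One original sample becomes four training samples.
augmented = [
    image,
    flip_horizontal(image),
    flip_vertical(image),
    rotate_90_clockwise(image),
]
```

Each transformed copy keeps the label of the original sample, which is what makes the extra samples usable for training.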

3
Q

What is data reduction?

A

Removing noisy or redundant data to improve the model's accuracy. It also reduces the complexity of the model and makes it train faster.
One example is feature selection, which removes unimportant features.
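
A minimal sketch of feature selection, assuming that "unimportant" can be approximated by "nearly constant across samples" (a variance threshold). This is only one possible criterion, and the dataset and function names are hypothetical.

```python
from statistics import pvariance

# Hypothetical dataset: each row is a sample, each column a feature.
samples = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 1.0],
]

def select_by_variance(rows, threshold):
    """Return the indices of features whose variance exceeds the threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]

def keep_features(rows, indices):
    """Keep only the selected feature columns in every sample."""
    return [[row[i] for i in indices] for row in rows]

kept = select_by_variance(samples, threshold=0.1)
reduced = keep_features(samples, kept)
```

Here the middle feature is constant, so it carries no information and is dropped, leaving two features per sample.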

4
Q

What is an autoencoder?

A

Purpose:
Autoencoders are primarily used for unsupervised learning tasks, such as data compression, dimensionality reduction, anomaly detection, and feature learning.
The main goal of an autoencoder is to learn a compact, efficient representation (encoding) of input data and then reconstruct the input data from this encoding as accurately as possible.
Use Cases:
Dimensionality reduction (similar to PCA, but nonlinear).
Data denoising (denoising autoencoders).
Anomaly detection (reconstruction error highlights anomalies).
Generative modeling (e.g., variational autoencoders that sample new data).

5
Q

What is a transformer?

A

Purpose:

Transformers are designed for handling sequential data and are particularly powerful for tasks involving natural language processing (NLP), such as machine translation, text summarization, and language modeling.
They use self-attention mechanisms to model relationships between all elements in a sequence simultaneously, rather than relying on sequential processing like RNNs.
Use Cases:
Language translation (e.g., Google Translate).
Text generation (e.g., GPT models).

6
Q

What are the core things in a transformer?

A
  • Encoder-decoder stack
  • Self-attention mechanism
  • Positional Encoding
  • Feed-forward network
7
Q

What criteria must a working model meet in order to model sequences?

A
  1. Handle variable length sequences.
  2. Track long term dependencies.
  3. Maintain information about order.
  4. Share parameters across the sequence.

These are needed for the model to generate new possible outcomes.
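
As a rough illustration of criteria 1, 3 and 4, here is a one-unit recurrent cell with hand-picked, hypothetical weights (a real model would learn them). It does not address criterion 2 well: plain recurrent cells like this one tend to forget distant inputs.

```python
import math

# Hypothetical, hand-picked weights; a real model would learn them.
W_INPUT, W_HIDDEN, BIAS = 0.5, 0.8, 0.0

def rnn_final_state(sequence):
    """Run a one-unit recurrent cell over a sequence of any length."""
    hidden = 0.0
    for x in sequence:
        # The same three parameters are reused at every step.
        hidden = math.tanh(W_INPUT * x + W_HIDDEN * hidden + BIAS)
    return hidden

short_seq = rnn_final_state([1.0, 2.0])                  # variable length: 2 steps
long_seq = rnn_final_state([1.0, 2.0, 3.0, 4.0, 5.0])    # variable length: 5 steps
reordered = rnn_final_state([2.0, 1.0])                  # same items, different order
```

Because the hidden state is updated step by step, swapping the order of the inputs changes the final state, which is how the cell keeps information about order.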

8
Q

What are benefits of base models in computer vision?

A

Benefits of base models in computer vision:
A base model improves the overall accuracy of the model because it has been trained on thousands or even millions of images, so the feature extraction part of the CNN is already well trained. All that remains is to retrain the model on your own subset of images and fine-tune the last layer (or perhaps the last two). If you have little data and train only on that subset, there is a high risk that the model overfits to it. This brings some obvious benefits:
  • Reduced training time (training a new model from scratch takes a lot of time).
  • Works well with fewer samples.
  • You can add classes to an already existing classifier.
  • Lower power consumption.

9
Q

What is the difference between discriminative and generative models?

A

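Generative: Model the distribution of the data itself (e.g. the joint distribution p(x, y)), capturing the correlations within it.
Discriminative: Learn the differences between classes (e.g. p(y | x)) and ignore the correlations within the data (divide the data with a line/plane).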

10
Q

What are GRUs and LSTMs?

A

Both are types of RNN that use gates to retain more of the context across a sequence. They are good at time series forecasting and sequence modeling.

11
Q

Compare LSTM and Transformers.

A

LSTMs process input sequentially, carrying forward hidden states (long-term / short-term memory) to capture information from previous steps. They have a recurrent structure in which the output is fed back into the network. They also have special gates to mitigate the problem of vanishing/exploding gradients.
Transformers process input in parallel rather than sequentially. The self-attention mechanism allows each input (e.g. a word) to attend to the other words in the sequence, regardless of their position. Self-attention is what gives a transformer its ability to capture long-term relationships: each token in a sequence computes its attention with every other token, allowing the model to weigh their importance. Since transformers process inputs in parallel, positional encoding is added to the input embeddings to give the model a sense of position within the sequence.
Transformers are often better because they are more effective at handling long-range dependencies, more efficient thanks to parallelism, and produce richer representations thanks to attention.

12
Q

Describe vanishing and exploding gradient.

A

Vanishing gradient is when the gradient of the loss function with respect to the model parameters becomes very small. The weights then update very little at a time, resulting in slow learning. Because backpropagation applies the chain rule of derivatives, multiplying one factor per layer, the updates become smaller and smaller toward the earlier layers. It usually becomes a problem when the network uses activation functions like sigmoid or tanh. They squash their input into a narrow range (0 to 1 for sigmoid, -1 to 1 for tanh) and their derivatives are small when saturated, so the gradient shrinks exponentially with depth. A solution could be to use ReLU instead.
Exploding gradient is instead when the gradient of the loss function with respect to the parameters becomes very large, resulting in instability in the learning process. Poor initialization of the weights can lead to large values being passed through the layers. If the gradients are large, their product can quickly escalate as they are propagated backwards. Solutions could be better weight initialization and gradient clipping.
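
A small worked example of why the chain-rule product shrinks or grows, followed by a sketch of gradient clipping. It assumes, for simplicity, that every layer contributes the same factor to the product; the function names are hypothetical.

```python
import math

def sigmoid_derivative(z):
    """Derivative of the sigmoid function at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def backpropagated_factor(per_layer_factor, num_layers):
    """Product of identical per-layer factors from the chain rule."""
    return per_layer_factor ** num_layers

# The sigmoid derivative peaks at 0.25 (at z = 0), so even in the best
# case 20 sigmoid layers scale the gradient by 0.25 ** 20 (weights ignored).
vanishing = backpropagated_factor(sigmoid_derivative(0.0), 20)

# A per-layer factor above 1 (for example from large weights) grows instead.
exploding = backpropagated_factor(1.5, 20)

def clip_by_norm(gradient, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    return [g * max_norm / norm for g in gradient]

clipped = clip_by_norm([30.0, 40.0], max_norm=5.0)
```

Clipping keeps the direction of the gradient and only limits its length, so the update still points the same way but can no longer blow up.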

A solution for both could be batch normalization. It normalizes the output of a previous activation layer by subtracting the batch mean and dividing by the batch standard deviation. After the normalization, the data is scaled and shifted using learnable parameters. By normalizing the input to each layer, it helps to maintain a stable gradient flow, especially in deep networks. Vanishing and exploding gradients are mostly a problem for RNNs.
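
The normalization step can be sketched for a single feature across one batch as follows. The parameters gamma (scale) and beta (shift) would be learned during training; here they are fixed, and eps is a small constant assumed for numerical stability.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale and shift."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    normalized = [(v - mean) / math.sqrt(variance + eps) for v in values]
    # gamma and beta are learnable in a real network; fixed in this sketch.
    return [gamma * n + beta for n in normalized]

batch = [10.0, 20.0, 30.0, 40.0]
out = batch_norm(batch)
```

With the default gamma and beta, the output has a mean of about 0 and a variance of about 1, regardless of the scale of the input batch.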

13
Q

DL vs ML?

A

Traditional ML involves feature engineering: the better the extraction of relevant features, the better the algorithm performs. Deep learning, on the other hand, can have hundreds of successive layers of representation that are learned from the data. Thus, deep learning largely automates feature engineering.

14
Q

Denoising autoencoder vs a normal one?

A

A Denoising Autoencoder (DAE) is a type of autoencoder designed specifically to handle noisy data by learning to reconstruct a clean output from a corrupted input. It extends the basic concept of a regular autoencoder by introducing noise to the input data during the training process and requiring the model to learn how to remove or “denoise” this noise.

Normal autoencoder:
It tries to learn a representation (encoding) of the input data as accurately and efficiently as possible. One example is a translator: rather than translating word for word, it captures the meaning (representation) of the sentence and decodes it, in this example into the target language.
Denoising autoencoder:
A denoising autoencoder learns a robust representation of the original, clean data from a corrupted version of the input. This makes the model more resilient to noise and can help it generalize. It learns to remove the noise, whereas a normal autoencoder is more affected by noise in the data.
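
The difference in training data can be sketched as follows: a denoising autoencoder is trained on (corrupted input, clean target) pairs, while a normal one uses the clean input as its own target. Masking noise is assumed here as the corruption; other kinds (e.g. Gaussian noise) are equally possible, and the data is hypothetical.

```python
import random

def corrupt(clean, drop_prob, rng):
    """Masking noise: set each value to 0.0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else x for x in clean]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean_inputs = [[0.2, 0.9, 0.4, 0.7], [0.5, 0.1, 0.8, 0.3]]

# Denoising autoencoder: the input is corrupted, the target stays clean.
denoising_pairs = [(corrupt(x, 0.5, rng), x) for x in clean_inputs]

# Normal autoencoder: input and target are the same clean sample.
normal_pairs = [(x, x) for x in clean_inputs]
```

The reconstruction loss is computed against the clean target in both cases, so the denoising variant can only succeed by learning to undo the corruption.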

15
Q

What is the bottleneck layer of an autoencoder?

A

The bottleneck layer forces the model to compress the input into a lower-dimensional representation. This compression reduces the amount of information the network can use to reconstruct the input, compelling the model to learn the most important and informative features. It basically acts as a feature extractor. It prevents the model from merely memorizing the input as it is and encourages it to generalize by focusing on patterns and features.

16
Q

Describe the encoder in a transformer.

A

The encoder in a Transformer processes the input sequence (e.g., a sentence in English) and generates a set of hidden states or representations.
It consists of multiple layers, each containing a self-attention mechanism and a feed-forward neural network. The feed-forward network processes the output of the self-attention layer (like a normal NN). The output is a set of vectors that capture meaning and structure.

17
Q

What is the decoder in a transformer?

A

The decoder also consists of multiple layers, each with 3 sublayers: a (masked) self-attention, focusing on the relevant parts of the output sequence generated so far; an encoder-decoder attention (cross-attention), focusing on the relevant parts of the input as processed by the encoder; and a feed-forward neural net. The decoder generates the output sequence one token at a time.

18
Q

What is positional encoding?

A

Positional Encoding is used in Transformers to give the model information about the order of the tokens in the sequence since the model itself treats the input as a set, not a sequence.
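
One common choice, assumed here because the card does not name a specific scheme, is the sinusoidal encoding from the original Transformer paper. A minimal sketch, with hypothetical function names:

```python
import math

def positional_encoding(num_positions, d_model):
    """Sinusoidal table: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k + 1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the positional encoding to each token embedding."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, row)] for emb, row in zip(embeddings, pe)]

pe = positional_encoding(3, 4)
```

Every position gets a distinct vector, so two identical tokens at different positions no longer look the same to the model.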

19
Q

Describe the self attention mechanism.

A

Self-attention allows the model to focus on different parts of the input sequence to understand context. Each word in a sequence is embedded and represented as a vector. For each word or token in the sequence, the model computes 3 new vectors: Query, Key and Value. The query represents what the token is looking for (the focus of the question), the key acts as a tag or identifier that queries are matched against (to identify similar words), and the value is the actual content that is passed forward.
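
A minimal sketch of scaled dot-product self-attention, assuming the standard formulation softmax(QK^T / sqrt(d_k))V. The projection matrices are set to the identity purely for illustration; in a real model they are learned, and all names here are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def project(x, W):
    """Multiply row vector x by matrix W (given as a list of rows)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def self_attention(embeddings, W_q, W_k, W_v):
    """Return the attended outputs and the attention weights per token."""
    queries = [project(x, W_q) for x in embeddings]
    keys = [project(x, W_k) for x in embeddings]
    values = [project(x, W_v) for x in embeddings]
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Match this token's query against every key in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        all_weights.append(weights)
        # The output is the weighted sum of all value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs, all_weights

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]            # stand-in for learned weights
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-dimensional embeddings
outputs, weights = self_attention(tokens, IDENTITY, IDENTITY, IDENTITY)
```

Each row of `weights` says how much one token attends to every token in the sequence, including itself.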

20
Q

What are GANs (generative adversarial networks)?

A

Generative Adversarial Networks (GANs) are a class of machine learning models designed for generating new data samples that are similar to a given dataset.
GANs consist of two neural networks that are trained simultaneously through a process of adversarial competition.

NN1: Generator
The generator takes in random noise and transforms it into a data sample that resembles the training data, e.g. an image. The goal of the generator is to create data that the discriminator cannot distinguish from the real data.

NN2: Discriminator
The discriminator network takes in a data sample and tries to classify it as either real (from the training dataset) or fake (generated by the generator).
The discriminator is essentially a binary classifier that outputs a probability that a given input is real.

The two networks play a zero-sum game in which they compete against each other. The generator becomes better at generating realistic data and the discriminator becomes better at telling real from fake.
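
The zero-sum game can be sketched through the two losses, assuming the standard minimax formulation. The discriminator outputs below are hypothetical probabilities, not the result of training, and the function names are made up for illustration.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples labeled 1, generated samples labeled 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Minimax generator loss: low when the discriminator rates fakes as real."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
d_real = [0.9, 0.8]
d_fake_detected = [0.1, 0.2]   # the discriminator spots the fakes
d_fake_fooled = [0.6, 0.7]     # the generator fools the discriminator

loss_d_winning = discriminator_loss(d_real, d_fake_detected)
loss_d_losing = discriminator_loss(d_real, d_fake_fooled)
loss_g_losing = generator_loss(d_fake_detected)
loss_g_winning = generator_loss(d_fake_fooled)
```

When the generator fools the discriminator, the generator's loss goes down and the discriminator's loss goes up, which is the adversarial competition described above.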

21
Q

What are GNNs (graph neural networks)?

A

GNNs are designed to work directly with graph-structured data. Unlike traditional neural networks that operate on regular grid-like data (such as images or sequences), GNNs are capable of processing data represented as graphs, which consist of nodes (vertices) and edges (connections between nodes).
Graphs can represent complex data such as social networks, molecules and knowledge graphs.
GNNs use node representations (usually some sort of embedding) and propagation (message passing) between neighboring nodes. Like other networks, they can have multiple layers and pooling, and they are trained with a loss function.
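
A minimal sketch of one message-passing step, assuming mean aggregation over a node and its neighbors. A real GNN layer would also apply learned weights and a nonlinearity, which are left out here; the graph and features are hypothetical.

```python
def message_passing_step(features, adjacency):
    """Update every node to the mean of its own and its neighbors' features."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        updated[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return updated

# Hypothetical path graph a - b - c with 2-dimensional node embeddings.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

layer1 = message_passing_step(features, adjacency)   # sees direct neighbors
layer2 = message_passing_step(layer1, adjacency)     # sees neighbors of neighbors
```

Stacking steps widens each node's view of the graph: after two steps, node a has received information that originated at node c.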

22
Q

What are the 4 main parts in a transformer?

A

Word embedding & positional encoding, encoder-decoder, self-attention, feed-forward neural network.