Deep Learning Fundamentals Flashcards
What is deep learning?
Deep learning is a subset of machine learning that uses artificial neural networks with multiple layers – hence ‘deep’ – to model and process complex patterns in data. It’s inspired by the structure and function of the human brain, though the analogy is not perfect.
Key characteristics of deep learning include:
Feature learning: Unlike traditional ML algorithms, deep learning models can automatically learn hierarchical representations of features from raw data, reducing the need for manual feature engineering.
Large data requirements: Deep learning typically requires vast amounts of data to train effectively, which aligns well with the big data era we’re in.
Computational intensity: Training deep neural networks is computationally expensive, which is why GPUs like those produced by NVIDIA have been crucial to the field’s advancement.
Versatility: Deep learning has shown remarkable performance across various domains, including computer vision, natural language processing, speech recognition, and game playing.
Types of networks: There are various architectures like Convolutional Neural Networks (CNNs) for image processing, Recurrent Neural Networks (RNNs) and Transformers for sequential data, and Generative Adversarial Networks (GANs) for generating new data.
Simple answer: deep learning is an ML technique that learns features and tasks directly from data.
Can you explain what hidden layers are in a DL architecture?
Hidden layers are the layers that sit between the input layer and the output layer. Each hidden layer takes the previous layer’s outputs, applies weights, biases, and an activation function, and passes the result to the next layer; stacking multiple hidden layers is what makes an architecture “deep”.
What problem does deep learning solve that is found in traditional ML?
The problem with traditional ML algorithms is that, no matter how complex they get, they will always be “machine-like”: they need a lot of domain expertise and human intervention, and they are only capable of what they were designed for.
“If I show you a picture of a cat, you will automatically know it’s a cat, but how would a machine know that it’s a cat?” Here you would have to define for a computer what a cat is, along with all of its characteristics.
What is a neural network?
Neural networks (NNs) form the basis of deep learning; their algorithms are inspired by the neurons of the human brain.
What is a neuron?
In deep learning, a neuron, also called a node or unit, is the fundamental building block of artificial neural networks. It’s inspired by biological neurons but simplified for computational efficiency.
Key aspects of an artificial neuron include:
Inputs: A neuron receives multiple input signals, typically from other neurons or the input data.
Weights: Each input has an associated weight, representing its relative importance.
Bias: An additional parameter that allows the neuron to shift its activation function.
Summation function: Computes the weighted sum of inputs plus the bias.
Activation function: Applies a non-linear transformation to the sum, determining the neuron’s output.
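A minimal NumPy sketch of these five pieces; the input values, weights, and the choice of a sigmoid activation are illustrative assumptions, not requirements:

```python
import numpy as np

def sigmoid(z):
    # one common activation function; squashes any value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.2, 3.0])   # signals from the previous layer or raw data
weights = np.array([0.8, 0.1, -0.4])  # relative importance of each input
bias = 0.3                            # shifts the activation function

z = np.dot(weights, inputs) + bias    # summation function: weighted sum plus bias
output = sigmoid(z)                   # activation function: non-linear transform
print(output)
```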
How do neural networks actually learn?
This can be broken down into forward and backward propagation
forward = the propagation of information from the input layer to the output layer, where each layer can be defined as several neurons
the neurons of one layer connect to the neurons of the next (hidden) layer through “channels”, which are assigned numerical values called weights
backward = the propagation of error information from the output layer back through the network; the output neuron with the highest value (interpretable as a probability) determines the final prediction, and how far off that prediction was is sent backwards so the weights and biases can be corrected
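A hedged sketch of one forward pass through a tiny network; the layer sizes and random weights are assumptions made purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([1.0, 2.0])      # input layer: 2 neurons
W1 = rng.normal(size=(3, 2))  # "channels" into a 3-neuron hidden layer (weights)
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))  # channels from the hidden layer to 2 output neurons
b2 = np.zeros(2)

h = sigmoid(W1 @ x + b1)      # hidden-layer activations
out = sigmoid(W2 @ h + b2)    # output layer
print(out, out.argmax())      # the highest-valued output neuron is the prediction
```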
In forward propagation can you explain what a weight is?
the “weight” tells us how important a neuron’s input is; the higher the weight, the more important that input is in the relationship
In forward propagation can you explain what a bias is?
it is like the neuron having an opinion about the relationship: the bias serves to shift the activation function to the right or to the left
What about backward propagation?
The opposite of forward propagation, in that information flows from the output layer back to the hidden layers, not from the input layer.
At the end of forward propagation, the neural network spits out a prediction, which can be “right” or “wrong”.
Backpropagation is how the network evaluates its own performance and checks whether it was right or wrong. If it was wrong, the network runs what is called a loss function, which quantifies the deviation from the expected output. That information is sent back to the hidden layers, where the weights and biases are adjusted so that the network’s accuracy increases.
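As an illustration, here is one common loss function (mean squared error; the card does not name a specific one) together with its gradient, i.e. the information that gets sent back:

```python
import numpy as np

def mse_loss(predicted, expected):
    # quantifies the deviation from the expected output
    return np.mean((predicted - expected) ** 2)

def mse_grad(predicted, expected):
    # derivative of the loss with respect to each prediction
    return 2.0 * (predicted - expected) / predicted.size

predicted = np.array([0.8, 0.1])      # example network output (assumed values)
expected = np.array([1.0, 0.0])       # example ground truth
print(mse_loss(predicted, expected))  # how wrong the network is
print(mse_grad(predicted, expected))  # direction and size of the correction
```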
What are the main challenges in deploying large-scale deep learning models in a production environment, and how would you address them?
- Model size and latency: Using techniques like quantization, pruning, and knowledge distillation (a quantization sketch follows this list).
- Scalability: Implementing distributed inference using technologies like NVIDIA Triton.
- Version control and reproducibility: Using MLflow or similar tools.
- Monitoring and explainability: Implementing logging, metrics collection, and tools like SHAP for model interpretability.
- Cost management: Optimizing GPU utilization and leveraging spot instances.
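As one hedged illustration of the model size and latency point, a dynamic-quantization sketch using PyTorch’s quantize_dynamic; the toy model and layer sizes are assumptions, not a production architecture:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a real network.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

# Convert Linear-layer weights to int8 to shrink the model and speed up inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```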
Can you summarize how a deep learning model’s learning algorithm learns under the hood?
- The neural network initializes parameters with random values (weights and biases)
- We then take a set of input data and feed it to the neural network
- Compare the predicted value with expected value and calculate loss using the loss function
- Perform back propagation to propagate this loss back through the network to each and every weight and bias
- Use the propagated information to update the weights and biases with a gradient descent algorithm, reducing the total loss and obtaining a better model
- The last step is to repeat the previous steps until the loss is minimized (a minimal end-to-end sketch follows)
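A minimal end-to-end sketch of these steps in NumPy, assuming toy XOR data, a single hidden layer of sigmoid units, mean squared error, and an arbitrary learning rate:

```python
import numpy as np

rng = np.random.default_rng(42)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # input data
y = np.array([[0], [1], [1], [0]], dtype=float)              # expected values

# Step 1: initialize parameters (weights and biases) with random values
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(10000):
    # Step 2: feed the input data through the network (forward propagation)
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Step 3: compare predicted with expected values using the loss function
    loss = np.mean((pred - y) ** 2)

    # Step 4: backpropagate the loss to every weight and bias
    d_pred = 2 * (pred - y) / len(X) * pred * (1 - pred)
    dW2, db2 = h.T @ d_pred, d_pred.sum(axis=0)
    d_h = (d_pred @ W2.T) * h * (1 - h)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Step 5: gradient descent update to reduce the total loss
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

# Step 6: the loop repeats the steps until the loss is minimized
print(round(loss, 4), pred.round(2).ravel())
```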
Can you explain what a gradient descent algorithm is and how it’s used in neural networks?
An iterative algorithm that starts off at a random point on the loss function and travels down its slope in steps until it reaches the lowest point (minimum) of the function.
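A toy sketch, assuming a simple one-dimensional loss f(w) = (w - 3)^2 whose minimum sits at w = 3:

```python
import numpy as np

def grad(w):
    # slope of the loss f(w) = (w - 3)^2 at the current point
    return 2 * (w - 3)

w = np.random.default_rng(0).uniform(-10, 10)  # start at a random point
lr = 0.1                                       # step size down the slope

for _ in range(100):
    w -= lr * grad(w)  # travel downhill in steps

print(w)  # approximately 3.0, the minimum of the loss
```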
What is an activation function?
introduces non-linearity in the network and decides whether a neuron can contribute to the next layer
What are the different types of activation functions?
Option 1: Step function, which activates the neuron if the input is above a certain value or threshold (if value > 0, activate; else do not activate). The problem with this approach is that if you divide your neurons into classes, you don’t want all the classes to fire.
Option 2: Linear function
Option 3: Sigmoid function
Option 4: Tanh function
Option 5: ReLU function (rectified linear unit)
Option 6: Leaky ReLU function
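Hedged NumPy sketches of the listed options; the step threshold of 0 and the leaky slope of 0.01 are conventional choices, not mandated by the card:

```python
import numpy as np

def step(z):                    # Option 1: fire only above the threshold
    return np.where(z > 0, 1.0, 0.0)

def linear(z):                  # Option 2: identity, adds no non-linearity
    return z

def sigmoid(z):                 # Option 3: squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                    # Option 4: squashes values into (-1, 1)
    return np.tanh(z)

def relu(z):                    # Option 5: zero for negatives, identity otherwise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):  # Option 6: small slope instead of zero for negatives
    return np.where(z > 0, z, alpha * z)

z = np.linspace(-2.0, 2.0, 5)
print(relu(z), sigmoid(z))
```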
How do you decide which activation function to use (i.e., when a neuron should fire/activate)?
Choose an activation function that approximates the target function faster, leading to a faster training process.
For example, a sigmoid function works well for binary classification problems, because approximating the classifier function as a combination of sigmoids is easier than doing so with ReLUs.