Deep Learning Flashcards
Artificial neural network -
A neural network is made of neurons (small units) that work together to find patterns in data.
Neuron (Perceptron) - How It Works:
✅ Input – Data goes into the neuron.
✅ Weight – Defines the importance of each input.
✅ Bias – Adjusts the output for better accuracy.
✅ Output – Final result (e.g., 1 or -1).
✅ Hidden Layers – Extra layers between input and output that detect deeper patterns.
✅ Output Layer – The final prediction after all processing.
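A minimal sketch of one neuron in code (the inputs, weights, and step rule here are illustrative assumptions, not values from the card):

```python
import numpy as np

# One neuron: weighted sum of inputs plus bias, then a step decision.
def perceptron(x, w, b):
    z = np.dot(w, x) + b          # inputs scaled by their weights, shifted by the bias
    return 1 if z >= 0 else -1    # final result, e.g. 1 or -1

# Hypothetical example with 2 inputs and hand-picked weights
x = np.array([0.5, -1.0])
w = np.array([0.8, 0.2])
b = 0.1
print(perceptron(x, w, b))        # -> 1 here, since the weighted sum is positive
```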
Activation Function 🔄
Adds non-linearity to the network, helping it learn complex patterns.
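A quick sketch of a few common activation functions (the card does not name specific ones; sigmoid, ReLU, and tanh are typical examples):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))   # squashes any value into (0, 1)

def relu(z):
    return np.maximum(0, z)       # keeps positives, zeros out negatives

def tanh(z):
    return np.tanh(z)             # squashes any value into (-1, 1)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), tanh(z))
```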
Why Use Neural Networks?
They learn complex relationships between inputs and outputs, making them more powerful than traditional machine learning for many tasks!
Deep neural network -
An ANN with many hidden layers, i.e., layers between the input (first) layer and the output (last) layer.
Deep neural networks are capable of learning hierarchical features from raw data, making them well-suited for tasks such as image and speech recognition.
Deep learning -
Input → feature extraction and classification (both learned by the network) → output.
Loss function (cost/objective function) -
It shows how far the predicted output is from the actual value. The goal is to minimize loss to improve accuracy.
🔹 Mean Squared Error (MSE) 📉 – Used for regression (predicting numbers).
✅ Calculates the average of squared differences between predicted & actual values.
🔹 Cross-Entropy Loss 🔄 – Used for classification (0 or 1).
✅ Measures the difference between predicted probabilities & actual labels.
👉 Smaller loss = Better model!
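A small sketch of both losses with NumPy (the sample predictions and labels are made up for illustration; cross-entropy is shown in its binary form):

```python
import numpy as np

def mse(y_true, y_pred):
    # average of squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # difference between predicted probabilities and actual 0/1 labels
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))               # regression: 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # classification
```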
Neural network training -
🔹 Epoch – One full cycle through the dataset.
🔹 Sample – A single data point (one row in a dataset).
🔹 Iteration – One update step on a single batch; the number of iterations per epoch equals the dataset size divided by the batch size.
🔹 Batch Size – Number of samples processed before updating the model.
Training Steps:
1️⃣ Forward Propagation – Information flows in one direction, from the input layer to the output layer, producing a prediction.
2️⃣ Error Estimation – Compare the prediction with the actual value using the cost/loss function.
3️⃣ Backpropagation – Adjusts weights using Gradient Descent to improve accuracy.
📌 Repeat these steps to train the model!
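A toy training loop for a single linear neuron, showing the three steps above over several epochs and batches (the synthetic data, learning rate, and batch size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # 100 samples, 2 features
y = X @ np.array([2.0, -1.0]) + 0.5            # synthetic targets

w, b, lr, batch_size = np.zeros(2), 0.0, 0.1, 10
for epoch in range(20):                         # one epoch = one full pass over the data
    for start in range(0, len(X), batch_size):  # one iteration per batch
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        pred = xb @ w + b                       # 1) forward propagation
        err = pred - yb                         # 2) error estimation (MSE-style error)
        w -= lr * xb.T @ err / len(xb)          # 3) backpropagation: gradient step on weights
        b -= lr * err.mean()                    #    ...and on the bias
```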
Gradient descent -
An optimization algorithm that minimizes the loss/cost function by finding the optimal set of model parameters during training. Steps: initialize weights randomly, then repeat until convergence: compute the gradient and update the weights; finally, return the weights.
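A minimal sketch of those steps on a one-dimensional loss, f(w) = (w - 3)^2, chosen only for illustration:

```python
def gradient_descent(lr=0.1, steps=100):
    w = 0.0                      # initialize the weight (here at 0 instead of randomly)
    for _ in range(steps):       # repeat until convergence (here: a fixed number of steps)
        grad = 2 * (w - 3)       # compute the gradient of the loss
        w -= lr * grad           # update the weight against the gradient
    return w                     # return the weight

print(gradient_descent())        # approaches 3, the minimizer of the loss
```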
Learning rate -
What it does: Determines how big steps the model takes to find the best solution.
Effects:
🔹 Small learning rate → Slow training, many steps needed.
🔹 Large learning rate → Can overshoot the best solution, making training unstable.
How to Choose the Best Learning Rate ✅
1️⃣ Trial & Observation: Test different values and see what works best.
2️⃣ Learning Rate Annealing: Start high, then gradually lower it.
3️⃣ Adaptive Learning Rate: Automatically adjusts during training for better results.
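A tiny sketch of learning rate annealing (option 2️⃣), assuming a simple inverse-time decay schedule as one possible choice:

```python
initial_lr, decay = 0.1, 0.05
for epoch in range(5):
    lr = initial_lr / (1 + decay * epoch)   # start high, gradually lower it each epoch
    print(f"epoch {epoch}: lr = {lr:.4f}")
```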
Stochastic gradient descent (SGD) and mini-batch SGD -
Gradient Descent (GD): Uses all data points (n) to reduce the cost function, requiring high computation power. Slow but more accurate.
Stochastic Gradient Descent (SGD): Uses only one data point (n = 1) per parameter update, giving fast but noisy gradient estimates. Fast but less accurate.
Mini-Batch SGD: Uses a small batch (k < n) of data points for each parameter update, balancing speed and accuracy. Faster than GD and more stable than SGD.
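A sketch of how each variant selects data for a single parameter update (the array, batch size k = 4, and random seed are illustrative):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)        # n = 10 samples
rng = np.random.default_rng(0)

batch_gd   = X                                              # GD: all n samples per update
sgd        = X[rng.integers(len(X))]                        # SGD: one random sample (n = 1)
mini_batch = X[rng.choice(len(X), size=4, replace=False)]   # mini-batch: k = 4 < n samples
```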