ANN Flashcards
What is an Artificial Neural Network (ANN)?
An Artificial Neural Network (ANN) is a machine learning model inspired by the structure and function of the human brain. It consists of multiple layers of interconnected neurons (nodes) that process and transform data. ANNs are widely used in machine learning and deep learning for tasks such as image recognition, natural language processing (NLP), speech recognition, and financial forecasting.
Why Use ANNs?
Handles Complex Relationships: Can model non-linear and intricate data patterns.
Scalability: Works on large datasets with high-dimensional features.
Adaptability: Learns continuously from new data, improving accuracy over time.
What do nodes do?
Neuron (Perceptron):
A unit that processes input (multiplies each input by a weight and adds a bias), applies an activation function, and passes the output forward
Equation of a neuron's output: y = f(WX + b)
Weights:
Determine the importance of each input
Adjusted during training
Bias:
Helps shift the activation function for better learning
Activation Function:
Adds non-linearity (e.g., Sigmoid, ReLU)
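Putting the pieces together, a minimal NumPy sketch of one neuron (all values are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

X = np.array([0.5, -1.2, 3.0])   # inputs (one value per feature)
W = np.array([0.4, 0.1, -0.7])   # weights: importance of each input
b = 0.2                          # bias: shifts the activation

y = sigmoid(W @ X + b)           # y = f(WX + b)
print(y)                         # a single activated output
```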
What are the types of ANNs?
Feedforward Neural Network (FNN)
Information flows in one direction (Input → Hidden → Output)
Used for classification & regression tasks
Recurrent Neural Network (RNN)
Has memory! Output of previous steps influences current step
Used in time-series forecasting & NLP
Convolutional Neural Network (CNN)
Specialized for image processing
Uses convolutional layers to detect patterns
Transformer Models
Used for NLP tasks (e.g., ChatGPT, BERT)
Uses self-attention mechanism instead of recurrence
What is Forward Propagation?
Definition:
Forward Propagation (or Feedforward) is the process where input data flows forward through the layers of an ANN to produce an output.
Steps:
Inputs are multiplied by weights and added to a bias
The result is passed through an activation function
The output moves to the next layer
Final layer produces the prediction
No learning happens in forward propagation – only computations!
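A small NumPy sketch of these steps through one hidden layer (sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = rng.normal(size=4)                         # 4 input features

W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
h = relu(W1 @ x + b1)                          # hidden layer: weights, bias, activation

W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
y = sigmoid(W2 @ h + b2)                       # output layer produces the prediction

print(y)  # no learning here – only computation
```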
What is an Activation Function?
An activation function decides whether a neuron should be activated or not by applying a mathematical transformation to the weighted sum of inputs. (Think of it as a switch that turns a neuron “on” or “off” depending on the importance of the input.)
Why is it needed?
✅ Introduces non-linearity
✅ Helps in learning complex patterns
✅ Allows the network to approximate any function
What is Linear Activation Function?
Linear Activation Function
This function gives an output that is directly proportional to the input.
Problem is it cannot handle complex patterns because it doesn’t introduce non-linearity.
Think of it like a straight line—not very useful in deep learning
What is Sigmoid Activation Function?
Sigmoid Activation Function
This function squeezes all values into a range between 0 and 1.
Useful for binary classification, where the model decides between two categories.
The downside? It can make training slow because, for very large or very small inputs, the gradient is nearly zero and the weights barely update (this is called the vanishing gradient problem).
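A quick NumPy check of both the squashing and the vanishing gradient (inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = sigmoid(x)

print(y)            # every value squeezed into (0, 1)
print(y * (1 - y))  # sigmoid's derivative: ~0 at the extremes (vanishing gradient)
```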
What is Tanh Activation Function?
Tanh Activation Function
Similar to Sigmoid but outputs values between -1 and 1 instead of 0 and 1.
Because its output is zero-centered, it often works better than Sigmoid: negative inputs map to negative outputs.
Still suffers from the vanishing gradient problem, which can make deep networks difficult to train.
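The same check for Tanh (NumPy provides it as np.tanh):

```python
import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
y = np.tanh(x)

print(y)         # outputs in (-1, 1) and zero-centered
print(1 - y**2)  # tanh's derivative: still ~0 at the extremes
```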
What is ReLU (Rectified Linear Unit)?
ReLU (Rectified Linear Unit)
The most popular activation function for deep learning.
It keeps positive values the same and sets all negative values to zero.
The advantage? It helps deep networks learn complex patterns without slowing down.
The downside? Some neurons can stop learning permanently if they keep getting negative inputs (dying ReLU problem).
What is Leaky ReLU?
Leaky ReLU
A small improvement over ReLU—it allows small negative values instead of setting them to zero.
This prevents neurons from completely “dying” and improves learning.
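A sketch covering this card and the ReLU card above (the alpha slope is a common but arbitrary choice):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):   # alpha: small slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-5.0, -1.0, 0.0, 2.0])
print(relu(x))        # [0. 0. 0. 2.]       – negatives are zeroed and can "die"
print(leaky_relu(x))  # [-0.05 -0.01 0. 2.] – negatives keep a small gradient alive
```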
What is Softmax Activation Function?
Softmax Activation Function
Converts numbers into probabilities, making it useful for multi-class classification (where the model predicts one of many categories).
It ensures all outputs sum to 1 (100%), so we can interpret them as confidence scores.
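A minimal NumPy softmax (subtracting the max is a standard numerical-stability trick):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max so exp() can't overflow
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

print(probs)        # one probability per class
print(probs.sum())  # 1.0 – a valid probability distribution
```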
How to Choose the Right Activation Function for binary classification?
For binary classification (Yes/No, 0/1 problems): Sigmoid
How to Choose the Right Activation Function for multi-class classification?
For multi-class classification (More than 2 categories): Softmax
How to Choose the Right Activation Function for deep learning models?
For deep learning models: ReLU (or Leaky ReLU to fix dying neurons)
For simple problems: Tanh or Sigmoid
What is Backward Propagation?
Backward Propagation (Learning Step)
The network compares the prediction with the actual output and calculates the error (loss).
It then moves backward through the layers, calculating how much each weight contributed to the error.
Using Gradient Descent, it adjusts the weights to reduce the error.
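A toy NumPy training loop for a single sigmoid neuron, sketching forward pass, error, and gradient-descent updates (the data and learning rate are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# toy task: label is 1 when the two features sum to a positive number
X = rng.normal(size=(100, 2))
t = (X.sum(axis=1) > 0).astype(float)

W, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(200):
    y = sigmoid(X @ W + b)            # forward pass: prediction
    error = y - t                     # gradient of cross-entropy loss wrt the logit
    W -= lr * X.T @ error / len(X)    # gradient descent: adjust weights...
    b -= lr * error.mean()            # ...and bias to reduce the error

accuracy = ((sigmoid(X @ W + b) > 0.5) == t).mean()
print(accuracy)  # should be close to 1.0 on this easy toy task
```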
Explain the Architecture of an ANN
Layers of an ANN
🔹 Input Layer
The first layer that receives raw data (e.g., images, text, numbers).
Each neuron represents one feature of the input (e.g., in house price prediction: square footage, number of rooms).
🔹 Hidden Layers
These layers perform feature extraction and pattern learning.
Each neuron applies weights, biases, and an activation function to transform the data.
More hidden layers = deeper network (Deep Learning).
🔹 Output Layer
The final layer that gives the prediction.
Number of neurons depends on the problem:
1 neuron for Binary Classification (Yes/No).
Multiple neurons for Multi-Class Classification (e.g., cat, dog, bird).
1 neuron for Regression (continuous values like house prices).
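As a concrete sketch, here is how these three layers might look in Keras, assuming TensorFlow is installed and using a made-up input of 4 features for binary classification:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(4,)),                 # input layer: one neuron per feature
    layers.Dense(8, activation="relu"),      # hidden layer: feature extraction
    layers.Dense(8, activation="relu"),      # more hidden layers = deeper network
    layers.Dense(1, activation="sigmoid"),   # output layer: 1 neuron (binary)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```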
What can be the preprocessing steps for an ANN model?
Load the dataset and inspect the columns, data types, dataset size, and missing values to get a clear idea of the data
1) Handling Missing Values
Why?
Missing values can disrupt training and make predictions unreliable.
Neural networks don’t handle NaN values well.
What to do?
✔ Numerical Features: Fill missing values with mean/median (depends on data distribution).
✔ Categorical Features: Fill missing values with mode (most frequent value).
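A pandas sketch of both fills (the columns and values are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "income": [40_000, np.nan, 55_000, 62_000],   # numerical, one gap
    "region": ["north", "south", None, "south"],  # categorical, one gap
})

df["income"] = df["income"].fillna(df["income"].median())   # mean/median for numbers
df["region"] = df["region"].fillna(df["region"].mode()[0])  # mode for categories
print(df)
```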
2) Encoding Categorical Data
Why?
Neural networks can’t work with text; they only process numbers.
Categorical data (e.g., gender, region) must be converted into a numerical format.
What to do?
✔ One-Hot Encoding: For categorical variables with no order (e.g., region, gender).
✔ Label Encoding: Only for ordered categories.
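A sketch of both encodings; for feature columns, scikit-learn's OrdinalEncoder plays the label-encoding role, and the category order below is an assumption:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "region": ["north", "south", "south", "east"],                   # no natural order
    "education": ["bachelor", "high school", "master", "bachelor"],  # ordered
})

# one-hot encoding: one 0/1 column per region
df = pd.get_dummies(df, columns=["region"])

# ordinal encoding: map the ordered categories onto 0, 1, 2, ...
order = [["high school", "bachelor", "master", "phd"]]
df["education"] = OrdinalEncoder(categories=order).fit_transform(df[["education"]]).ravel()
print(df)
```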
3) Handling Outliers (Optional for ANN)
Why?
Neural networks are more robust to outliers than Linear Regression.
But extreme values can still slow down training and affect convergence.
What to do?
✔ Use RobustScaler (instead of removing outliers).
✔ Log transformation can be used for highly skewed data.
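Both options in one sketch (the price column and its outlier are made up):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler

df = pd.DataFrame({"price": [100, 120, 110, 115, 5_000]})  # 5_000 is an outlier

# RobustScaler centers on the median and scales by the IQR,
# so the outlier barely distorts the other values
df["price_robust"] = RobustScaler().fit_transform(df[["price"]]).ravel()

# log1p compresses the long right tail of skewed, non-negative data
df["price_log"] = np.log1p(df["price"])
print(df)
```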
4) Feature Scaling (Must Do for ANN!)
Why?
Different features have different scales, which can affect gradient descent.
Scaling helps the network learn faster and prevents larger numbers from dominating.
What to do?
✔ StandardScaler: Rescales to mean = 0, std = 1 (good if data roughly follows a normal distribution).
✔ MinMaxScaler: Scales values into the range 0 to 1 (good if data is not normally distributed).
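Both scalers side by side on a made-up frame:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler

df = pd.DataFrame({
    "sqft": [850, 1200, 2400, 3100],
    "rooms": [2, 3, 4, 5],
})

df["sqft_std"] = StandardScaler().fit_transform(df[["sqft"]]).ravel()   # mean 0, std 1
df["rooms_mm"] = MinMaxScaler().fit_transform(df[["rooms"]]).ravel()    # into [0, 1]
print(df)
```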
More on why scaling:
If features have different scales, the gradients for weights attached to large-scale features are much larger than those for weights attached to small-scale features. The learning rate then has to stay small enough to keep the large updates stable, so the remaining weights update very slowly. This imbalance causes inconsistent learning, making training:
❌ Unstable – The model may oscillate or diverge.
❌ Slow – Some weights need many more iterations to converge.
❌ Inefficient – Some features dominate learning, ignoring others.