Neural Nets Flashcards
https://towardsdatascience.com/the-math-behind-neural-networks-a34a51b93873
Feedforward Neural Networks (FNN)
It’s like a one-way street for data — information travels straight from the input, through any hidden layers, and out the other side to the output. These networks are the go-to for simple predictions and sorting things into categories.
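A minimal sketch of that one-way flow, assuming NumPy and made-up layer sizes:

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    x = np.array([0.5, -1.2, 3.0])               # toy input: 3 features
    W1, b1 = np.random.randn(4, 3), np.zeros(4)  # hidden layer (4 neurons)
    W2, b2 = np.random.randn(2, 4), np.zeros(2)  # output layer (2 neurons)

    h = relu(W1 @ x + b1)  # input -> hidden
    y = W2 @ h + b2        # hidden -> output; data never flows backward
    print(y)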
Convolutional Neural Networks (CNN)
CNNs are the big guns in the world of computer vision. They’ve got a knack for picking up on the spatial patterns in images, thanks to their specialized layers. This ability makes them stars at recognizing images, spotting objects within them, and classifying what they see. They’re the reason your phone can tell a dog from a cat in photos.
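The core trick is a small filter sliding across the image; here is a toy sketch in pure NumPy (strictly speaking cross-correlation, which is how most libraries implement "convolution"):

    import numpy as np

    image = np.random.rand(5, 5)        # toy grayscale image
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])     # a classic vertical-edge filter

    out = np.zeros((3, 3))              # "valid" output: (5-3+1) x (5-3+1)
    for i in range(3):
        for j in range(3):
            out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
    print(out)                          # large values where vertical edges sit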
Recurrent Neural Networks (RNN)
RNNs have a memory of sorts, making them great for anything involving sequences of data, like sentences, DNA sequences, handwriting, or stock market trends. They loop information back around, allowing them to remember previous inputs in the sequence. This makes them ace at tasks like predicting the next word in a sentence or understanding spoken language.
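A bare-bones sketch of that loop (NumPy, toy dimensions); the hidden state h is the "memory" carried from one step to the next:

    import numpy as np

    Wx, Wh, b = np.random.randn(4, 3), np.random.randn(4, 4), np.zeros(4)
    h = np.zeros(4)                         # hidden state: the memory
    sequence = np.random.randn(5, 3)        # 5 time steps, 3 features each

    for x_t in sequence:
        h = np.tanh(Wx @ x_t + Wh @ h + b)  # h loops back into itself
    print(h)                                # a summary of the whole sequence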
Long Short-Term Memory Networks (LSTM)
LSTMs are a special breed of RNNs built to remember things for longer stretches. They’re designed to solve the problem of RNNs forgetting stuff over long sequences. If you’re dealing with complex tasks that need to hold onto information for a long time, like translating paragraphs or predicting what happens next in a TV series, LSTMs are your go-to.
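A compressed sketch of a single LSTM step (NumPy, toy sizes, biases omitted for brevity); three gates decide what to forget, what to store, and what to reveal from the long-term cell state c:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    n = 4                            # toy hidden size
    x_t = np.random.randn(n)         # current input (same size, for simplicity)
    h, c = np.zeros(n), np.zeros(n)  # short-term and long-term memory
    Wf, Wi, Wo, Wc = (np.random.randn(n, 2 * n) for _ in range(4))

    z = np.concatenate([x_t, h])
    f = sigmoid(Wf @ z)              # forget gate: what to drop from c
    i = sigmoid(Wi @ z)              # input gate: what new info to store
    o = sigmoid(Wo @ z)              # output gate: what to reveal
    c = f * c + i * np.tanh(Wc @ z)  # update the long-term cell state
    h = o * np.tanh(c)               # new short-term state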
Generative Adversarial Networks (GAN)
Imagine two AIs in a cat-and-mouse game: one generates fake data (like images), and the other tries to catch what’s fake and what’s real. That’s a GAN. This setup allows GANs to create incredibly realistic images, music, text, and more. They’re the artists of the neural network world, generating new, realistic data from scratch.
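The cat-and-mouse objective in miniature, with toy stand-in "networks" (NumPy); the discriminator D scores how real a sample looks, and the two losses pull in opposite directions:

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    wd = np.random.randn(2)  # toy discriminator weights
    wg = np.random.randn(2)  # toy generator weights

    def D(x): return sigmoid(wd @ x)  # probability that x is real
    def G(z): return wg * z           # fake sample made from noise z

    real, z = np.array([1.0, 2.0]), np.random.randn(2)
    loss_d = -np.log(D(real)) - np.log(1 - D(G(z)))  # detective: catch the fakes
    loss_g = -np.log(D(G(z)))                        # forger: fool the detective
    print(loss_d, loss_g)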
Weight
Think of weights as the neuron’s way of deciding how important an input is (multiply the input by the weight).
Bias
A tweak to make sure the neuron’s output fits just right (added to the weighted sum, shifting it up or down).
Activation Function
This step is where the magic happens, allowing the neuron to tackle complex patterns by bending and stretching the data in nonlinear ways. Popular choices for this function are ReLU, Sigmoid, and Tanh, each with its own way of tweaking the data.
Weighted Sum
The first step in the neural computation process involves aggregating the inputs to a neuron, each multiplied by its respective weight, and then adding a bias term. This operation is known as the weighted sum or linear combination.
Mathematically, it is expressed as:
y = Σi wixi + b = w1x1 + w2x2 + … + wnxn + b  (summed over all n inputs)
The weighted sum is crucial because it constitutes the raw input signal to a neuron before any non-linear transformation. It allows the network to perform a linear transformation of the inputs, adjusting the importance (weight) of each input in the neuron’s output.
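In code it’s just a dot product plus a bias; a quick NumPy sketch with made-up numbers:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])  # inputs
    w = np.array([0.8, 0.2, -0.5])  # weights: importance of each input
    b = 0.1                         # bias
    y = np.dot(w, x) + b            # 0.4 - 0.2 - 1.0 + 0.1 = -0.7
    print(y)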
Sigmoid Activation Function
This function squeezes its input into a narrow range between 0 and 1. It’s like taking any value, no matter how large or small, and translating it into a probability.
f(x) = 1/(1 + e^-x)
You’ll see sigmoid functions in the final layer of binary classification networks, where you need to decide between two options — yes or no, true or false, 1 or 0.
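A quick NumPy sketch; note how extreme inputs get squashed toward 0 and 1:

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]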
Hyperbolic Tangent Function (tanh)
tanh stretches the output range to between -1 and 1. This centers the data around 0, making it easier for layers down the line to learn from it.
f(x) = tanh(x) = (2/(1+e^-2x)) - 1
It’s often found in the hidden layers, helping to model more complex data relationships by balancing the input signal.
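NumPy ships tanh built in; note the zero-centered outputs compared to sigmoid:

    import numpy as np

    print(np.tanh(np.array([-5.0, 0.0, 5.0])))  # ~[-0.9999, 0.0, 0.9999]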
Rectified Linear Unit (ReLU)
ReLU is like a gatekeeper that passes positive values unchanged but blocks negatives, turning them to zero. This simplicity makes it very efficient and helps overcome tricky training problems in deep networks, like vanishing gradients.
f(x) = max(0,x)
Its simplicity and efficiency have made ReLU incredibly popular, especially in convolutional neural networks (CNNs) and deep learning models.
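The whole function is one line in NumPy:

    import numpy as np

    def relu(x):
        return np.maximum(0, x)

    print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]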
Leaky Rectified Linear Unit (Leaky ReLU)
Leaky ReLU allows a tiny, non-zero gradient when the input is less than zero, which keeps neurons alive and kicking even when they’re not actively firing.
f(x) = max(αx,x)
It’s a tweak to ReLU, with α a small constant (often 0.01), used in cases where the network might suffer from “dead neurons,” ensuring all parts of the network stay active over time.
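A sketch in NumPy, assuming the common default α = 0.01:

    import numpy as np

    def leaky_relu(x, alpha=0.01):  # alpha: the small leak slope
        return np.where(x > 0, x, alpha * x)

    print(leaky_relu(np.array([-2.0, 3.0])))  # [-0.02  3.  ]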
Exponential Linear Unit (ELU)
ELU smooths out the function for negative inputs (using a parameter α for scaling), allowing for negative outputs but with a gentle curve. This can help the network maintain a mean activation closer to zero, improving learning dynamics.
f(x) = x (if x > 0), α(e^x - 1) (if x ≤ 0)
Useful in deeper networks where ReLU’s sharp threshold could slow down learning.
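A sketch in NumPy, with α = 1.0 as an assumed default:

    import numpy as np

    def elu(x, alpha=1.0):  # alpha scales the negative-side curve
        return np.where(x > 0, x, alpha * (np.exp(x) - 1))

    print(elu(np.array([-2.0, 0.0, 3.0])))  # ~[-0.865, 0.0, 3.0]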
Softmax Function
The softmax function turns logits, the raw output scores from the neurons, into probabilities by exponentiating and normalizing them. It ensures that the output values sum up to one, making them directly interpretable as probabilities.
f(x)_i = e^(x_i) / Σ_j e^(x_j)
It’s the go-to for the output layer in multi-class classification problems, where each neuron corresponds to a different class, and you want to pick the most likely one.
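A NumPy sketch, with the usual subtract-the-max trick for numerical stability (an implementation detail, not part of the formula above):

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))  # subtracting the max avoids overflow
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])
    print(softmax(logits))  # ~[0.659, 0.242, 0.099]; sums to 1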