Neural Nets Flashcards

https://towardsdatascience.com/the-math-behind-neural-networks-a34a51b93873

1
Q

Feedforward Neural Networks (FNN)

A

It’s like a one-way street for data — information travels straight from the input, through any hidden layers, and out the other side to the output. These networks are the go-to for simple predictions and sorting things into categories.
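
A minimal sketch of that one-way flow in NumPy; the layer sizes and random weights are made up purely for illustration:

import numpy as np

# Toy forward pass: input -> one hidden layer -> output, no loops anywhere.
rng = np.random.default_rng(0)

x = rng.normal(size=3)                           # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

hidden = np.maximum(0, W1 @ x + b1)   # ReLU activation in the hidden layer
output = W2 @ hidden + b2             # data travels straight through
print(output)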

2
Q

Convolutional Neural Networks (CNN)

A

CNNs are the big guns in the world of computer vision. They’ve got a knack for picking up on the spatial patterns in images, thanks to their specialized layers. This ability makes them stars at recognizing images, spotting objects within them, and classifying what they see. They’re the reason your phone can tell a dog from a cat in photos.
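
A rough sketch of the core operation behind those specialized layers: a small filter sliding over the image. The 3×3 vertical-edge filter and the toy 8×8 image are arbitrary choices for illustration:

import numpy as np

image = np.random.rand(8, 8)           # toy grayscale "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])        # responds strongly to vertical edges

out = np.zeros((6, 6))                 # "valid" output: (8-3+1) x (8-3+1)
for i in range(6):
    for j in range(6):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out.shape)   # (6, 6) feature map of local spatial pattern responses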

3
Q

Recurrent Neural Networks (RNN)

A

RNNs have a memory of sorts, making them great for anything involving sequences of data, like sentences, DNA sequences, handwriting, or stock market trends. They loop information back around, allowing them to remember previous inputs in the sequence. This makes them ace at tasks like predicting the next word in a sentence or understanding spoken language.
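
A minimal sketch of that loop in NumPy; the sizes and random weights are invented, but the key point is that the hidden state h is fed back in at every step:

import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(5, 3))          # input-to-hidden weights
Wh = rng.normal(size=(5, 5))          # hidden-to-hidden ("loop back") weights
b = np.zeros(5)

h = np.zeros(5)                       # memory starts empty
sequence = rng.normal(size=(4, 3))    # 4 time steps, 3 features each
for x_t in sequence:
    h = np.tanh(Wx @ x_t + Wh @ h + b)   # same weights reused at every step

print(h)   # final hidden state summarizes the whole sequence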

4
Q

Long Short-Term Memory Networks (LSTM)

A

LSTMs are a special breed of RNNs built to remember things for longer stretches. They’re designed to solve the problem of RNNs forgetting stuff over long sequences. If you’re dealing with complex tasks that need to hold onto information for a long time, like translating paragraphs or predicting what happens next in a TV series, LSTMs are your go-to.

5
Q

Generative Adversarial Networks (GAN)

A

Imagine two AIs in a cat-and-mouse game: one generates fake data (like images), and the other tries to catch what’s fake and what’s real. That’s a GAN. This setup allows GANs to create incredibly realistic images, music, text, and more. They’re the artists of the neural network world, generating new, realistic data from scratch.
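
That cat-and-mouse game is usually written as a minimax objective (the standard formulation, not spelled out in the card itself): the discriminator D tries to maximize it while the generator G tries to minimize it.

min over G of max over D: E_x[log D(x)] + E_z[log(1 − D(G(z)))]

Here x is real data and z is random noise that G turns into fake data.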

6
Q

Weight

A

Think of weights as the neuron’s way of deciding how important an input is (multiply the input by the weight).

7
Q

Bias

A

A tweak to make sure the neuron’s output fits just right (added to the weighted sum of the inputs).

8
Q

Activation Function

A

This step is where the magic happens, allowing the neuron to tackle complex patterns by bending and stretching the data in nonlinear ways. Popular choices for this function are ReLU, Sigmoid, and Tanh, each with its own way of tweaking the data.

9
Q

Weighted Sum

A

The first step in the neural computation process involves aggregating the inputs to a neuron, each multiplied by their respective weights, and then adding a bias term. This operation is known as the weighted sum or linear combination.
Mathematically, it is expressed as:
z = w1x1 + w2x2 + … + wnxn + b (a summation over all n weighted inputs, plus the bias b)

The weighted sum is crucial because it constitutes the raw input signal to a neuron before any non-linear transformation. It allows the network to perform a linear transformation of the inputs, adjusting the importance (weight) of each input in the neuron’s output.
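
A tiny worked example of that formula in NumPy; the numbers are arbitrary, just to make the arithmetic concrete:

import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8,  0.2, -0.5])  # weights: how important each input is
b = 0.1                          # bias

z = np.dot(w, x) + b             # 0.4 - 0.2 - 1.0 + 0.1
print(z)                         # ≈ -0.7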

10
Q

Sigmoid Activation Function

A

This function squeezes its input into a narrow range between 0 and 1. It’s like taking any value, no matter how large or small, and translating it into a probability.

f(x) = 1/(1 + e^−x)

You’ll see sigmoid functions in the final layer of binary classification networks, where you need to decide between two options — yes or no, true or false, 1 or 0.
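
A quick sketch of the formula in NumPy, with a few sample inputs to show the squashing:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes any real number into (0, 1)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ≈ [0.0067, 0.5, 0.9933]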

11
Q

Hyperbolic Tangent Function (tanh)

A

tanh stretches the output range to between -1 and 1. This centers the data around 0, making it easier for layers down the line to learn from it.

f(x) = tanh(x) = (2/(1+e^-2x)) - 1

It’s often found in the hidden layers, helping to model more complex data relationships by balancing the input signal.
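
The same formula written out in NumPy, checked against NumPy’s built-in tanh; the sample inputs are arbitrary:

import numpy as np

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # output centered on 0

x = np.array([-2.0, 0.0, 2.0])
print(tanh(x))       # ≈ [-0.964, 0.0, 0.964]
print(np.tanh(x))    # NumPy's built-in gives the same values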

12
Q

Rectified Linear Unit (ReLU)

A

ReLU is like a gatekeeper that passes positive values unchanged but blocks negatives, turning them to zero. This simplicity makes it very efficient and helps overcome some tricky problems in training deep neural networks.

f(x) = max(0,x)

Its simplicity and efficiency have made ReLU incredibly popular, especially in convolutional neural networks (CNNs) and deep learning models.
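
The whole function is one line in NumPy; the sample inputs are arbitrary:

import numpy as np

def relu(x):
    return np.maximum(0, x)   # pass positives through, zero out negatives

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]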

13
Q

Leaky Rectified Linear Unit (Leaky ReLU)

A

Leaky ReLU allows a tiny, non-zero gradient when the input is less than zero, which keeps neurons alive and kicking even when they’re not actively firing.

f(x) = max(αx, x), where α is a small constant (e.g., 0.01)

It’s a tweak to ReLU used in cases where the network might suffer from “dead neurons,” ensuring all parts of the network stay active over time.
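
A sketch in NumPy, assuming the common default α = 0.01; the sample inputs are arbitrary:

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)   # negatives shrink instead of dying

print(leaky_relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [-0.03 -0.005 0. 2.]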

14
Q

Exponential Linear Unit (ELU)

A

ELU smooths out the function for negative inputs (using a parameter α for scaling), allowing for negative outputs but with a gentle curve. This can help the network maintain a mean activation closer to zero, improving learning dynamics.

f(x) = x, if x > 0
f(x) = α(e^x - 1), if x ≤ 0

Useful in deeper networks where ReLU’s sharp threshold could slow down learning.
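
A sketch of the piecewise formula in NumPy, assuming α = 1; the sample inputs are arbitrary:

import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # smooth below zero

print(elu(np.array([-3.0, -0.5, 0.0, 2.0])))   # ≈ [-0.95, -0.393, 0., 2.]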

15
Q

Softmax Function

A

The softmax function turns logits, the raw output scores from the neurons, into probabilities by exponentiating and normalizing them. It ensures that the output values sum up to one, making them directly interpretable as probabilities.

f(x)_i = e^(x_i) / Σ_j e^(x_j)

It’s the go-to for the output layer in multi-class classification problems, where each neuron corresponds to a different class, and you want to pick the most likely one.
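
A sketch of the formula in NumPy; subtracting the max before exponentiating is a common numerical-stability trick and doesn’t change the result:

import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)          # probabilities that sum to 1

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)         # ≈ [0.659, 0.242, 0.099]
print(probs.sum())   # 1.0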

16
Q

Backpropagation

A

Backpropagation, short for “backward propagation of errors,” is a method for efficiently calculating the gradient of the loss function with respect to every weight in the network. It consists of two main phases: a forward pass, where the input data is passed through the network to generate an output, and a backward pass, where the output is compared to the target value and the resulting error is propagated back through the network to update the weights.

The essence of backpropagation is the chain rule of calculus, which is used to calculate the gradient of the loss with respect to each weight by multiplying together the gradients of every layer between that weight and the output. This process reveals how much each weight contributes to the error, providing a clear path for its adjustment.

∂L/∂w = ∂L/∂a * ∂a/∂z * ∂z/∂w

(where z is the neuron’s weighted sum, a its activation, and L the loss)
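
A minimal single-neuron example of that chain of factors, assuming a sigmoid activation and a squared-error loss (both are just illustrative choices):

import numpy as np

x, y = 1.5, 1.0          # one input and its target
w, b = 0.4, 0.1          # current weight and bias

z = w * x + b                     # forward pass: weighted sum
a = 1.0 / (1.0 + np.exp(-z))      # activation
L = (a - y) ** 2                  # loss

dL_da = 2.0 * (a - y)    # ∂L/∂a
da_dz = a * (1.0 - a)    # ∂a/∂z, the sigmoid's derivative
dz_dw = x                # ∂z/∂w
dL_dw = dL_da * da_dz * dz_dw     # exactly the product in the formula above
print(dL_dw)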

17
Q

Gradient Descent

A

Gradient Descent is an optimization algorithm used for minimizing the loss function in a neural network. It works by iteratively moving the weights in the direction of the steepest decrease in loss. The amount by which the weights are adjusted in each iteration is determined by the learning rate, a hyperparameter that controls the size of the steps.

w(new) = w(old) − η * ∂L/∂w (where η is the learning rate)
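
A toy illustration of that update rule on a one-parameter loss L(w) = (w − 3)², whose gradient is 2(w − 3); the learning rate is an arbitrary choice:

w = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2.0 * (w - 3.0)         # ∂L/∂w
    w = w - learning_rate * grad   # w(new) = w(old) − η * ∂L/∂w

print(w)   # approaches 3, the minimum of the loss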

18
Q

Stochastic Gradient Descent (SGD)

A

Stochastic Gradient Descent (SGD) takes the core idea of gradient descent but changes the approach by using just one training example at a time to calculate the gradient and update the weights. This method is similar to making decisions based on quick, individual observations rather than waiting to gather everyone’s opinion. It can make the learning process much faster because the model updates more frequently and with less computational burden.
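
A toy sketch of that one-example-at-a-time behavior: fitting y = w·x to data generated with a true weight of 2, updating after every single example. All the numbers here are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 2.0 * xs + 0.1 * rng.normal(size=100)   # noisy targets, true weight = 2

w = 0.0
learning_rate = 0.05
for x, y in zip(xs, ys):              # one example at a time
    error = w * x - y
    grad = 2.0 * error * x            # gradient of (w*x - y)^2 w.r.t. w
    w -= learning_rate * grad         # quick, noisy update

print(w)   # close to 2 after a single pass over the data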