Neural Nets Flashcards

https://towardsdatascience.com/the-math-behind-neural-networks-a34a51b93873

1
Q

Feedforward Neural Networks (FNN)

A

It’s like a one-way street for data — information travels straight from the input, through any hidden layers, and out the other side to the output. These networks are the go-to for simple predictions and sorting things into categories.
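
A minimal sketch of that one-way flow in NumPy; the layer sizes and random weights are made up purely for illustration:

import numpy as np

# Toy forward pass: input -> one hidden layer -> output, no loops anywhere.
rng = np.random.default_rng(0)

x = rng.normal(size=3)                           # 3 input features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

hidden = np.maximum(0, W1 @ x + b1)   # ReLU activation in the hidden layer
output = W2 @ hidden + b2             # data travels straight through
print(output)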

2
Q

Convolutional Neural Networks (CNN)

A

CNNs are the big guns in the world of computer vision. They’ve got a knack for picking up on the spatial patterns in images, thanks to their specialized layers. This ability makes them stars at recognizing images, spotting objects within them, and classifying what they see. They’re the reason your phone can tell a dog from a cat in photos.
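
A rough sketch of the core operation behind those specialized layers: a small filter sliding over the image. The 3×3 vertical-edge filter and the toy 8×8 image are arbitrary choices for illustration:

import numpy as np

image = np.random.rand(8, 8)           # toy grayscale "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])        # responds strongly to vertical edges

out = np.zeros((6, 6))                 # "valid" output: (8-3+1) x (8-3+1)
for i in range(6):
    for j in range(6):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out.shape)   # (6, 6) feature map of local spatial pattern responses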

3
Q

Recurrent Neural Networks (RNN)

A

RNNs have a memory of sorts, making them great for anything involving sequences of data, like sentences, DNA sequences, handwriting, or stock market trends. They loop information back around, allowing them to remember previous inputs in the sequence. This makes them ace at tasks like predicting the next word in a sentence or understanding spoken language.
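
A minimal sketch of that loop in NumPy; the sizes and random weights are invented, but the key point is that the hidden state h is fed back in at every step:

import numpy as np

rng = np.random.default_rng(0)
Wx = rng.normal(size=(5, 3))          # input-to-hidden weights
Wh = rng.normal(size=(5, 5))          # hidden-to-hidden ("loop back") weights
b = np.zeros(5)

h = np.zeros(5)                       # memory starts empty
sequence = rng.normal(size=(4, 3))    # 4 time steps, 3 features each
for x_t in sequence:
    h = np.tanh(Wx @ x_t + Wh @ h + b)   # same weights reused at every step

print(h)   # final hidden state summarizes the whole sequence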

4
Q

Long Short-Term Memory Networks (LSTM)

A

LSTMs are a special breed of RNNs built to remember things for longer stretches. They’re designed to solve the problem of RNNs forgetting stuff over long sequences. If you’re dealing with complex tasks that need to hold onto information for a long time, like translating paragraphs or predicting what happens next in a TV series, LSTMs are your go-to.

5
Q

Generative Adversarial Networks (GAN)

A

Imagine two AIs in a cat-and-mouse game: one generates fake data (like images), and the other tries to catch what’s fake and what’s real. That’s a GAN. This setup allows GANs to create incredibly realistic images, music, text, and more. They’re the artists of the neural network world, generating new, realistic data from scratch.
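
That cat-and-mouse game is usually written as a minimax objective (the standard formulation, not spelled out in the card itself): the discriminator D tries to maximize it while the generator G tries to minimize it.

min over G of max over D: E_x[log D(x)] + E_z[log(1 − D(G(z)))]

Here x is real data and z is random noise that G turns into fake data.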

6
Q

Weight

A

Think of weights as the neuron’s way of deciding how important an input is (multiply the input by the weight).

7
Q

Bias

A

A tweak to make sure the neuron’s output fits just right (added to the weighted sum of the inputs).

8
Q

Activation Function

A

This step is where the magic happens, allowing the neuron to tackle complex patterns by bending and stretching the data in nonlinear ways. Popular choices for this function are ReLU, Sigmoid, and Tanh, each with its own way of tweaking the data.

9
Q

Weighted Sum

A

The first step in the neural computation process involves aggregating the inputs to a neuron, each multiplied by their respective weights, and then adding a bias term. This operation is known as the weighted sum or linear combination.
Mathematically, it is expressed as:
z = w1x1 + w2x2 + … + wnxn + b (a summation over all n weighted inputs, plus the bias b)

The weighted sum is crucial because it constitutes the raw input signal to a neuron before any non-linear transformation. It allows the network to perform a linear transformation of the inputs, adjusting the importance (weight) of each input in the neuron’s output.
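
A tiny worked example of that formula in NumPy; the numbers are arbitrary, just to make the arithmetic concrete:

import numpy as np

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8,  0.2, -0.5])  # weights: how important each input is
b = 0.1                          # bias

z = np.dot(w, x) + b             # 0.4 - 0.2 - 1.0 + 0.1
print(z)                         # ≈ -0.7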

10
Q

Sigmoid Activation Function

A

This function squeezes its input into a narrow range between 0 and 1. It’s like taking any value, no matter how large or small, and translating it into a probability.

f(x) = 1/(1 + e^−x)

You’ll see sigmoid functions in the final layer of binary classification networks, where you need to decide between two options — yes or no, true or false, 1 or 0.
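
A quick sketch of the formula in NumPy, with a few sample inputs to show the squashing:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes any real number into (0, 1)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ≈ [0.0067, 0.5, 0.9933]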

11
Q

Hyperbolic Tangent Function (tanh)

A

tanh stretches the output range to between -1 and 1. This centers the data around 0, making it easier for layers down the line to learn from it.

f(x) = tanh(x) = (2/(1+e^-2x)) - 1

It’s often found in the hidden layers, helping to model more complex data relationships by balancing the input signal.
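
The same formula written out in NumPy, checked against NumPy’s built-in tanh; the sample inputs are arbitrary:

import numpy as np

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0   # output centered on 0

x = np.array([-2.0, 0.0, 2.0])
print(tanh(x))       # ≈ [-0.964, 0.0, 0.964]
print(np.tanh(x))    # NumPy's built-in gives the same values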

12
Q

Rectified Linear Unit (ReLU)

A

ReLU is like a gatekeeper that passes positive values unchanged but blocks negatives, turning them to zero. This simplicity makes it very efficient and helps overcome some tricky problems in training deep neural networks.

f(x) = max(0,x)

Its simplicity and efficiency have made ReLU incredibly popular, especially in convolutional neural networks (CNNs) and deep learning models.
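
The whole function is one line in NumPy; the sample inputs are arbitrary:

import numpy as np

def relu(x):
    return np.maximum(0, x)   # pass positives through, zero out negatives

print(relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [0. 0. 0. 2.]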

13
Q

Leaky Rectified Linear Unit (Leaky ReLU)

A

Leaky ReLU allows a tiny, non-zero gradient when the input is less than zero, which keeps neurons alive and kicking even when they’re not actively firing.

f(x) = max(αx, x), where α is a small constant (e.g., 0.01)

It’s a tweak to ReLU used in cases where the network might suffer from “dead neurons,” ensuring all parts of the network stay active over time.
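
A sketch in NumPy, assuming the common default α = 0.01; the sample inputs are arbitrary:

import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)   # negatives shrink instead of dying

print(leaky_relu(np.array([-3.0, -0.5, 0.0, 2.0])))   # [-0.03 -0.005 0. 2.]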

14
Q

Exponential Linear Unit (ELU)

A

ELU smooths out the function for negative inputs (using a parameter α for scaling), allowing for negative outputs but with a gentle curve. This can help the network maintain a mean activation closer to zero, improving learning dynamics.

f(x) = x, if x > 0
f(x) = α(e^x - 1), if x ≤ 0

Useful in deeper networks where ReLU’s sharp threshold could slow down learning.
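
A sketch of the piecewise formula in NumPy, assuming α = 1; the sample inputs are arbitrary:

import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # smooth below zero

print(elu(np.array([-3.0, -0.5, 0.0, 2.0])))   # ≈ [-0.95, -0.393, 0., 2.]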

15
Q

Softmax Function

A

The softmax function turns logits, the raw output scores from the neurons, into probabilities by exponentiating and normalizing them. It ensures that the output values sum up to one, making them directly interpretable as probabilities.

f(x)_i = e^(x_i) / Σ_j e^(x_j)

It’s the go-to for the output layer in multi-class classification problems, where each neuron corresponds to a different class, and you want to pick the most likely one.
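
A sketch of the formula in NumPy; subtracting the max before exponentiating is a common numerical-stability trick and doesn’t change the result:

import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / np.sum(exps)          # probabilities that sum to 1

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)         # ≈ [0.659, 0.242, 0.099]
print(probs.sum())   # 1.0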

16
Q

Backpropagation

A

Backpropagation, short for “backward propagation of errors,” is a method for efficiently calculating the gradient of the loss function with respect to every weight in the network. It consists of two main phases: a forward pass, where the input data is passed through the network to generate an output, and a backward pass, where the output is compared to the target value and the resulting error is propagated back through the network to update the weights.

The essence of backpropagation is the chain rule of calculus, which is used to calculate the gradient of the loss with respect to each weight by multiplying together the gradients of every layer between that weight and the output. This process reveals how much each weight contributes to the error, providing a clear path for its adjustment.

∂L/∂w = ∂L/∂a * ∂a/∂z * ∂z/∂w

(where z is the neuron’s weighted sum, a its activation, and L the loss)
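
A minimal single-neuron example of that chain of factors, assuming a sigmoid activation and a squared-error loss (both are just illustrative choices):

import numpy as np

x, y = 1.5, 1.0          # one input and its target
w, b = 0.4, 0.1          # current weight and bias

z = w * x + b                     # forward pass: weighted sum
a = 1.0 / (1.0 + np.exp(-z))      # activation
L = (a - y) ** 2                  # loss

dL_da = 2.0 * (a - y)    # ∂L/∂a
da_dz = a * (1.0 - a)    # ∂a/∂z, the sigmoid's derivative
dz_dw = x                # ∂z/∂w
dL_dw = dL_da * da_dz * dz_dw     # exactly the product in the formula above
print(dL_dw)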

17
Q

Gradient Descent

A

Gradient Descent is an optimization algorithm used for minimizing the loss function in a neural network. It works by iteratively moving the weights in the direction of the steepest decrease in loss. The amount by which the weights are adjusted in each iteration is determined by the learning rate, a hyperparameter that controls the size of the steps.

w(new) = w(old) − η * ∂L/∂w (where η is the learning rate)
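
A toy illustration of that update rule on a one-parameter loss L(w) = (w − 3)², whose gradient is 2(w − 3); the learning rate is an arbitrary choice:

w = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2.0 * (w - 3.0)         # ∂L/∂w
    w = w - learning_rate * grad   # w(new) = w(old) − η * ∂L/∂w

print(w)   # approaches 3, the minimum of the loss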

18
Q

Stochastic Gradient Descent (SGD)

A

Stochastic Gradient Descent (SGD) takes the core idea of gradient descent but changes the approach by using just one training example at a time to calculate the gradient and update the weights. This method is similar to making decisions based on quick, individual observations rather than waiting to gather everyone’s opinion. It can make the learning process much faster because the model updates more frequently and with less computational burden.
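
A toy sketch of that one-example-at-a-time behavior: fitting y = w·x to data generated with a true weight of 2, updating after every single example. All the numbers here are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
xs = rng.normal(size=100)
ys = 2.0 * xs + 0.1 * rng.normal(size=100)   # noisy targets, true weight = 2

w = 0.0
learning_rate = 0.05
for x, y in zip(xs, ys):              # one example at a time
    error = w * x - y
    grad = 2.0 * error * x            # gradient of (w*x - y)^2 w.r.t. w
    w -= learning_rate * grad         # quick, noisy update

print(w)   # close to 2 after a single pass over the data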