Lecture 12 - Neural Networks Flashcards
Where are ANNs used?
Artificial neural networks (ANN) or neural networks (NN) (here we mean artificially simulated networks, not biological ones)
Usually considered black boxes
Ex: used in credit card fraud detection, insurance claims, medical insurance processes
But 90% incorrect thing in the USA… (relisten)
Ex: ANNs used in ChatGPT, Google Translate, self-driving cars, most modern robots (ex: Roombas)
Artificial neurons
Real neuron: input arrives at the dendrites, passes through the cell body, along the axon to the axon terminal (output)… then across a synapse into the next neuron’s dendrites
Artificial neurons:
Each weight corresponds to how important that input is for the neuron’s output
Input x Weight
Sum them up
Output = Sum of (Input x Weight)
Matrix multiplications!
BUT after doing this and before the output, there is also an Activation (the activation is like a threshold… either you have it or you don’t)
Like an instruction
Ex: if negative, turn into 0. If positive, y=x
BUT before even the activation, there is a Bias
-because real neurons can have a base excitation level
Ex: add a single constant value, e.g. add 1.5
So in order:
-Weighted sum of the inputs (the matrix multiplication). Ex: (0.1 × -1.0) + (-0.8 × 1.0) + (0.5 × 0.0) = -0.9
-Then you add the bias. Ex: add 1.5. -0.9 + 1.5 = 0.6
-Then you apply the activation. Ex: if the value is negative, it becomes 0; if positive, keep it. Here we keep 0.6, so the output is 0.6 (see the sketch below)
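A minimal sketch (my own, not from the lecture) of this single-neuron computation in NumPy, reusing the example numbers above; the pairing of inputs and weights is an assumption, and the activation is the ReLU-style “negative becomes 0” rule:

```python
import numpy as np

inputs = np.array([0.1, -0.8, 0.5])    # example inputs (assumed pairing with the weights)
weights = np.array([-1.0, 1.0, 0.0])   # example weights
bias = 1.5

weighted_sum = np.dot(inputs, weights)  # (0.1*-1.0) + (-0.8*1.0) + (0.5*0.0) = -0.9
pre_activation = weighted_sum + bias    # -0.9 + 1.5 = 0.6
output = max(0.0, pre_activation)       # activation: negative -> 0, positive kept as-is

print(output)  # ~0.6
```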
Weights and biases are the parameters: usually random values at first, then they get closer to useful values as you keep training
The core of ChatGPT is matrix multiplications, just with a huge number of parameters (reportedly around 1.76 trillion)
The weighted sum of a single neuron is a dot product; doing it for a whole layer of neurons at once becomes a matrix multiplication.
Activation Functions
Neurons firing at different rates
-Sigmoid Function: maps input to the range 0 to 1 (input on the x axis, output on the y axis). The more negative the input, the closer the output is to 0; as the input approaches zero the output starts picking up; the more positive the input, the more it maxes out toward 1.
(represents the neuron firing rates from the previous slide)
-Tanh Function: instead of 0 to 1, goes from -1 to 1.
The more negative the input (x axis), the closer the output gets to -1
The more positive the input, the closer the output gets to 1
-ReLU (Rectified Linear Unit)
If input below 0, just cuts it out. (y=0)
If input positive, y=x
This is the same activation used in the artificial neuron example above! (all three functions are sketched below)
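A minimal sketch (not from the slides) of these three activation functions in NumPy:

```python
import numpy as np

def sigmoid(x):
    # squashes any input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # like sigmoid, but squashes into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # negative inputs become 0, positive inputs pass through unchanged (y = x)
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # ~[0.12, 0.50, 0.88]
print(tanh(x))     # ~[-0.96, 0.00, 0.96]
print(relu(x))     # [0.0, 0.0, 2.0]
```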
Single-Layer “Networks”
Demo:
OR gate
AND gate
OR & AND+NOT gate
????
The perceptron is trained with supervised learning (it learns from labelled input/output pairs)
Perceptron Update Rule
δ (small delta) = desired output − actual output
ΔW (capital delta) = ε × δ × input, where ε is the learning rate
?? see slides
but basically, we want each weight update to bring the actual output closer to the desired output (see the sketch below)
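A minimal sketch (my own, not the demo code from the slides) of the perceptron update rule applied to the OR gate, assuming NumPy, a step-function activation, and a learning rate ε = 0.1:

```python
import numpy as np

# OR gate: inputs -> desired outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

weights = np.zeros(2)  # could also start with random values
bias = 0.0
epsilon = 0.1          # learning rate

def step(x):
    # threshold activation: fire (1) or don't (0)
    return 1.0 if x > 0 else 0.0

for epoch in range(10):
    for inputs, desired in zip(X, y):
        actual = step(np.dot(inputs, weights) + bias)
        delta = desired - actual              # small delta: the error
        weights += epsilon * delta * inputs   # big delta W: the weight update
        bias += epsilon * delta               # bias is updated the same way

print([step(np.dot(inp, weights) + bias) for inp in X])  # [0.0, 1.0, 1.0, 1.0]
```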
More than one neuron
Inputs, hidden layers, outputs
Multi-layer Perceptron (MLP)
Dense Neural Network
“Deep” Neural Network
Linear Layers
Called a hidden layer because its values are not directly visible from outside: it is neither the input you provide nor the output you read
Feedforward:
Matrix multiplication of the inputs with the first layer’s weights, then layer by layer through the hidden layers … to the outputs
Ex: potato child (his cat)
But if we instead want the desired output to be Chicken (the dog) rather than Potato Child
We have to do Backpropagation:
Desired output minus actual output gives the error/loss (the small delta)
Propagate this error backwards, layer by layer through the network
Use it to update our weights (see the sketches below)
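A minimal sketch (not from the slides) of the feed-forward pass through one hidden layer, assuming NumPy and ReLU; the layer sizes and random weights are made up for illustration (the backward pass is sketched after the training algorithm below):

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up sizes: 4 inputs -> 3 hidden neurons -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([0.1, -0.8, 0.5, 1.0])   # one input example
hidden = relu(x @ W1 + b1)            # input -> hidden layer (matrix multiplication + bias + activation)
output = hidden @ W2 + b2             # hidden -> output layer
print(output)
```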
NN Training in General
● Randomly initialize all weights
● For each (input, desired_output) in dataset:
○ Put input through model (feed-forward), receive predicted_output
○ Calculate loss (desired_output - predicted_output)
○ Backpropagate loss through network, update all weights + biases (see the sketch below)
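A minimal sketch (my own, not the lecture’s implementation) of this exact loop for a tiny MLP, assuming NumPy, a tanh hidden layer, a squared-error loss, and plain per-sample gradient descent; the XOR toy data is just an illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy dataset: XOR (needs a hidden layer, unlike OR/AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# randomly initialize all weights (biases start at zero here)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.1

for epoch in range(5000):
    for x, y in zip(X, Y):
        # feed-forward: input -> hidden (tanh) -> predicted output
        h = np.tanh(x @ W1 + b1)
        y_pred = h @ W2 + b2

        # loss: squared error between desired and predicted output
        error = y_pred - y

        # backpropagate the error layer by layer (chain rule)
        grad_W2 = np.outer(h, error)
        grad_b2 = error
        grad_h = W2 @ error
        grad_h_pre = grad_h * (1.0 - h ** 2)   # derivative of tanh
        grad_W1 = np.outer(x, grad_h_pre)
        grad_b1 = grad_h_pre

        # gradient-descent step: update all weights + biases
        W2 -= lr * grad_W2; b2 -= lr * grad_b2
        W1 -= lr * grad_W1; b1 -= lr * grad_b1

print(np.tanh(X @ W1 + b1) @ W2 + b2)  # should approach [0, 1, 1, 0] (depends on the random init)
```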
Don’t think these things are exam relevant but have a general idea:
Gradient = the first derivative, calculated layer by layer; every gradient step (every update of your weights) brings you further along the correct trajectory towards your goal
Gradient-descent = how your network actually updates: take a small step against the gradient
Loss function = measures how far the predicted output is from the desired output (zero when desired and predicted cancel out); e.g. mean absolute error, or more complex formulas
Mini-batches = multiple rows of data processed at the same time (see the sketch below)
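A minimal sketch (not from the slides) of a mini-batch plus a loss function, assuming NumPy; a batch of 4 rows goes through one linear layer at once, the loss here is the mean squared error, and one gradient-descent step is taken:

```python
import numpy as np

rng = np.random.default_rng(2)

X_batch = rng.normal(size=(4, 3))   # mini-batch: 4 rows of data at the same time
Y_batch = rng.normal(size=(4, 1))   # desired outputs for those rows

W = rng.normal(size=(3, 1))
b = np.zeros(1)
lr = 0.01

pred = X_batch @ W + b                           # forward pass for the whole batch at once
loss = np.mean((Y_batch - pred) ** 2)            # loss function: mean squared error

grad_pred = 2 * (pred - Y_batch) / len(X_batch)  # gradient of the loss w.r.t. the predictions
W -= lr * (X_batch.T @ grad_pred)                # gradient-descent step on the weights
b -= lr * grad_pred.sum(axis=0)                  # ... and on the bias
print(loss)
```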
Summary
● Artificial Neurons very roughly model a single neuron
● Single “Cell”/layer components
○ Weights → output = Σ (inputs * weights)
○ Biases
○ Activation function
● Implementation: usually GPU → matrix multiplication
● Training algo: FW pass, Loss, BW pass
● CNNs
○ multiple kernels (3x3, 5x5, etc.)
○ feature hierarchy learned