Lecture 15 - Neural Networks (Advanced) Flashcards
Expert systems
Expert systems: systems that help make decisions
-ex: decision trees
“feature engineering”
Duckie example: an old-school expert system that detects the duckie with a hand-crafted feature (“is it yellow?”)
Not very robust, because it gets distracted by other yellow things (ex: a girl in a yellow dress)… an edge case
(a sketch of such a hand-crafted rule follows the pros/cons list below)
● ✅ Explainable
● ✅ Can be hand-crafted
● ❌ Difficult to hand-craft
● ❌ Falls apart with:
○ Large data (many columns / features)
○ Edge cases
○ Non-linear correlations
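A minimal sketch of what such a hand-crafted rule could look like (the thresholds and the `is_duckie` name are made up for illustration, not from the lecture):

```python
import numpy as np

def is_duckie(rgb_image: np.ndarray) -> bool:
    """Toy hand-crafted rule: call it a duckie if enough pixels are 'yellow'.

    rgb_image: H x W x 3 array with values in [0, 255].
    """
    r, g, b = rgb_image[..., 0], rgb_image[..., 1], rgb_image[..., 2]
    # "Yellow" = high red and green, low blue (hand-picked thresholds).
    yellow_mask = (r > 150) & (g > 150) & (b < 100)
    # Decide based on the fraction of yellow pixels -- exactly the kind of
    # feature that a girl in a yellow dress would also trigger.
    return yellow_mask.mean() > 0.05
```

Explainable and easy to tweak by hand, but it falls apart on the yellow-dress edge case because “fraction of yellow pixels” is not actually a duckie-specific feature.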
CNNs pt. 2: Visual Hierarchical Processing
V1: small receptive field. Edges and lines
V2: little bit bigger receptive field. Shapes
V4: bigger receptive field. Objects
IT: biggest receptive field. Objects and faces.
Across layers 1, 2, 3, … parts combine to form objects
CNN Features:
- Sparse connectivity: only a few pixels of the input influence each output pixel
- not everything is connected with everything else (not densely connected)
- Shared weights: the same small matrix (kernel) slides across the entire image; every output position is produced by that same matrix, i.e. all positions share the same weights
- Invariance under translation: because the same kernel is applied everywhere, a feature is detected no matter where it appears in the image
(see the sketch below)
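A rough numpy sketch of all three properties: one small 3×3 kernel (values picked arbitrarily as a vertical edge detector) is slid over a grayscale image.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2D convolution (strictly: cross-correlation) with one shared kernel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sparse connectivity: each output pixel only sees a kh x kw patch.
            patch = image[i:i + kh, j:j + kw]
            # Shared weights: the same kernel is used at every location,
            # which is also what gives (approximate) translation invariance.
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(28, 28)              # a small grayscale image
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])        # a V1-style vertical edge detector
feature_map = conv2d(image, edge_kernel)    # shape (26, 26)
```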
Autoencoders
Encodes itself
Want input data to be the same as output data (input and reconstructed input)
Ex with the number 4:
Connectionist up to the bottleneck (pixel image)… symbolic after it (the concept “4”)
-it recognizes the pixels as the number 4… perception
But the smaller the bottleneck, the worse the reconstruction
Discrepancy = error, loss
Bottleneck = a compressed low dimensional representation of the input
input data → encoder → encoded data (latent space / bottleneck) → decoder → reconstructed data
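A minimal PyTorch-style sketch of that encoder → bottleneck → decoder pipeline (the layer sizes and the 784-pixel input are assumptions, e.g. a flattened 28×28 digit image):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Autoencoder(nn.Module):
    def __init__(self, bottleneck_dim: int = 8):
        super().__init__()
        # Encoder: input -> compressed, low-dimensional latent code.
        self.encoder = nn.Sequential(
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim),
        )
        # Decoder: latent code -> reconstructed input.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # bottleneck / latent space
        return self.decoder(z)     # reconstruction

model = Autoencoder(bottleneck_dim=8)
x = torch.rand(16, 784)                     # a batch of flattened images
loss = F.mse_loss(model(x), x)              # discrepancy = reconstruction loss
```

The smaller `bottleneck_dim`, the more the network has to throw away, so the reconstruction gets worse.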
Latent space interpolation and arithmetic
Interpolation
going from a start image to an end image
-ex: a 2 morphing into a 3
-ex: smiling woman (face images)
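Interpolation in latent space is just a weighted average of two codes, decoded back into images. A sketch that reuses the toy `Autoencoder` from above (the number of steps is arbitrary):

```python
import torch

z_start = model.encoder(x[0:1])   # latent code of the start image (e.g. a "2")
z_end   = model.encoder(x[1:2])   # latent code of the end image (e.g. a "3")

frames = []
for alpha in torch.linspace(0.0, 1.0, steps=10):
    z = (1 - alpha) * z_start + alpha * z_end   # slide from the start code to the end code
    frames.append(model.decoder(z))             # decode each in-between code to an image
```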
Word embeddings
Ex: take the word “Queen”
-say it is the 78,345th word out of 100,000 in the dictionary
-write that into 1s and 0s:
10011001000001001
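That bit string is just the dictionary index written in binary, which a one-liner confirms (the index 78,345 is the slide’s example):

```python
index = 78_345                 # "Queen" as the 78,345th of 100,000 dictionary words
print(format(index, "b"))      # -> 10011001000001001

# A word embedding layer instead maps such an index to a short dense vector, e.g.:
# torch.nn.Embedding(num_embeddings=100_000, embedding_dim=300)(torch.tensor([index]))
```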
Latent space arithmetic
King - Queen
Janitor - ?? → housemaid! (gender bias picked up from the data)
Need to avoid gender bias
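A toy numpy illustration of this kind of vector arithmetic, with hand-made 3-D vectors (real embeddings have hundreds of dimensions and are learned from text, which is exactly where biased pairs like janitor → housemaid come from):

```python
import numpy as np

# Hand-made toy vectors: dimensions roughly (royalty, gender, something else).
vectors = {
    "king":  np.array([0.9,  0.8, 0.1]),
    "queen": np.array([0.9, -0.8, 0.1]),
    "man":   np.array([0.1,  0.8, 0.1]),
    "woman": np.array([0.1, -0.8, 0.1]),
}

def closest(v, exclude=()):
    """Vocabulary word whose vector has the highest cosine similarity to v."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], v))

# king - man + woman ≈ queen
result = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(result, exclude={"king", "man", "woman"}))   # -> queen
```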
Recurrent Neural Networks (RNNs)
The sky is blue |SEP| ✅.
Tabasco is a little potato man |SEP| ✅
Sliding window
Token = piece of the input
SEP = separator token (bars)
Feeding the output back into itself is something CNNs do not do… that’s the special thing about RNNs
Fundamentally can work, but in practice language is very complex
Better idea: Storage
Storage = context
See the history of it
Storage contains the history, the context that helps the decoder
But the storage only has short-term memory
-not the case in an LSTM! Short-term memory can be overridden, but long-term memory can only be added to (keeps context from the beginning of the sentence all the way to the end)
Input to Network (and Storage) to Output
See slides or re-listen (I didn’t fully get this part)
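A bare-bones numpy sketch of “input + storage → new storage → output”, i.e. one step of a plain RNN (the sizes, tanh, and random weights are standard choices, not from the slides):

```python
import numpy as np

input_size, hidden_size, output_size = 8, 16, 8
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))    # input   -> storage
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # storage -> storage (the feedback loop)
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))   # storage -> output

def rnn_step(x, h):
    """One time step: mix the new token x into the storage h, then read out."""
    h_new = np.tanh(W_xh @ x + W_hh @ h)   # storage carries the context forward
    y = W_hy @ h_new                       # output, e.g. next-token scores
    return y, h_new

h = np.zeros(hidden_size)                      # empty storage at the start
for x in rng.normal(size=(5, input_size)):     # 5 dummy token vectors
    y, h = rnn_step(x, h)                      # h is fed back into the next step
```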
RNN Types
● GRU cells
○ (Gated Recurrent Unit)
○ Only short-term memory
● LSTM
○ (Long Short-Term Memory)
○ Both short/”long”-term memory
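In PyTorch the difference shows up in the state each cell carries: a GRU returns only a hidden state, while an LSTM additionally keeps a cell state (the “long-term” track). The sizes below are arbitrary:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 20, 32)    # (batch, sequence length, features)

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
out, h_n = gru(x)            # only a hidden ("short-term") state

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
out, (h_n, c_n) = lstm(x)    # hidden state + separate cell state ("long-term" memory)
```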
Language Modelling (“NLP”)
Language modelling
Predictive text… often goes in circles (keeps repeating itself)
-temperature adds randomness to the next-word choice
The best thing about AI is its ability to…
Top 5 probabilities:
learn 4.5%
predict 3.5%
…
…
…
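A small sketch of how temperature turns such a probability list into more or less random picks. The five candidate words and the raw scores below are placeholders (the slide only showed the top two):

```python
import numpy as np

rng = np.random.default_rng(0)
words  = ["learn", "predict", "create", "adapt", "automate"]   # placeholder candidates
logits = np.array([3.0, 2.7, 2.5, 2.3, 2.0])                   # made-up raw scores

def sample(logits, temperature=1.0):
    """Softmax with temperature: low T ~ almost always the top word, high T = more random."""
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return rng.choice(len(logits), p=p)

print(words[sample(logits, temperature=0.1)])   # almost always "learn"
print(words[sample(logits, temperature=2.0)])   # much more varied
```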
Character counts
Ex: feed the AI the Wikipedia page about cats. It counts how frequently each letter appears, e.g. E = 5000 times (11%)
Can do this with the entire English language too
Bigrams = co-occurrences of 2 things (letter and letter, word to word)
Ex: q almost always co-occurs with u
Eventually the AI learns which letters usually co-occur, and which words do too (see the sketch below)
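A minimal sketch of counting letter frequencies and letter bigrams from a text file (the filename is hypothetical; any large text works):

```python
from collections import Counter

text = open("cat_wikipedia.txt", encoding="utf-8").read().lower()  # hypothetical file

letters = [c for c in text if c.isalpha()]
letter_counts = Counter(letters)              # e.g. letter_counts["e"] -> how often E appears
letter_freqs = {c: n / len(letters) for c, n in letter_counts.items()}

# Bigrams: co-occurrences of two adjacent letters, e.g. ("q", "u").
bigram_counts = Counter(
    (a, b) for a, b in zip(text, text[1:]) if a.isalpha() and b.isalpha()
)

print(letter_counts.most_common(3))
print(bigram_counts.most_common(3))
```

The same counting idea scales up to word bigrams/trigrams/n-grams, which is the stepping stone to the word-sequence models in the summary.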
Summary
● Expert systems:
○ handcrafted features,
○ explainable,
○ not scalable
● CNNs: (similar to the visual cortex)
○ sparse connect.,
○ shared W,
○ translation-invar.
● RNNs:
○ Storage
○ GRU - short term only
○ LSTM - long/short term
● Language modelling:
○ Letter counts → (letter) bigrams → (word) bigrams/trigrams/n-grams → word sequences + attention