Chapter 1: Fundamentals of Deep Learning Flashcards
Concise AI Definition
The effort to automate intellectual tasks normally performed by humans
Symbolic AI
The approach towards creating human-level artificial intelligence by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge
What was the dominant paradigm in AI from the 1950s to the late 1980s?
Symbolic AI
What did Symbolic AI do well, and what did it do poorly?
It did well with well-defined logical problems (e.g., chess) but poorly with fuzzy, complex problems (e.g., image classification)
What replaced Symbolic AI?
Machine Learning
What question gave rise to machine learning?
Could a computer go beyond what we know how to order it to perform, and learn on its own how to perform a specified task?
What is the classical programming approach?
Humans input rules (a program) and data to be processed according to these rules, yielding answers
Machine Learning steps
Humans input data along with the answers expected for that data, and the computer outputs the rules that connect the two. These rules can then be applied to new data to produce original answers.
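A minimal sketch of the "data plus answers in, rules out" idea, using scikit-learn and a made-up toy dataset (neither is named on the card):

```python
# Illustrative sketch only: scikit-learn and the toy data are assumptions.
from sklearn.linear_model import LinearRegression

data = [[1], [2], [3], [4]]      # inputs
answers = [2, 4, 6, 8]           # expected outputs for those inputs

model = LinearRegression()
model.fit(data, answers)         # the "rules" are learned from the examples

print(model.predict([[5]]))      # apply the learned rules to new data (~10)
```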
Are machine learning systems programmed?
No, they’re trained: they are presented with data in which they find statistical structure.
What is the relationship between Deep Learning, AI and Machine Learning?
(AI (ML (DL))) — Deep Learning is a subfield of Machine Learning, which is a subfield of AI
Predefined set of operations from which ML selects its transformation?
Hypothesis Space
What is Deep Learning
an approach towards learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations. Information-distillation via successive layers.
The layered representations in Deep Learning are almost always learned via what models?
neural networks
What determines the change a deep learning layer affects on its input?
Its weights: the transformation implemented by a layer is parameterized by its weights
Loss Function of the network
measures how far an output is from what is expected, capturing how well the network does
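One concrete loss function, mean squared error, as a short NumPy sketch (the card does not commit to a specific loss; this choice is an assumption):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error: average squared distance between output and target."""
    return np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)

print(mse_loss([0.9, 0.2], [1.0, 0.0]))  # small value -> network is doing well
```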
The central algorithm in Deep Learning
The Backpropagation algorithm
What happens to a network’s loss score?
The optimizer uses it to implement the Backpropagation algorithm, adjusting the layer weights in a direction that lowers the loss
What is the training loop for Deep Learning?
The process of starting with random weight values and, through repeated feedback from the loss score, gradually adjusting the layer weights
What is the goal of the training loop?
to find the layer weights that minimize the loss score
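A minimal sketch of this loop for a single weight, using plain NumPy gradient descent on invented toy data (real networks have many layers and weights; this is only an illustration of the feedback cycle):

```python
import numpy as np

# Toy data: the answers follow y = 3 * x
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = 3.0 * x

w = np.random.randn()                          # start with a random weight value
lr = 0.01                                      # learning rate / step size

for step in range(200):                        # the training loop
    y_pred = w * x                             # forward pass through the "layer"
    loss = np.mean((y_pred - y_true) ** 2)     # loss score
    grad = np.mean(2 * (y_pred - y_true) * x)  # gradient of the loss w.r.t. w
    w -= lr * grad                             # adjust the weight using the feedback

print(w)  # close to 3.0: the weight value that minimizes the loss
```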
Probabilistic Modeling
An early form of Machine Learning. The application of the principles of statistics to data analysis
What is the best-known algorithm in Probabilistic Modeling?
Naive Bayes
Naive Bayes
type of ML classifier based on applying Bayes’ theorem while assuming that the features in the input data are all independent (a strong/naive assumption)
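A minimal scikit-learn sketch of a Naive Bayes classifier on invented toy data (the library choice and the numbers are assumptions, not part of the card):

```python
from sklearn.naive_bayes import GaussianNB

X = [[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]]  # toy feature vectors
y = [0, 0, 1, 1]                                       # class labels

clf = GaussianNB()            # treats each feature as independent given the class
clf.fit(X, y)
print(clf.predict([[5.0, 3.4]]))
```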
Logistic Regression
a CLASSIFICATION algorithm, not a regression algorithm, despite its name. Closely related to Naive Bayes; a type of probabilistic modeling
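A similar hedged scikit-learn sketch for logistic regression on made-up data, showing that it outputs classes and class probabilities rather than continuous values:

```python
from sklearn.linear_model import LogisticRegression

X = [[0.2], [0.4], [1.6], [1.8]]     # toy inputs
y = [0, 0, 1, 1]                     # class labels, not continuous targets

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict([[1.7]]))          # predicted class
print(clf.predict_proba([[1.7]]))    # class probabilities (probabilistic modeling)
```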
What held neural networks back for a long time?
The missing piece was an efficient way to train large neural networks
What is the Backpropagation Algorithm
A way to train chains of parametric operations using gradient-descent optimization
Kernel Methods
a group of classification algorithms, the best known is SVM
Support Vector Machine (SVM)
A classification algorithm that works by trying to find good decision boundaries between categories.
How do SVMs decide on a decision boundary?
1. The data is mapped to a new high-dimensional representation where the decision boundary can be expressed as a hyperplane. 2. A good decision boundary is computed by trying to “maximize the margin”: the distance between the boundary and the closest points from each class.
Kernel Trick
This is what gives kernel methods their name; without it they would often be computationally intractable. You don’t have to compute the coordinates of your points in the new space, just the distance between pairs of points, which is done via a kernel function.
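A minimal scikit-learn sketch covering the two cards above: an SVM classifier with an RBF kernel function, so similarities in the new representation are computed via the kernel trick (the data and parameters are illustrative assumptions):

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [0, 1], [1, 0]]   # toy data that is not linearly separable
y = [0, 0, 1, 1]

# kernel="rbf" applies the kernel trick: pairwise similarities come from a
# kernel function instead of explicit coordinates in the high-dimensional space.
clf = SVC(kernel="rbf", gamma=2.0)
clf.fit(X, y)
print(clf.predict([[0.9, 0.1]]))       # predict the class of a new point
```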
Decision Trees
Flowchart-like structures of successive questions that classify inputs or predict output values given inputs
Random Forest
A decision-tree algorithm that involves building a large number of specialized decision trees and ensembling their outputs.
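A hedged scikit-learn sketch of a random forest on toy data (library and numbers are assumptions):

```python
from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 0, 1, 1]

# An ensemble of many randomized decision trees whose outputs are combined.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict([[1, 0]]))
```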
Gradient Boosting Machines
An ML technique based on ensembling weak prediction models, generally decision trees. “Gradient boosting” = iteratively training new models that specialize in addressing the weak points of the previous models
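A hedged sketch of gradient boosting using scikit-learn’s GradientBoostingClassifier on invented data (XGBoost, mentioned later in the deck, exposes a very similar fit/predict interface):

```python
from sklearn.ensemble import GradientBoostingClassifier

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

# Each new tree is trained to correct the errors of the ensemble built so far.
clf = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1)
clf.fit(X, y)
print(clf.predict([[4]]))
```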
Go-to algorithm for computer-vision tasks and generally all perceptual tasks
Deep Convolutional Neural Networks
What were the key factors in Deep learning’s success?
1. It offered better performance on a wide array of problems. 2. It made problem-solving much easier because it completely automates what used to be the most crucial step in an ML workflow: feature engineering. In DL you learn all the features in one pass rather than having to engineer them yourself.
Why can’t multiple shallow-layer ML methods be applied in sequence to replicate Deep learning?
Because the optimal first representation layer in a three-layer model isn’t the optimal first layer in a one-layer or two-layer model.
What is transformative about deep learning?
it allows a model to learn all layers of representation JOINTLY, at the same time, rather than in succession (greedily)
Deep Learning could also be called?
Joint Learning: the layers adjust together as any one layer changes
2 essential characteristics of how Deep Learning learns from Data?
1. The incremental, layer-by-layer way in which increasingly complex representations are developed. 2. The fact that these intermediate incremental representations are learned jointly.
Two most important ML techniques to be familiar with today and their most effective uses
Gradient-boosting machines for shallow-learning problems where structured data is available; Deep Learning for perceptual problems
Dominant Deep Learning Library?
Keras
Dominant gradient-boosting library?
XGBoost
The fundamental algorithm for Deep Learning in time series
The Long Short-Term Memory (LSTM) algorithm
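A minimal Keras sketch of an LSTM layer applied to sequence data (the shapes, layer sizes, and loss are arbitrary illustrations, not from the card):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(None, 8)),  # variable-length sequences of 8-feature vectors
    layers.LSTM(32),               # processes the sequence step by step
    layers.Dense(1),               # e.g. predict the next value in the series
])
model.compile(optimizer="rmsprop", loss="mse")
model.summary()
```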
Why are deep neural networks parallelizable?
Because they mostly perform many small matrix multiplications, which can be executed in parallel
TPU
Tensor Processing Unit. A chip designed and built by Google from the ground up specifically to run deep neural networks.
What algorithmic improvements allowed for better gradient propagation circa 2010?
- Better activation functions for neural layers
- Better weight-initialization schemes
- Better optimization schemes (RMSProp and Adam)
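A hedged Keras sketch showing where each of these improvements typically shows up in practice (the architecture itself is an arbitrary illustration):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    # better activation function: relu instead of saturating sigmoid/tanh
    layers.Dense(64, activation="relu",
                 kernel_initializer="he_normal"),  # better weight initialization
    layers.Dense(1, activation="sigmoid"),
])
# better optimization scheme: Adam (RMSProp would also fit the card)
model.compile(optimizer="adam", loss="binary_crossentropy")
```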
Theano and Tensorflow
Two symbolic tensor-manipulation frameworks for Python that support autodifferentiation, greatly simplifying the implementation of new models
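A minimal autodifferentiation sketch using TensorFlow’s GradientTape (Theano is no longer developed, so only the TensorFlow side is illustrated; the function being differentiated is an arbitrary example):

```python
import tensorflow as tf

w = tf.Variable(2.0)
with tf.GradientTape() as tape:
    loss = w ** 2 + 3.0 * w      # define a computation involving the variable
grad = tape.gradient(loss, w)    # derivative computed automatically: 2w + 3
print(grad.numpy())              # -> 7.0 at w = 2.0
```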
What makes Deep Learning Neural nets so scalable?
They are highly amenable to parallelization, and because DL models are trained by iterating over small batches of data, they can be trained on datasets of arbitrary size
What makes Deep Learning so much simpler than other Machine Learning techniques?
No need for feature engineering