Neural Networks Intro Flashcards
What does creating a machine learning algorithm mean?
It means building a model that outputs correct information from provided input data
What is the training process?
The process through which the model learns to make sense of the input data.
What are the 4 ingredients of training an algorithm?
- Data
- Model
- Objective Function
- Optimization Algorithm
Which of the following is NOT a building block of a machine learning algorithm?
Data
Variable
Objective function
Optimization algorithm
Variable
Training the model is:
A pure trial and error process
A kind of trial-and-error process with some feedback
The process of giving guidelines to the computer to find patterns
The process of watching people or a machine perform an activity and replicating it
A kind of trial-and-error process with some feedback
Self-driving cars learn by:
Driving many hours before learning how to do it safely and efficiently
A very strict set of rules that Elon Musk and the others are programming day and night
“Watching” thousands of hours of footage of real people driving
Breaking the rules (e.g., go on red light, go over the speed limit) and waiting to get punished for it
“Watching” thousands of hours of footage of real people driving
What are the 3 types of machine learning?
- Supervised
- Unsupervised
- Reinforcement
What are the two subtypes of supervised learning?
- Regression
- Classification
The linear model is given by:
y = xᵀw + b
y = wx + b
y = wᵀx + b
All of the above
All of the above
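As a minimal sketch (assuming NumPy and made-up numbers), the transposed forms are just two ways of writing the same dot product for a single observation:

```python
import numpy as np

# Hypothetical weights, input, and bias for a single observation
w = np.array([2.0, -1.0])
x = np.array([3.0, 4.0])
b = 5.0

# For 1-D vectors, x.T @ w and w.T @ x are the same dot product
y1 = x @ w + b   # y = x^T w + b
y2 = w @ x + b   # y = w^T x + b
print(y1, y2)    # both print 7.0
```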
The linear model for multiple inputs is given by:
y = xw + b
y = x₁w₁ + x₂w₂ + b
y = x₁w₂ + x₂w₁ + b
y = x₂w₂ + x₂w₂ + b
y = xw + b
You have y = xw + b, where w = [1.2, -3] and b = [7]. If x = [2, 3], what is the value of y?
0.4
-13.6
4.6
9.4
0.4
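The arithmetic can be verified numerically; a sketch assuming NumPy, using the numbers from the card above:

```python
import numpy as np

x = np.array([2.0, 3.0])
w = np.array([1.2, -3.0])
b = np.array([7.0])

# y = xw + b = 2*1.2 + 3*(-3) + 7 = 2.4 - 9 + 7 = 0.4
y = x @ w + b
print(y)  # approximately [0.4]
```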
What is the Objective Function?
the measure used to evaluate how well the model’s outputs match the desired correct values
What are the two Objective Function types?
- Loss - to be minimized (e.g., the error in supervised learning)
- Reward - to be maximized (used in reinforcement learning, e.g., Super Mario's high score)
In supervised learning, we are dealing with:
lost functions
loss functions
reward functions
reinforcement functions
loss functions
Reward functions are NOT:
functions we are trying to maximize
functions used in reinforcement learning
functions we are trying to minimize
functions
functions we are trying to minimize
What is the common Loss Function when dealing with Regression?
L2-norm
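A minimal sketch of the L2-norm loss (sum of squared differences between outputs and targets), assuming NumPy and illustrative numbers:

```python
import numpy as np

def l2_norm_loss(y, t):
    """Sum of squared differences between outputs y and targets t."""
    return np.sum((y - t) ** 2)

y = np.array([1.0, 2.5, 0.3])   # hypothetical model outputs
t = np.array([1.0, 2.0, 0.0])   # hypothetical targets
print(l2_norm_loss(y, t))       # 0^2 + 0.5^2 + 0.3^2 ≈ 0.34
```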
What is the common Loss Function when dealing with Classification?
Cross-entropy
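A minimal cross-entropy sketch for a single observation, assuming predicted probabilities and a one-hot target (the numbers are illustrative):

```python
import numpy as np

def cross_entropy_loss(y, t):
    """Cross-entropy: -sum(t * ln(y)), where y are predicted probabilities."""
    return -np.sum(t * np.log(y))

y = np.array([0.7, 0.2, 0.1])   # hypothetical predicted class probabilities
t = np.array([1.0, 0.0, 0.0])   # one-hot target: the first class is correct
print(cross_entropy_loss(y, t)) # -ln(0.7) ≈ 0.357; lower means a better match
```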
What is a Target?
The desired correct output. We want the output y to be as close to the target t as possible
What are the outputs of a regression?
continuous numbers
A target is:
The correct value at which we are aiming
A synonym for output
A part of the model
Always bigger than 0
The correct value at which we are aiming
The objective function measures:
how well the targets match our model’s outputs
how well our model’s outputs match the targets
the model’s parameters
linearity of the data
how well our model’s outputs match the targets
The L2-norm loss is used for:
k-means clustering
classification
regression
hierarchical clustering
regression
Cross-entropy loss is used for:
k-means clustering
classification
regression
hierarchical clustering
classification
Which cross-entropy points to the best match between outputs and targets?
L = 12.41
L = 0.78
L = 0.44
L = 0.77
L = 0.44
The cross-entropy loss divided by 10 cannot be used for machine learning
True
False
False
The cross-entropy loss MULTIPLIED by 10 cannot be used for machine learning
True
False
False
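One way to see why scaling the loss by a constant still works: the minimizer does not move, only the size of the gradient changes (which is equivalent to rescaling the learning rate). A sketch with a made-up 1-parameter loss:

```python
import numpy as np

w = np.linspace(-5, 5, 1001)
loss = (w - 2) ** 2          # toy loss, minimized at w = 2
times_10 = 10 * loss         # loss multiplied by 10
over_10 = loss / 10          # loss divided by 10

# All three versions are minimized at the same w
print(w[np.argmin(loss)], w[np.argmin(times_10)], w[np.argmin(over_10)])  # 2.0 2.0 2.0
```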
The gradient is:
a generalization of the derivative concept
a generalization of the integral concept
a generalization of the optimization algorithm
a generalization of the objective function
a generalization of the derivative concept
The gradient descent is a type of:
data
model
objective function
optimization algorithm
optimization algorithm
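A minimal 1-parameter gradient descent sketch; the loss, starting point, and learning rate eta are all made up for illustration:

```python
# Minimize L(w) = (w - 3)^2 with gradient descent
w = 0.0       # initial guess
eta = 0.1     # learning rate (the Greek letter eta)

for _ in range(100):
    grad = 2 * (w - 3)    # dL/dw, the gradient of the loss
    w = w - eta * grad    # update rule: step against the gradient

print(w)  # very close to 3, the minimizer
```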
The learning rate is denoted by which Greek letter?
alpha
sigma
nabla
eta
eta
A high learning rate:
is better, as you learn faster
is faster, but may not reach the minimum
is slower, but sure to reach the minimum
is faster, and always reaches the minimum
is faster, but may not reach the minimum
How is the Loss Function denoted?
- L(y,t) - loss
- C(y,t) - cost
- E(y,t) - error
N-parameter gradient descent differs from the 1-parameter gradient descent as it deals with:
many weights and biases
many input variables
many output variables
many targets
many weights and biases
We use the delta to denote:
difference in models
difference between outputs and inputs
difference between outputs and targets
difference between methodologies
difference between outputs and targets
The weights and biases:
Are the same thing
Have different update rules
Are rarely updated
Have update rules following the same logic
Have update rules following the same logic
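A hedged sketch of a linear model trained with N-parameter gradient descent, showing that the weight and bias updates follow the same logic (delta is the difference between outputs and targets; the data and hyperparameters are synthetic):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(-1, 1, size=(100, 2))   # inputs
t = x @ np.array([2.0, -3.0]) + 5.0           # targets from a known linear model

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for _ in range(1000):
    y = x @ w + b                        # model outputs
    delta = y - t                        # outputs minus targets
    w -= eta * (x.T @ delta) / len(x)    # weight update
    b -= eta * np.sum(delta) / len(x)    # bias update: same logic, its own gradient

print(w, b)   # approaches [2, -3] and 5
```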
What determines a model uniquely?
The weights (w), biases (b), and model architecture
Training in a deep network is done by adjusting the parameters in the direction of minimizing the loss. How do you find that direction?
By calculating the gradient of the loss with respect to the weights
Why is the loss sometimes divided by the number of observations/data points?
To keep the loss (and its gradient) on the same scale regardless of dataset size, so the same learning rate works for datasets of different sizes
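A short sketch of this point, assuming NumPy: the sum of squared errors grows with the number of observations, while the mean stays on the same scale, which is what lets one learning rate work across dataset sizes:

```python
import numpy as np

np.random.seed(0)
small = np.random.randn(100)      # hypothetical residuals, small dataset
large = np.random.randn(100_000)  # hypothetical residuals, large dataset

print(np.sum(small ** 2), np.sum(large ** 2))    # roughly 100 vs 100000
print(np.mean(small ** 2), np.mean(large ** 2))  # both roughly 1
```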