Neural Networks Intro Flashcards
What does creating a machine learning algorithm mean?
It means building a model that outputs correct information from provided input data
What is the training process?
The process through which the model learns to make sense of the input data.
What are the 4 ingredients of training an algorithm?
- Data
- Model
- Objective Function
- Optimization Algorithm
Which of the following is NOT a building block of a machine learning algorithm?
Data
Variable
Objective function
Optimization algorithm
Variable
Training the model is:
A pure trial and error process
A kind of trial-and-error process with some feedback
The process of giving guidelines to the computer to find patterns
The process of watching people or a machine perform an activity and replicating it
A kind of trial-and-error process with some feedback
Self-driving cars learn by:
Driving many hours before learning how to do it safely and efficiently
A very strict set of rules that Elon Musk and the others are programming day and night
“Watching” thousands of hours of footage of real people driving
Breaking the rules (e.g., go on red light, go over the speed limit) and waiting to get punished for it
“Watching” thousands of hours of footage of real people driving
What are the 3 types of machine learning?
- Supervised
- Unsupervised
- Reinforcement
What are the two subtypes of supervised learning?
- Regression
- Classification
The linear model is given by:
y = xᵀw + b
y = wx + b
y = wᵀx + b
All of the above
All of the above
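As a minimal sketch (assuming NumPy and made-up numbers), the transposed forms are just two ways of writing the same dot product for a single observation:

```python
import numpy as np

# Hypothetical weights, input, and bias for a single observation
w = np.array([2.0, -1.0])
x = np.array([3.0, 4.0])
b = 5.0

# For 1-D vectors, x.T @ w and w.T @ x are the same dot product
y1 = x @ w + b   # y = x^T w + b
y2 = w @ x + b   # y = w^T x + b
print(y1, y2)    # both print 7.0
```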
The linear model for multiple inputs is given by:
y = xw + b
y = x₁w₁ + x₂w₂ + b
y = x₁w₂ + x₂w₁ + b
y = x₂w₂ + x₂w₂ + b
y = xw + b
You have y = xw + b, where w = [1.2, -3] and b = [7]. If x = [2, 3], what is the value of y?
0.4
-13.6
4.6
9.4
0.4
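The arithmetic can be verified numerically; a sketch assuming NumPy, using the numbers from the card above:

```python
import numpy as np

x = np.array([2.0, 3.0])
w = np.array([1.2, -3.0])
b = np.array([7.0])

# y = xw + b = 2*1.2 + 3*(-3) + 7 = 2.4 - 9 + 7 = 0.4
y = x @ w + b
print(y)  # approximately [0.4]
```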
What is the Objective Function?
the measure used to evaluate how well the model’s outputs match the desired correct values
What are the two Objective Function types?
- Loss - to be minimized (e.g., the error in supervised learning)
- Reward - to be maximized (used in reinforcement learning, e.g., Super Mario's high score)
In supervised learning, we are dealing with:
lost functions
loss functions
reward functions
reinforcement functions
loss functions
Reward functions are NOT:
functions we are trying to maximize
functions used in reinforcement learning
functions we are trying to minimize
functions
functions we are trying to minimize
What is the common Loss Function when dealing with Regression?
L2-norm
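A minimal sketch of the L2-norm loss (sum of squared differences between outputs and targets), assuming NumPy and illustrative numbers:

```python
import numpy as np

def l2_norm_loss(y, t):
    """Sum of squared differences between outputs y and targets t."""
    return np.sum((y - t) ** 2)

y = np.array([1.0, 2.5, 0.3])   # hypothetical model outputs
t = np.array([1.0, 2.0, 0.0])   # hypothetical targets
print(l2_norm_loss(y, t))       # 0^2 + 0.5^2 + 0.3^2 ≈ 0.34
```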
What is the common Loss Function when dealing with Classification?
Cross-entropy
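A minimal cross-entropy sketch for a single observation, assuming predicted probabilities and a one-hot target (the numbers are illustrative):

```python
import numpy as np

def cross_entropy_loss(y, t):
    """Cross-entropy: -sum(t * ln(y)), where y are predicted probabilities."""
    return -np.sum(t * np.log(y))

y = np.array([0.7, 0.2, 0.1])   # hypothetical predicted class probabilities
t = np.array([1.0, 0.0, 0.0])   # one-hot target: the first class is correct
print(cross_entropy_loss(y, t)) # -ln(0.7) ≈ 0.357; lower means a better match
```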
What is a Target?
The desired correct output. We want the output y to be as close to the target t as possible
What are the outputs of a regression?
continuous numbers
A target is:
The correct value at which we are aiming
A synonym for output
A part of the model
Always bigger than 0
The correct value at which we are aiming
The objective function measures:
how well the targets match our model’s outputs
how well our model’s outputs match the targets
the model’s parameters
linearity of the data
how well our model’s outputs match the targets
The L2-norm loss is used for:
k-means clustering
classification
regression
hierarchical clustering
regression
Cross-entropy loss is used for:
k-means clustering
classification
regression
hierarchical clustering
classification
Which cross-entropy points to the best match between outputs and targets?
L = 12.41
L = 0.78
L = 0.44
L = 0.77
L = 0.44
The cross-entropy loss divided by 10 cannot be used for machine learning
True
False
False
The cross-entropy loss MULTIPLIED by 10 cannot be used for machine learning
True
False
False
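One way to see why scaling the loss by a constant still works: the minimizer does not move, only the size of the gradient changes (which is equivalent to rescaling the learning rate). A sketch with a made-up 1-parameter loss:

```python
import numpy as np

w = np.linspace(-5, 5, 1001)
loss = (w - 2) ** 2          # toy loss, minimized at w = 2
times_10 = 10 * loss         # loss multiplied by 10
over_10 = loss / 10          # loss divided by 10

# All three versions are minimized at the same w
print(w[np.argmin(loss)], w[np.argmin(times_10)], w[np.argmin(over_10)])  # 2.0 2.0 2.0
```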
The gradient is:
a generalization of the derivative concept
a generalization of the integral concept
a generalization of the optimization algorithm
a generalization of the objective function
a generalization of the derivative concept
The gradient descent is a type of:
data
model
objective function
optimization algorithm
optimization algorithm
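A minimal 1-parameter gradient descent sketch; the loss, starting point, and learning rate eta are all made up for illustration:

```python
# Minimize L(w) = (w - 3)^2 with gradient descent
w = 0.0       # initial guess
eta = 0.1     # learning rate (the Greek letter eta)

for _ in range(100):
    grad = 2 * (w - 3)    # dL/dw, the gradient of the loss
    w = w - eta * grad    # update rule: step against the gradient

print(w)  # very close to 3, the minimizer
```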
The learning rate is denoted by which Greek letter?
alpha
sigma
nabla
eta
eta
A high learning rate:
is better, as you learn faster
is faster, but may not reach the minimum
is slower, but sure to reach the minimum
is faster, and always reaches the minimum
is faster, but may not reach the minimum
How is the Loss Function denoted?
- L(y,t) - loss
- C(y,t) - cost
- E(y,t) - error
N-parameter gradient descent differs from the 1-parameter gradient descent as it deals with:
many weights and biases
many input variables
many output variables
many targets
many weights and biases
We use the delta to denote:
difference in models
difference between outputs and inputs
difference between outputs and targets
difference between methodologies
difference between outputs and targets
The weights and biases:
Are the same thing
Have different update rules
Are rarely updated
Have update rules following the same logic
Have update rules following the same logic
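A hedged sketch of a linear model trained with N-parameter gradient descent, showing that the weight and bias updates follow the same logic (delta is the difference between outputs and targets; the data and hyperparameters are synthetic):

```python
import numpy as np

np.random.seed(0)
x = np.random.uniform(-1, 1, size=(100, 2))   # inputs
t = x @ np.array([2.0, -3.0]) + 5.0           # targets from a known linear model

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for _ in range(1000):
    y = x @ w + b                        # model outputs
    delta = y - t                        # outputs minus targets
    w -= eta * (x.T @ delta) / len(x)    # weight update
    b -= eta * np.sum(delta) / len(x)    # bias update: same logic, its own gradient

print(w, b)   # approaches [2, -3] and 5
```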
What determines a model uniquely?
The weights (w), biases (b), and model architecture
Training in a deep network is done by adjusting the parameters in the direction of minimizing the loss. How do you find that direction?
By calculating the gradient of the loss with respect to the weights
Why is the loss sometimes divided by the number of observations/data points?
To keep the loss (and its gradient) on the same scale regardless of dataset size, so the same learning rate works for datasets of different sizes
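A short sketch of this point, assuming NumPy: the sum of squared errors grows with the number of observations, while the mean stays on the same scale, which is what lets one learning rate work across dataset sizes:

```python
import numpy as np

np.random.seed(0)
small = np.random.randn(100)      # hypothetical residuals, small dataset
large = np.random.randn(100_000)  # hypothetical residuals, large dataset

print(np.sum(small ** 2), np.sum(large ** 2))    # roughly 100 vs 100000
print(np.mean(small ** 2), np.mean(large ** 2))  # both roughly 1
```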