ML Andrew Ng Flashcards

1
Q

What is the difference between a hypothesis with a single feature and one with multiple features?

A
2
Q

How do you write the multivariate hypothesis using the transpose of theta and x?

A
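
The answer field is blank here. As a minimal sketch (theta and feature values are made up for illustration), the multivariate hypothesis can be written with the transpose as h_theta(x) = theta^T x, where x0 = 1 is prepended so theta0 acts as the intercept:

```python
# Multivariate hypothesis: h_theta(x) = theta^T x, with x0 = 1
# prepended so that theta0 is the intercept term.

def hypothesis(theta, x):
    """Inner product theta^T x; assumes x already has x0 = 1 prepended."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [2.0, 3.0, 0.5]        # theta0, theta1, theta2 (made-up values)
x = [1.0, 4.0, 10.0]           # x0 = 1, then two feature values

print(hypothesis(theta, x))    # 2 + 3*4 + 0.5*10 = 19.0
```

With a single feature this reduces to h_theta(x) = theta0 + theta1 * x, so the transpose form covers both cases.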
3
Q

What is a parameter?

A

Theta (θ) is the parameter.

4
Q

How do you fit the parameters (to obtain the best ones) in the hypothesis (the prediction equation)?

A

Using gradient descent.

5
Q

What is the cost function for both univariate and multivariate linear regression?

A
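
The answer field is blank. A minimal sketch of the squared error cost J(theta) = (1/2m) * Σᵢ (h(xᵢ) − yᵢ)², which has the same form for univariate and multivariate regression (only the hypothesis changes); the tiny dataset is made up:

```python
# Squared-error cost J(theta) = (1/2m) * sum_i (h(x_i) - y_i)^2.
# Same formula for univariate and multivariate regression; only
# the hypothesis h changes. Dataset values are made up.

def hypothesis(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def cost(theta, X, y):
    m = len(y)
    return sum((hypothesis(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x0 = 1 prepended
y = [1.0, 2.0, 3.0]

print(cost([0.0, 1.0], X, y))  # perfect fit -> 0.0
print(cost([0.0, 0.5], X, y))  # nonzero cost for a worse line
```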
6
Q

What is the gradient descent formula for both one variable and multiple variables?

A
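
The answer field is blank. A sketch of the update rule θⱼ := θⱼ − α · (1/m) · Σᵢ (h(xᵢ) − yᵢ) · xᵢⱼ, which is the same for one variable or many (there are just more θⱼ terms); the data and alpha below are made up:

```python
# Gradient descent update, for any number of features:
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
# All theta_j are updated simultaneously. Data and alpha are made up.

def hypothesis(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def gradient_step(theta, X, y, alpha):
    m = len(y)
    errors = [hypothesis(theta, x) - yi for x, yi in zip(X, y)]
    # Simultaneous update: compute every partial before touching theta.
    return [t - alpha * sum(e * x[j] for e, x in zip(errors, X)) / m
            for j, t in enumerate(theta)]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x0 = 1 prepended
y = [2.0, 4.0, 6.0]                        # underlying line: y = 2x

theta = [0.0, 0.0]
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)  # approaches [0.0, 2.0]
```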
7
Q

If you have a problem with multiple features, you should make sure those features have similar scales… therefore:

A

Gradient descent will converge more easily.
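
As an illustration of how features are brought to similar scales, mean normalization rescales each feature to roughly [-0.5, 0.5]; the raw values below are made up:

```python
# Mean normalization: x := (x - mean) / range, so every feature ends up
# roughly in [-0.5, 0.5] and gradient descent converges more easily.
# The raw feature values (house sizes, say) are made up.

def mean_normalize(values):
    mu = sum(values) / len(values)
    rng = max(values) - min(values)
    return [(v - mu) / rng for v in values]

sizes = [1000.0, 1500.0, 2000.0]   # raw feature on a large scale
print(mean_normalize(sizes))       # [-0.5, 0.0, 0.5]
```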

8
Q

Example of feature scaling ranges

A
9
Q

Gradient descent: feature scaling and normalization

A
10
Q

Example of feature scaling

A
11
Q

What is the job of gradient descent?

A

The job of gradient descent is to find the value of theta for you that hopefully minimizes the cost function J(theta).

12
Q

How to know if gradient descent is working?

A

If gradient descent is working properly, then J(θ) should decrease after every iteration. If J(θ) is increasing, it means gradient descent is not working (often because the learning rate is too large).
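
A sketch of that check: record J every iteration, verify it never increases, and stop when the drop becomes tiny. The quadratic J and learning rate below are made-up stand-ins for a real cost function:

```python
# Convergence check: track J(theta) per iteration, assert it only
# decreases, and declare convergence when the drop is tiny.
# J(theta) = (theta - 3)^2 is a made-up stand-in for a real cost.

def J(theta):
    return (theta - 3.0) ** 2     # minimum at theta = 3

def dJ(theta):
    return 2.0 * (theta - 3.0)

theta, alpha = 0.0, 0.1
history = [J(theta)]
for _ in range(100):
    theta -= alpha * dJ(theta)
    history.append(J(theta))
    if history[-2] - history[-1] < 1e-9:   # tiny drop -> converged
        break

assert all(a >= b for a, b in zip(history, history[1:]))  # J never increases
print(round(theta, 3))  # close to 3.0
```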

13
Q

Checking that gradient descent is working

A
14
Q

Gradient Descent Learning Rate

A
15
Q

To choose alpha (the learning rate), use…?

A

0.001, 0.01, 0.1, 1, … or you can use roughly threefold increases: 0.001, 0.003, 0.01, 0.03, …

16
Q

What will happen if alpha is too small?

A

Gradient descent will be slow: it takes tiny steps, so convergence needs many iterations.

17
Q

What is polynomial regression?

A
18
Q

Features and polynomial regression

A
19
Q

What is the normal equation?

A
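
The answer field is blank. As a sketch, the normal equation solves for theta in closed form, with no learning rate and no iterations: theta = (XᵀX)⁻¹ Xᵀy. For one feature plus an intercept, XᵀX is 2×2, so the inverse can be written by hand; the data points are made up and lie on y = 1 + 2x:

```python
# Normal equation: theta = (X^T X)^{-1} X^T y, closed form,
# no alpha, no iterations. One feature + intercept -> 2x2 inverse.
# Made-up data lying on y = 1 + 2x.

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # column of ones + one feature
y = [3.0, 5.0, 7.0]

# X^T X (2x2) and X^T y (2-vector)
a = sum(r[0] * r[0] for r in X); b = sum(r[0] * r[1] for r in X)
c = sum(r[1] * r[0] for r in X); d = sum(r[1] * r[1] for r in X)
u = sum(r[0] * yi for r, yi in zip(X, y))
v = sum(r[1] * yi for r, yi in zip(X, y))

det = a * d - b * c                         # assumed invertible here
theta0 = (d * u - b * v) / det
theta1 = (-c * u + a * v) / det
print(theta0, theta1)  # 1.0 2.0
```

Unlike gradient descent this needs no alpha, but inverting XᵀX gets expensive as the number of features grows.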
20
Q

“What does the machine (? what is the machine here) actually learn?”

A

i.e. the statistical model

21
Q

……Grew from the field of AI.

A

Machine Learning

22
Q

Examples of where machine learning can be applied?

A

Database mining; applications we cannot program by hand (NLP, autonomous helicopters, computer vision); self-customizing programs (Netflix, Amazon); understanding the human brain.

23
Q

Not well defined, but: a field of study that gives computers the ability to learn without being explicitly programmed.

A

Definition of ML, by Arthur Samuel.

24
Q

Well-posed learning problem: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

A

Tom Mitchell. Example: Samuel wrote a checkers-playing program and had it play 10,000 games against itself. E = the 10,000 games, T = playing checkers, P = whether you win or not.

25
Q

Supervised learning

A

Teach the computer how to do something, then let it use its new-found knowledge to do it.

26
Q

Unsupervised learning

A

Let the computer learn how to do something, and use this to determine structure and patterns in the data.

27
Q

Other types of learning: we have reinforcement learning and recommender systems.

A

They are part of machine learning techniques.

28
Q

The most common type of problem in machine learning

A

Probably the most common problem type in machine learning.

29
Q

Predicting a continuous valued output is called…?

A

A regression problem.

30
Q

Classifying data into two discrete classes, with no in-between

A

It is called a classification problem.

31
Q

Support vector machines can support an infinite number of features.

A

One example (of an algorithm that can handle an infinite number of features).

32
Q

Which learning algorithm uses unlabelled data?

A

An unsupervised learning algorithm.

33
Q

You are just told: here is the data, can you find structure in it?

A

One way of doing this is cluster analysis… which is unsupervised learning.

34
Q

Examples of clustering ?

A

Organizing news stories, genomics, organizing computer clusters, social network analysis, and market segmentation (customer data).

35
Q

Input variables (or features) and output variable (or target)

A

Input and output in regression.

36
Q

In the case of the simple …… (y ≈ b0 + b1 * X, where X is one column/variable), the model “learns” (read: estimates) two parameters:

b0: the bias (or more traditionally the intercept); and,
b1: the slope

A

Simple Linear Regression
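
The two parameters described in the question can be estimated directly with the least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the spend/sales numbers below are made up:

```python
# Least-squares estimates for simple linear regression y ~ b0 + b1*x:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
# Advertising-spend / sales numbers below are made up.

x = [1.0, 2.0, 3.0, 4.0]          # e.g. advertising spend
y = [3.0, 5.0, 7.0, 9.0]          # e.g. sales (lie on y = 1 + 2x)

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
print(b0, b1)  # 1.0 2.0
```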

37
Q

The bias is the level of y when X is 0 (i.e. the value of sales when advertising spend is 0) and the slope is the rate of predicted increase or decrease in y for each unit increase in X (i.e. how much do sales increase per pound spent on advertising). Both parameters are scalars (single values).

A

Regression parameters

38
Q

Univariate Linear Regression is also called….

A

Linear regression (with one variable).

39
Q

What is the form of Linear Regression?

A
40
Q

What is cost function?

A

A cost function lets us determine how best to fit a straight line to our data.

41
Q

Choosing different values for theta gives different ……

A

Hypothesis functions, and therefore different regression lines.

42
Q

Based on our training set, we want to generate parameters that …..

A

Make the best-fitting straight line. To formalize this, we solve a minimization problem: minimize the difference between h(x) and y for each and every example, summed over the training set.

43
Q

In ML, …….. are used to estimate how badly models are performing. Put simply, a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. This is typically expressed as a difference or distance between the predicted value and the actual value. The cost function (you may also see this referred to as loss or error) can be estimated by iteratively running the model to compare estimated predictions against “ground truth” — the known values of y.

The objective of a ML model, therefore, is to find parameters, weights or a structure that minimises the cost function.

A

Cost function, also called loss function.

44
Q

Cost function: detailed formula

A
45
Q

…… is like your prediction machine: throw it an X and you will get a putative y.

A

Hypothesis.

We want to minimize the cost function. Because of the summation term, it inherently looks at all the data in the training set at any time.

46
Q

cost function

A
47
Q

Cost function: intuition and calculation

A
48
Q

The function called the squared error cost function is a …..

A

cost function

49
Q

…………… is a way of using your training data to determine the values for your theta which make the hypothesis as accurate as possible.

A

The cost, or cost function.

50
Q

The most commonly used function for regression is the ….

A

cost function

51
Q

Cost function: general representation

A
52
Q

What are the cost function's objectives?

A
53
Q

Gradient descent

A

So we have our hypothesis function and we have a way of measuring how well it fits into the data. Now we need to estimate the parameters in the hypothesis function. That’s where gradient descent comes in.

Gradient descent minimizes the cost function.

54
Q

Once the model learns these ……..s they can be used to compute estimated values of y given new values of X. In other words, you can use these learned parameters to predict values of y when you don’t know what y is — hey presto, a predictive model!

A

Parameters …. in linear regression.

55
Q

Learning parameters: Cost functions

There are several ways to learn the parameters of a LR model; I will focus on the approach that best illustrates statistical learning: ……..

A

minimising a cost function.

56
Q

Remember that in ML, the focus is on learning from data. This is perhaps better illustrated using a simple analogy. As children we typically learn what is “right” or “good” behaviour by being told NOT to do things or being punished for having done something we shouldn’t. For example, you can imagine a four year-old sitting by a fire to keep warm, but not knowing the danger of fire, she puts her finger into it and gets burned. The next time she sits by the fire, she doesn’t get burned, but she sits too close, gets too hot and has to move away. The third time she sits by the fire she finds the distance that keeps her warm without exposing her to any danger. In other words, through experience and feedback (getting burned, then getting too hot) the kid learns the optimal distance to sit from the fire. The heat from the fire in this example acts as a …….. — it helps the learner to correct / change behaviour to minimize mistakes.

A

cost function

57
Q

In ML, cost functions are used to estimate how badly models are performing. Put simply, a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. This is typically expressed as a difference or distance between the predicted value and the actual value. The cost function (you may also see this referred to as loss or error.) can be estimated by iteratively running the model to compare estimated predictions against “ground truth” — the known values of y.

The objective of a ML model, therefore, is to find parameters, weights or a structure that minimises the cost function.

A

ML

58
Q

Now that we know that models learn by minimizing a cost function, you may naturally wonder how the cost function is minimized — enter …….. It is an efficient optimization algorithm that attempts to find a local or global minimum of a function.

A

gradient descent.

Gradient descent enables a model to learn the gradient or direction that the model should take in order to reduce errors (differences between actual y and predicted y). Direction in the simple linear regression example refers to how the model parameters b0 and b1 should be tweaked or corrected to further reduce the cost function. As the model iterates, it gradually converges towards a minimum where further tweaks to the parameters produce little or zero changes in the loss — also referred to as convergence.

59
Q

This process is integral (no calculus pun intended!) to the ML process, because it greatly expedites learning — you can think of it as a means of receiving corrective feedback on how to improve upon your previous performance. The alternative would be brute-forcing a potentially infinite combination of parameters until the set that minimizes the cost is identified. For obvious reasons this isn't really feasible. It therefore enables the learning process to make corrective updates to the learned estimates that move the model toward an optimal combination of parameters… which process?

A

Gradient Descent.

60
Q

…… minimises the cost function, and is used all over the place in ML.

A

Gradient descent.

61
Q

How does gradient descent work?

A
62
Q

Gradient Descent Algorithm

A
63
Q

What is alpha?

A

Alpha is the learning rate. It controls how big a step you take: if alpha is big, you have aggressive gradient descent; if alpha is small, you take tiny steps.
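
A small demonstration of alpha's effect on a toy cost J(theta) = theta², with made-up learning rates: too small creeps along, a moderate value converges, and too big a value diverges:

```python
# Effect of alpha on gradient descent for J(theta) = theta^2:
# dJ/dtheta = 2*theta, minimum at theta = 0. Alphas are made up.

def descend(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2.0 * theta   # gradient step on J = theta^2
    return theta

print(abs(descend(0.01)))   # still far from 0: tiny steps
print(abs(descend(0.4)))    # converges quickly
print(abs(descend(1.1)))    # diverges: |theta| grows every step
```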

64
Q

How is gradient descent implemented?

A
65
Q

Understanding gradient descent

A
66
Q

Linear Regression with gradient descent

A
67
Q

There are other methods for finding the minimum of the cost function… what are they?

A

The normal equation method; but gradient descent scales better to large datasets and is used in a lot of contexts across machine learning.

68
Q

Normal equation

A