ML Andrew Ng Flashcards

1
Q

What is the difference between a hypothesis with a single feature and one with multiple features?

A
2
Q

How do you write the multivariate hypothesis using the transpose of theta and x?

A
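
The answer field is blank here. As a minimal sketch (theta and feature values are made up for illustration), the multivariate hypothesis can be written with the transpose as h_theta(x) = theta^T x, where x0 = 1 is prepended so theta0 acts as the intercept:

```python
# Multivariate hypothesis: h_theta(x) = theta^T x, with x0 = 1
# prepended so that theta0 is the intercept term.

def hypothesis(theta, x):
    """Inner product theta^T x; assumes x already has x0 = 1 prepended."""
    return sum(t * xi for t, xi in zip(theta, x))

theta = [2.0, 3.0, 0.5]        # theta0, theta1, theta2 (made-up values)
x = [1.0, 4.0, 10.0]           # x0 = 1, then two feature values

print(hypothesis(theta, x))    # 2 + 3*4 + 0.5*10 = 19.0
```

With a single feature this reduces to h_theta(x) = theta0 + theta1 * x, so the transpose form covers both cases.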
3
Q

What is a parameter?

A

Theta (θ) is the parameter.

4
Q

How do you fit the parameters (to obtain the best ones) in the hypothesis (the prediction equation)?

A

Using gradient descent.

5
Q

What is the cost function for both univariate and multivariate linear regression?

A
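
The answer field is blank. A minimal sketch of the squared error cost J(theta) = (1/2m) * Σᵢ (h(xᵢ) − yᵢ)², which has the same form for univariate and multivariate regression (only the hypothesis changes); the tiny dataset is made up:

```python
# Squared-error cost J(theta) = (1/2m) * sum_i (h(x_i) - y_i)^2.
# Same formula for univariate and multivariate regression; only
# the hypothesis h changes. Dataset values are made up.

def hypothesis(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def cost(theta, X, y):
    m = len(y)
    return sum((hypothesis(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x0 = 1 prepended
y = [1.0, 2.0, 3.0]

print(cost([0.0, 1.0], X, y))  # perfect fit -> 0.0
print(cost([0.0, 0.5], X, y))  # nonzero cost for a worse line
```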
6
Q

What is the gradient descent formula for both one variable and multiple variables?

A
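
The answer field is blank. A sketch of the update rule θⱼ := θⱼ − α · (1/m) · Σᵢ (h(xᵢ) − yᵢ) · xᵢⱼ, which is the same for one variable or many (there are just more θⱼ terms); the data and alpha below are made up:

```python
# Gradient descent update, for any number of features:
#   theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij
# All theta_j are updated simultaneously. Data and alpha are made up.

def hypothesis(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def gradient_step(theta, X, y, alpha):
    m = len(y)
    errors = [hypothesis(theta, x) - yi for x, yi in zip(X, y)]
    # Simultaneous update: compute every partial before touching theta.
    return [t - alpha * sum(e * x[j] for e, x in zip(errors, X)) / m
            for j, t in enumerate(theta)]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # x0 = 1 prepended
y = [2.0, 4.0, 6.0]                        # underlying line: y = 2x

theta = [0.0, 0.0]
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)  # approaches [0.0, 2.0]
```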
7
Q

If you have a problem with multiple features, you should make sure those features have similar scales… therefore:

A

Gradient descent will converge more easily.
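
As an illustration of how features are brought to similar scales, mean normalization rescales each feature to roughly [-0.5, 0.5]; the raw values below are made up:

```python
# Mean normalization: x := (x - mean) / range, so every feature ends up
# roughly in [-0.5, 0.5] and gradient descent converges more easily.
# The raw feature values (house sizes, say) are made up.

def mean_normalize(values):
    mu = sum(values) / len(values)
    rng = max(values) - min(values)
    return [(v - mu) / rng for v in values]

sizes = [1000.0, 1500.0, 2000.0]   # raw feature on a large scale
print(mean_normalize(sizes))       # [-0.5, 0.0, 0.5]
```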

8
Q

Example of feature scaling ranges

A
9
Q

Gradient descent: feature scaling and normalization

A
10
Q

Example of feature scaling

A
11
Q

What is the job of gradient descent?

A

The job of gradient descent is to find the value of theta for you that hopefully minimizes the cost function J(theta).

12
Q

How to know if gradient descent is working?

A

If gradient descent is working properly, then J(θ) should decrease after every iteration. If J(θ) is increasing, it means gradient descent is not working (often because the learning rate is too large).
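
A sketch of that check: record J every iteration, verify it never increases, and stop when the drop becomes tiny. The quadratic J and learning rate below are made-up stand-ins for a real cost function:

```python
# Convergence check: track J(theta) per iteration, assert it only
# decreases, and declare convergence when the drop is tiny.
# J(theta) = (theta - 3)^2 is a made-up stand-in for a real cost.

def J(theta):
    return (theta - 3.0) ** 2     # minimum at theta = 3

def dJ(theta):
    return 2.0 * (theta - 3.0)

theta, alpha = 0.0, 0.1
history = [J(theta)]
for _ in range(100):
    theta -= alpha * dJ(theta)
    history.append(J(theta))
    if history[-2] - history[-1] < 1e-9:   # tiny drop -> converged
        break

assert all(a >= b for a, b in zip(history, history[1:]))  # J never increases
print(round(theta, 3))  # close to 3.0
```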

13
Q

Checking that gradient descent is working

A
14
Q

Gradient Descent Learning Rate

A
15
Q

To choose alpha (the learning rate), use…?

A

0.001, 0.01, 0.1, 1, … or you can use roughly threefold increases: 0.001, 0.003, 0.01, 0.03, …

16
Q

What will happen if alpha is too small?

A

Gradient descent will be slow: it takes tiny steps, so convergence needs many iterations.

17
Q

What is polynomial regression?

A
18
Q

Features and polynomial regression

A
19
Q

What is the normal equation?

A
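
The answer field is blank. As a sketch, the normal equation solves for theta in closed form, with no learning rate and no iterations: theta = (XᵀX)⁻¹ Xᵀy. For one feature plus an intercept, XᵀX is 2×2, so the inverse can be written by hand; the data points are made up and lie on y = 1 + 2x:

```python
# Normal equation: theta = (X^T X)^{-1} X^T y, closed form,
# no alpha, no iterations. One feature + intercept -> 2x2 inverse.
# Made-up data lying on y = 1 + 2x.

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # column of ones + one feature
y = [3.0, 5.0, 7.0]

# X^T X (2x2) and X^T y (2-vector)
a = sum(r[0] * r[0] for r in X); b = sum(r[0] * r[1] for r in X)
c = sum(r[1] * r[0] for r in X); d = sum(r[1] * r[1] for r in X)
u = sum(r[0] * yi for r, yi in zip(X, y))
v = sum(r[1] * yi for r, yi in zip(X, y))

det = a * d - b * c                         # assumed invertible here
theta0 = (d * u - b * v) / det
theta1 = (-c * u + a * v) / det
print(theta0, theta1)  # 1.0 2.0
```

Unlike gradient descent this needs no alpha, but inverting XᵀX gets expensive as the number of features grows.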
20
Q

“What does the machine (? what is the machine here) actually learn?”

A

i.e. the statistical model

21
Q

……Grew from the field of AI.

A

Machine Learning

22
Q

Examples of where machine learning can be applied?

A

Database mining; applications we cannot program by hand (NLP, autonomous helicopters, computer vision); self-customizing programs (Netflix, Amazon); understanding the human brain.

23
Q

Not well defined, but: a field of study that gives computers the ability to learn without being explicitly programmed.

A

Definition of ML, by Arthur Samuel.

24
Q

Well-posed learning problem: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

A

Tom Mitchell. Example: Samuel wrote a checkers-playing program and had it play 10,000 games against itself. E = the 10,000 games, T = playing checkers, P = whether you win or not.

25
Q

Supervised learning

A

Teach the computer how to do something, then let it use its new-found knowledge to do it.

26
Q

Unsupervised learning

A

Let the computer learn how to do something, and use this to determine structure and patterns in the data.

27
Q

Other types of learning: we have reinforcement learning and recommender systems.

A

They are part of machine learning techniques.

28
Q

The most common type of problem in machine learning

A

Probably the most common problem type in machine learning.

29
Q

Predicting a continuous valued output is called…?

A

A regression problem.

30
Q

Classifying data into two discrete classes, with no in-between

A

It is called a classification problem.

31
Q

Support vector machines can support an infinite number of features.

A

One example (of an algorithm that can handle an infinite number of features).

32
Q

Which learning algorithm uses unlabelled data?

A

An unsupervised learning algorithm.

33
Q

You are just told: here is the data, can you find structure in it?

A

One way of doing this is cluster analysis… which is unsupervised learning.

34
Q

Examples of clustering ?

A

Organizing news stories, genomics, organizing computer clusters, social network analysis, and market segmentation (customer data).

35
Q

Input variables (or features) and output variable (or target)

A

Input and output in regression.

36
Q

In the case of the simple …… (y ≈ b0 + b1 * X, where X is one column/variable), the model “learns” (read: estimates) two parameters:

b0: the bias (or more traditionally the intercept); and,
b1: the slope

A

Simple Linear Regression
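
The two parameters described in the question can be estimated directly with the least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the spend/sales numbers below are made up:

```python
# Least-squares estimates for simple linear regression y ~ b0 + b1*x:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
# Advertising-spend / sales numbers below are made up.

x = [1.0, 2.0, 3.0, 4.0]          # e.g. advertising spend
y = [3.0, 5.0, 7.0, 9.0]          # e.g. sales (lie on y = 1 + 2x)

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
     / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
print(b0, b1)  # 1.0 2.0
```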

37
Q

The bias is the level of y when X is 0 (i.e. the value of sales when advertising spend is 0) and the slope is the rate of predicted increase or decrease in y for each unit increase in X (i.e. how much do sales increase per pound spent on advertising). Both parameters are scalars (single values).

A

Regression parameters

38
Q

Univariate Linear Regression is also called….

A

Linear regression (with one variable).

39
Q

What is the form of Linear Regression?

A
40
Q

What is cost function?

A

A cost function lets us determine how best to fit a straight line to our data.

41
Q

Choosing different values for theta gives different ……

A

Hypothesis functions, and therefore different regression lines.

42
Q

Based on our training set, we want to generate parameters that …..

A

Make the best-fitting straight line. To formalize this, we solve a minimization problem: minimize the difference between h(x) and y for each and every example, summed over the training set.

43
Q

In ML, …….. are used to estimate how badly models are performing. Put simply, a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. This is typically expressed as a difference or distance between the predicted value and the actual value. The cost function (you may also see this referred to as loss or error) can be estimated by iteratively running the model to compare estimated predictions against “ground truth” — the known values of y.

The objective of a ML model, therefore, is to find parameters, weights or a structure that minimises the cost function.

A

Cost function, also called loss function.

44
Q

Cost function: detailed formula

A
45
Q

…… is like your prediction machine: throw it an X and you will get a putative y.

A

Hypothesis.

We want to minimize the cost function. Because of the summation term, it inherently looks at all the data in the training set at any time.

46
Q

cost function

A
47
Q

Cost function: intuition and calculation

A
48
Q

The function called the squared error cost function is a …..

A

cost function

49
Q

…………… is a way of using your training data to determine the values for your theta which make the hypothesis as accurate as possible.

A

The cost, or cost function.

50
Q

The most commonly used function for regression is the ….

A

cost function

51
Q

Cost function: general representation

A
52
Q

What are the cost function's objectives?

A
53
Q

Gradient descent

A

So we have our hypothesis function and we have a way of measuring how well it fits into the data. Now we need to estimate the parameters in the hypothesis function. That’s where gradient descent comes in.

Gradient descent minimizes the cost function.

54
Q

Once the model learns these ……..s they can be used to compute estimated values of y given new values of X. In other words, you can use these learned parameters to predict values of y when you don’t know what y is — hey presto, a predictive model!

A

Parameters …. in linear regression.

55
Q

Learning parameters: Cost functions

There are several ways to learn the parameters of a LR model; I will focus on the approach that best illustrates statistical learning: ……..

A

minimising a cost function.

56
Q

Remember that in ML, the focus is on learning from data. This is perhaps better illustrated using a simple analogy. As children we typically learn what is “right” or “good” behaviour by being told NOT to do things or being punished for having done something we shouldn’t. For example, you can imagine a four year-old sitting by a fire to keep warm, but not knowing the danger of fire, she puts her finger into it and gets burned. The next time she sits by the fire, she doesn’t get burned, but she sits too close, gets too hot and has to move away. The third time she sits by the fire she finds the distance that keeps her warm without exposing her to any danger. In other words, through experience and feedback (getting burned, then getting too hot) the kid learns the optimal distance to sit from the fire. The heat from the fire in this example acts as a …….. — it helps the learner to correct / change behaviour to minimize mistakes.

A

cost function

57
Q

In ML, cost functions are used to estimate how badly models are performing. Put simply, a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between X and y. This is typically expressed as a difference or distance between the predicted value and the actual value. The cost function (you may also see this referred to as loss or error.) can be estimated by iteratively running the model to compare estimated predictions against “ground truth” — the known values of y.

The objective of a ML model, therefore, is to find parameters, weights or a structure that minimises the cost function.

A

ML

58
Q

Now that we know that models learn by minimizing a cost function, you may naturally wonder how the cost function is minimized — enter …….. It is an efficient optimization algorithm that attempts to find a local or global minimum of a function.

A

gradient descent.

Gradient descent enables a model to learn the gradient or direction that the model should take in order to reduce errors (differences between actual y and predicted y). Direction in the simple linear regression example refers to how the model parameters b0 and b1 should be tweaked or corrected to further reduce the cost function. As the model iterates, it gradually converges towards a minimum where further tweaks to the parameters produce little or zero changes in the loss — also referred to as convergence.

59
Q

This process is integral (no calculus pun intended!) to the ML process, because it greatly expedites learning — you can think of it as a means of receiving corrective feedback on how to improve upon your previous performance. The alternative would be brute-forcing a potentially infinite combination of parameters until the set that minimizes the cost is identified. For obvious reasons this isn't really feasible. It therefore enables the learning process to make corrective updates to the learned estimates that move the model toward an optimal combination of parameters… which process?

A

Gradient Descent.

60
Q

…… minimises the cost function, and is used all over the place in ML.

A

Gradient descent.

61
Q

How does gradient descent work?

A
62
Q

Gradient Descent Algorithm

A
63
Q

What is alpha?

A

Alpha is the learning rate. It controls how big a step you take: if alpha is big, you have aggressive gradient descent; if alpha is small, you take tiny steps.
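
A small demonstration of alpha's effect on a toy cost J(theta) = theta², with made-up learning rates: too small creeps along, a moderate value converges, and too big a value diverges:

```python
# Effect of alpha on gradient descent for J(theta) = theta^2:
# dJ/dtheta = 2*theta, minimum at theta = 0. Alphas are made up.

def descend(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2.0 * theta   # gradient step on J = theta^2
    return theta

print(abs(descend(0.01)))   # still far from 0: tiny steps
print(abs(descend(0.4)))    # converges quickly
print(abs(descend(1.1)))    # diverges: |theta| grows every step
```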

64
Q

How is gradient descent implemented?

A
65
Q

Understanding gradient descent

A
66
Q

Linear Regression with gradient descent

A
67
Q

There are other methods for finding the minimum of the cost function… what are they?

A

The normal equation method; but gradient descent scales better to large datasets and is used in a lot of contexts across machine learning.

68
Q

Normal equation

A