Week 1 Flashcards

1
Q

What is the definition of machine learning?

A

ML is the study of computer algorithms that improve automatically through experience and use of data.

2
Q

What is the machine learning task hierarchy?

A

In traditional programming, humans design a tool; the tool takes input and produces output.

In ML, humans design a learning algorithm, which uses data to produce a model.

3
Q

Why ML?

A
  1. Programming / human labour fails
    a. Scale / speed / cost of human labour
    b. Inability to express rules using language
    c. Don't know the exact rules transforming input to output
  2. ML can succeed
    a. Have lots of example data
    b. Have some structural ideas about the data
4
Q

Where is ML used extensively now?

A
  1. Classifiers - mail spam detection
  2. Shopping cart recommendations
  3. ML in smart assistants
  4. ML in robot AIs
  5. ML in games
5
Q

What is Data?

A

Data is a collection of vectors.
Metadata is information about the data.

6
Q

What is a model?

A

A model is a mathematical simplification of reality.

7
Q

What are the types of models in ML

A
  1. Predictive models
    a. Regression model: predicts a real-valued, continuous output
    b. Classification model: predicts a discrete (non-real-valued) output
  2. Probabilistic model - scores different likelihoods of reality
8
Q

What are learning algorithms?

A

Learning algorithms convert data into models.
They choose from a collection of models with the same structure but different parameters, and use the one with the optimal parameter values for inference.

9
Q

What is supervised learning?

A

In simplified terms, supervised learning is curve fitting. The goal: given data {(x^1, y^1), (x^2, y^2), …, (x^n, y^n)}, where each x^i is a vector, find a model f such that f(x^i) is close to y^i.

There will be training data, which is used to create the model, and test data, which is used to evaluate its performance.
Supervised learning tasks
1. Regression
2. Classification

10
Q

Notation: How is the third coordinate of 8th vector denoted?

A

x with subscript 3 and superscript 8, i.e. x^8_3 (the superscript indexes the data point, the subscript the coordinate).

11
Q

Notation: what are indicator functions?

A

Indicator functions take a predicate as input and return 1 if the predicate is true and 0 if it is false.

e.g. 1(2 is even) = 1, 1(2 is odd) = 0, where 1(predicate) is the indicator function.
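A minimal sketch of an indicator function in Python (the function name is illustrative):

```python
def indicator(predicate: bool) -> int:
    """1(predicate): returns 1 if the predicate holds, 0 otherwise."""
    return 1 if predicate else 0

print(indicator(2 % 2 == 0))  # 1(2 is even) = 1
print(indicator(2 % 2 == 1))  # 1(2 is odd)  = 0
```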

12
Q

What is the loss function of a regression problem?

A

loss = (1/n) Summation (i=1 to n) (f(x^i) - y^i)^2

This is the sum of squared errors (SSE), averaged over the n points.

Loss expresses the difference between predicted and actual values as a single-valued metric.
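This loss can be sketched in Python as follows (the model f and the data are illustrative):

```python
def regression_loss(f, xs, ys):
    """(1/n) * sum over i of (f(x^i) - y^i)^2: the averaged squared error."""
    n = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n

# illustrative model and data
f = lambda x: 2 * x + 1
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.5, 5.0]
print(regression_loss(f, xs, ys))  # 0.25 / 3 ≈ 0.0833
```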

13
Q

What is the generic regression model function?

A

f(x) = w^T x + b, where w is the weight vector and x is the input vector.
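A minimal sketch of this model in Python (weights, bias, and input are illustrative):

```python
def linear_model(w, b, x):
    """f(x) = w^T x + b: dot product of weight vector w and input x, plus bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(linear_model([2.0, -1.0], 0.5, [1.0, 3.0]))  # 2*1 + (-1)*3 + 0.5 = -0.5
```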

14
Q

What is a classification problem?

A

The output is not real-valued as in regression; it takes discrete labels such as 'Yes' or 'No', 'True' or 'False', +1 or -1.

15
Q

What is the loss function of a classification problem?

A

Loss here is the fraction of misclassified instances.
loss = (1/n) Summation (i=1 to n) 1(f(x^i) not equal to y^i)
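A minimal sketch of this loss in Python (the toy classifier and labels are illustrative):

```python
def classification_loss(f, xs, ys):
    """(1/n) * sum of 1(f(x^i) != y^i): the fraction of misclassified points."""
    n = len(xs)
    return sum(1 for x, y in zip(xs, ys) if f(x) != y) / n

f = lambda x: 1 if x >= 0 else -1  # toy classifier on scalar inputs
xs = [-2.0, -0.5, 1.0, 3.0]
ys = [-1, 1, 1, 1]
print(classification_loss(f, xs, ys))  # one of four misclassified -> 0.25
```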

16
Q

what is the generic classification model function?

A

f(x) = sign(w^T x + b)
aka the linear separator model
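A minimal sketch in Python (the sign convention for zero is an assumption):

```python
def sign(z):
    """Returns +1 for non-negative z, -1 otherwise (zero convention assumed)."""
    return 1 if z >= 0 else -1

def linear_separator(w, b, x):
    """f(x) = sign(w^T x + b): predicts a +1 / -1 label."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(linear_separator([1.0, 1.0], -1.0, [2.0, 0.5]))  # sign(1.5) = 1
print(linear_separator([1.0, 1.0], -1.0, [0.0, 0.0]))  # sign(-1.0) = -1
```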

17
Q

Given that we have narrowed down the learned model, how should we evaluate it?

A

The learning algorithm uses training data to get a model f, but evaluating the learned model must not be done on the training data itself. Use test data that is not in the training data for model evaluation.

18
Q

What is Model selection?

A

The learning algorithm just finds the "best model" in the collection of models given by the human. So how do we find the right collection of models?
This is called model selection, and it is done using another subset of the data, called validation data, that is distinct from the train and test data.
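The three-way split can be sketched as follows (the fractions and seed are illustrative, not from the course):

```python
import random

def split_data(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and split it into train / validation / test subsets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_data(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

Validation data guides the choice of model collection, training data fits the parameters, and test data is held out for the final evaluation.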

19
Q

What is the process flow of supervised learning algorithms?

A

Humans use validation data to choose a collection of candidate models and feed it to the learning algorithm. Using training data, the algorithm finds the optimal parameter values at which a model performs best. The resulting model is then evaluated on test data, and the results are used to arrive at one final model.

20
Q

what is unsupervised learning?

A

Informally, unsupervised learning is "understanding data": building models that compress, explain, and group data. Unlike supervised learning, the training data does not include output values to learn from.

21
Q

What are some of the most common unsupervised learning tasks?

A
  1. Dimensionality Reduction: Compression and simplification.
  2. Density estimation
22
Q

What are the goal and loss functions of dimensionality reduction tasks?

A

Encoder f: R^d -> R^d'
Decoder g: R^d' -> R^d
Goal: g(f(x^i)) ~ x^i
Loss: (1/n) Summation (i=1 to n) norm( g(f(x^i)) - x^i )^2

In reality, out of all possible encoder-decoder pairs, dimensionality reduction algorithms find the best encoder and decoder.
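A toy illustration in Python, using an encoder that simply keeps the first d' coordinates (a deliberately crude choice; real algorithms search over encoder-decoder pairs for the best one):

```python
def encoder(x, d_prime):
    """f: R^d -> R^d', here simply keeping the first d' coordinates."""
    return x[:d_prime]

def decoder(z, d):
    """g: R^d' -> R^d, padding with zeros so g(f(x)) is back in R^d."""
    return z + [0.0] * (d - len(z))

def reconstruction_loss(xs, d_prime):
    """(1/n) * sum of ||g(f(x^i)) - x^i||^2."""
    total = 0.0
    for x in xs:
        x_hat = decoder(encoder(x, d_prime), len(x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(xs)

xs = [[1.0, 2.0, 0.1], [0.5, -1.0, 0.2]]
print(reconstruction_loss(xs, 2))  # (0.1^2 + 0.2^2) / 2 ≈ 0.025
```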

23
Q

What is a density estimation problem?

A

Gives a probabilistic model which scores all data and then filters on a threshold.
E.g. calculate a probability score of a given tweet and predict whether the tweet was made by Mr. Deepak Chopra. For this exercise, the total number of legal tweets is 26^128 (an alphabet of 26 characters, with 128 characters allowed per tweet).

24
Q

What are the probability mapping, goal and loss functions of density estimation problem?

A

Probability mapping P: R^d -> R+ that 'sums' to one
Goal: P(x) is large if x belongs to the data sample provided, and low otherwise.
Loss = (1/n) Summation (i=1 to n) -log(P(x^i)), aka the negative log-likelihood
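The negative log-likelihood can be sketched in Python (the toy two-symbol distribution is illustrative):

```python
import math

def negative_log_likelihood(p, xs):
    """(1/n) * sum of -log P(x^i) for a probability function p."""
    return sum(-math.log(p(x)) for x in xs) / len(xs)

# toy discrete distribution over two symbols; probabilities sum to one
p = {"a": 0.75, "b": 0.25}.get
xs = ["a", "a", "b"]
print(negative_log_likelihood(p, xs))  # ≈ 0.654
```

A low loss means the model assigns high probability to the observed samples.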

25
Q

What are Gaussian mixture models?

A