Week 1 Flashcards

1
Q

What is the definition of machine learning?

A

ML is the study of computer algorithms that improve automatically through experience and use of data.

2
Q

What is the machine learning task hierarchy?

A

In traditional programming, humans design a tool; the tool takes input and produces output.

In ML, humans design a learning algorithm, which uses data to produce a model.

3
Q

Why ML?

A
  1. Programming / human labour fails
    a. Scale / speed / cost of human labour
    b. Inability to express rules using language
    c. Don't know the exact rules transforming input to output
  2. ML can succeed
    a. Have lots of example data
    b. Have some structural ideas about the data
4
Q

Where is ML used extensively now?

A
  1. Classifiers - mail spam detection
  2. Shopping cart recommendations
  3. ML in smart assistants
  4. ML in robot AIs
  5. ML in games
5
Q

What is Data?

A

Data is a collection of vectors.
Metadata is information about the data.

6
Q

What is a model?

A

A model is a mathematical simplification of reality.

7
Q

What are the types of models in ML

A
  1. Predictive models
    a. Regression model: predicts a real-valued, continuous output
    b. Classification model: predicts a discrete (non-real-valued) output
  2. Probabilistic model - scores different likelihoods of reality
8
Q

What are learning algorithms?

A

Learning algorithms convert data into models.
They choose from a collection of models with the same structure but different parameters, and use the one with the optimal parameter values for inference.

9
Q

What is supervised learning?

A

In simplified terms, supervised learning is curve fitting. The goal: given data {(x^1, y^1), (x^2, y^2), …, (x^n, y^n)}, where each x^i is a vector, find a model f such that f(x^i) is close to y^i.

There will be training data, which is used to create the model, and test data, which is used to evaluate its performance.
Supervised learning tasks
1. Regression
2. Classification

10
Q

Notation: How is the third coordinate of 8th vector denoted?

A

x with subscript 3 and superscript 8, i.e. x^8_3 (the superscript indexes the data point, the subscript the coordinate).

11
Q

Notation: what are indicator functions?

A

Indicator functions take a predicate as input and return 1 if the predicate is true and 0 if it is false.

e.g. 1(2 is even) = 1, 1(2 is odd) = 0, where 1(predicate) is the indicator function.
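A minimal sketch of an indicator function in Python (the function name is illustrative):

```python
def indicator(predicate: bool) -> int:
    """1(predicate): returns 1 if the predicate holds, 0 otherwise."""
    return 1 if predicate else 0

print(indicator(2 % 2 == 0))  # 1(2 is even) = 1
print(indicator(2 % 2 == 1))  # 1(2 is odd)  = 0
```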

12
Q

What is the loss function of a regression problem?

A

loss = (1/n) Summation (i=1 to n) (f(x^i) - y^i)^2

This is the sum of squared errors (SSE), averaged over the n points.

Loss expresses the difference between predicted and actual values as a single-valued metric.
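This loss can be sketched in Python as follows (the model f and the data are illustrative):

```python
def regression_loss(f, xs, ys):
    """(1/n) * sum over i of (f(x^i) - y^i)^2: the averaged squared error."""
    n = len(xs)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n

# illustrative model and data
f = lambda x: 2 * x + 1
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.5, 5.0]
print(regression_loss(f, xs, ys))  # 0.25 / 3 ≈ 0.0833
```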

13
Q

What is the generic regression model function?

A

f(x) = w^T x + b, where w is the weight vector and x is the input vector.
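A minimal sketch of this model in Python (weights, bias, and input are illustrative):

```python
def linear_model(w, b, x):
    """f(x) = w^T x + b: dot product of weight vector w and input x, plus bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

print(linear_model([2.0, -1.0], 0.5, [1.0, 3.0]))  # 2*1 + (-1)*3 + 0.5 = -0.5
```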

14
Q

What is a classification problem?

A

The output is not real-valued as in regression; it takes discrete labels such as 'Yes' or 'No', 'True' or 'False', +1 or -1.

15
Q

What is the loss function of a classification problem?

A

Loss here is the fraction of misclassified instances.
loss = (1/n) Summation (i=1 to n) 1(f(x^i) not equal to y^i)
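A minimal sketch of this loss in Python (the toy classifier and labels are illustrative):

```python
def classification_loss(f, xs, ys):
    """(1/n) * sum of 1(f(x^i) != y^i): the fraction of misclassified points."""
    n = len(xs)
    return sum(1 for x, y in zip(xs, ys) if f(x) != y) / n

f = lambda x: 1 if x >= 0 else -1  # toy classifier on scalar inputs
xs = [-2.0, -0.5, 1.0, 3.0]
ys = [-1, 1, 1, 1]
print(classification_loss(f, xs, ys))  # one of four misclassified -> 0.25
```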

16
Q

what is the generic classification model function?

A

f(x) = sign(w^T x + b)
aka the linear separator model
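A minimal sketch in Python (the sign convention for zero is an assumption):

```python
def sign(z):
    """Returns +1 for non-negative z, -1 otherwise (zero convention assumed)."""
    return 1 if z >= 0 else -1

def linear_separator(w, b, x):
    """f(x) = sign(w^T x + b): predicts a +1 / -1 label."""
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(linear_separator([1.0, 1.0], -1.0, [2.0, 0.5]))  # sign(1.5) = 1
print(linear_separator([1.0, 1.0], -1.0, [0.0, 0.0]))  # sign(-1.0) = -1
```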

17
Q

Given that we have narrowed down the learned model, how should we evaluate it?

A

The learning algorithm uses training data to get a model f, but evaluating the learned model must not be done on the training data itself. Use test data that is not in the training data for model evaluation.

18
Q

What is Model selection?

A

The learning algorithm just finds the "best model" in the collection of models given by the human. So how do we find the right collection of models?
This is called model selection, and it is done using another subset of the data, called validation data, that is distinct from the train and test data.
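The three-way split can be sketched as follows (the fractions and seed are illustrative, not from the course):

```python
import random

def split_data(data, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the data and split it into train / validation / test subsets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

train, val, test = split_data(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

Validation data guides the choice of model collection, training data fits the parameters, and test data is held out for the final evaluation.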

19
Q

What is the process flow of supervised learning algorithms?

A

Humans use validation data to choose a collection of candidate models and feed it to the learning algorithm. Using training data, the algorithm finds the optimal parameter values at which a model performs best. The resulting model is then evaluated on test data, and the results are used to arrive at one final model.

20
Q

what is unsupervised learning?

A

Informally, unsupervised learning is "understanding data": building models that compress, explain, and group data. Unlike supervised learning, the training data does not include output values to learn from.

21
Q

What are some of the most common unsupervised learning tasks?

A
  1. Dimensionality Reduction: Compression and simplification.
  2. Density estimation
22
Q

What are the goal and loss functions of dimensionality reduction tasks?

A

Encoder f: R^d -> R^d'
Decoder g: R^d' -> R^d
Goal: g(f(x^i)) ~ x^i
Loss: (1/n) Summation (i=1 to n) norm( g(f(x^i)) - x^i )^2

In reality, out of all possible encoder-decoder pairs, dimensionality reduction algorithms find the best encoder and decoder.
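A toy illustration in Python, using an encoder that simply keeps the first d' coordinates (a deliberately crude choice; real algorithms search over encoder-decoder pairs for the best one):

```python
def encoder(x, d_prime):
    """f: R^d -> R^d', here simply keeping the first d' coordinates."""
    return x[:d_prime]

def decoder(z, d):
    """g: R^d' -> R^d, padding with zeros so g(f(x)) is back in R^d."""
    return z + [0.0] * (d - len(z))

def reconstruction_loss(xs, d_prime):
    """(1/n) * sum of ||g(f(x^i)) - x^i||^2."""
    total = 0.0
    for x in xs:
        x_hat = decoder(encoder(x, d_prime), len(x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(xs)

xs = [[1.0, 2.0, 0.1], [0.5, -1.0, 0.2]]
print(reconstruction_loss(xs, 2))  # (0.1^2 + 0.2^2) / 2 ≈ 0.025
```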

23
Q

What is a density estimation problem?

A

Gives a probabilistic model which scores all data and then filters on a threshold.
E.g. calculate a probability score of a given tweet and predict whether the tweet was made by Mr. Deepak Chopra. For this exercise, the total number of legal tweets is 26^128 (an alphabet of 26 characters, with 128 characters allowed per tweet).

24
Q

What are the probability mapping, goal and loss functions of density estimation problem?

A

Probability mapping P: R^d -> R+ that 'sums' to one
Goal: P(x) is large if x belongs to the data sample provided, and low otherwise.
Loss = (1/n) Summation (i=1 to n) -log(P(x^i)), aka the negative log-likelihood
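The negative log-likelihood can be sketched in Python (the toy two-symbol distribution is illustrative):

```python
import math

def negative_log_likelihood(p, xs):
    """(1/n) * sum of -log P(x^i) for a probability function p."""
    return sum(-math.log(p(x)) for x in xs) / len(xs)

# toy discrete distribution over two symbols; probabilities sum to one
p = {"a": 0.75, "b": 0.25}.get
xs = ["a", "a", "b"]
print(negative_log_likelihood(p, xs))  # ≈ 0.654
```

A low loss means the model assigns high probability to the observed samples.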

25
Q

What are Gaussian mixture models?

A