ML-Midterm Flashcards

1
Q

What is Machine Learning?

A

Machine learning is about building a model of data based on a specific hypothesis function, ideally in the form of a density function that describes the data.

2
Q

What is supervised learning and unsupervised learning ?

A

Supervised learning learns from labeled data (each example comes with a target output); unsupervised learning works on unlabeled data and must discover structure on its own.

3
Q

What is a model ?

A

A model is an approximation of a system that can be used to predict the system's behaviour.

4
Q

gradient descent

A

Gradient descent iteratively adjusts the parameters in the direction of the negative gradient of the loss, finding parameters that minimize the loss on the training data.
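
A minimal Python sketch of one way this looks for least-squares linear regression (the function name and learning-rate value are illustrative):

import numpy as np

def gradient_descent(X, y, lr=0.01, n_iters=1000):
    # Minimize the mean squared error of a linear model y ~ X @ w.
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2 / len(y) * X.T @ (X @ w - y)  # gradient of the MSE loss
        w -= lr * grad                         # step against the gradient
    return w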

5
Q

Explain the k-fold cross-validation procedure and what it is used for.

A

Procedure: divide the data into k even sets, choose one set as the validation set and the remaining k-1 as the training set, then rotate which set serves as the validation set and repeat the operation k times.

It is used to validate the accuracy of a trained model (by averaging the k validation scores).
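
A minimal Python sketch of the procedure; train_and_score is a hypothetical callback that trains a model and returns its validation score:

import numpy as np

def k_fold_cv(X, y, k, train_and_score):
    idx = np.random.permutation(len(y))
    folds = np.array_split(idx, k)            # k (roughly) even sets
    scores = []
    for i in range(k):
        val = folds[i]                        # fold i is the validation set
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_score(X[train], y[train], X[val], y[val]))
    return np.mean(scores)                    # average validation score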

6
Q

What are the training set, testing set and validation set ?

A

Training set: this data set is used to adjust the weights of the model (e.g., a neural network).

Validation set: this data set is used to detect and minimize overfitting; you use this information to tune your hyperparameters.

Testing set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the model.

7
Q

What is a hyperparameter ?

A

A parameter of the training algorithm itself rather than of the model, such as the learning rate, momentum, or maximum number of iterations.

8
Q

What is information leakage or information contamination ?

A

Test information leaks into training: you use the test data to train the model and then use the same data to evaluate it, which gives an overly optimistic estimate of performance.

9
Q

What is a support vector machine ?

A

A maximum-margin classifier: it finds the separating hyperplane with the largest margin to the nearest training points.

10
Q

What is the hyperplane in an SVM ?

A

The hyperplane is the dividing or separating boundary between the two classes.

11
Q

Why is the SVM called a support vector machine ?

A

The decision boundary is determined only by the training points that lie on (or violate) the margin; these points "support" the hyperplane and are called support vectors.

12
Q

What is Soft margin classifier ?

A

A classifier that allows some overlap of the data points with the margin: points may fall inside the margin or even be misclassified, at the cost of a penalty term.

13
Q

Why use a kernel ? What is the kernel trick ?

A

Kernels transform the data into a higher-dimensional space in which it becomes linearly separable.

The kernel trick avoids explicitly computing the dot product in that feature space: the kernel function evaluates it directly on the original inputs.
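
A small numeric check of the trick, assuming the degree-2 polynomial kernel k(x, z) = (x . z)^2, whose explicit feature map for 2-D inputs is phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2):

import numpy as np

def phi(x):
    # explicit map into the 3-D feature space
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(x) @ phi(z))   # dot product computed in the feature space: 16.0
print((x @ z) ** 2)      # kernel trick, computed on the original inputs: 16.0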

14
Q

What is the purpose of regularization? Give an example and explain why it helps against overfitting.

A

To prevent overfitting.
Example: in ridge regression we add L2 regularization
to control the complexity of the model. The penalty shrinks large weights, so features with little influence on the model end up with weights near zero.

15
Q

What is batch gradient descent ?

A

Batch gradient descent uses all of the training data to compute each update.

16
Q

What is stochastic gradient descent ?

A

Stochastic gradient descent uses only a single (randomly chosen) data point at a time for each update.
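
A side-by-side sketch of the two update rules for least-squares regression, matching the gradient-descent sketch on card 4 (illustrative names):

import numpy as np

def batch_step(w, X, y, lr):
    # one update computed from ALL training points
    return w - lr * 2 / len(y) * X.T @ (X @ w - y)

def sgd_step(w, X, y, lr):
    # one update computed from a SINGLE random point
    i = np.random.randint(len(y))
    return w - lr * 2 * X[i] * (X[i] @ w - y[i])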

17
Q

How does momentum help in linear regression ?

A

Momentum accumulates past gradients into a velocity term that smooths and accelerates the updates, helping to overcome local minima and flat regions.

18
Q

What is Ridge regression ?

A

Linear regression in which we add L2 regularization as a penalty term on the weights; in the gradient update this acts as weight decay.
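
A minimal sketch of the closed-form ridge solution, assuming squared-error loss with penalty lam * ||w||^2 (names are illustrative):

import numpy as np

def ridge_fit(X, y, lam=1.0):
    # w = (X^T X + lam * I)^(-1) X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

In the gradient-descent view, the same penalty contributes -lr * 2 * lam * w to each update, shrinking the weights on every step, hence "weight decay".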

19
Q

What is a random variable ?

A

A variable whose value is different every time it is observed, but which follows a specific probability density function.

20
Q

What are the different distributions: uniform, bimodal and multimodal ?

A

Uniform: constant (flat) density; bimodal: two peaks; multimodal: multiple peaks.

21
Q

What is Bayes theorem ?

A

p(x|y) = p(y|x) p(x) / p(y)

prior p(x): prior knowledge about x

likelihood p(y|x): probability of the observed evidence y given x

posterior p(x|y): the updated distribution of x after observing y
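
A short worked example with illustrative numbers (a rare condition x and a positive test result y):

p_x = 0.01                  # prior p(x)
p_y_given_x = 0.9           # likelihood p(y|x)
p_y_given_not_x = 0.05      # false-positive rate p(y|not x)

p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)   # evidence p(y)
p_x_given_y = p_y_given_x * p_x / p_y                   # posterior p(x|y)
print(p_x_given_y)          # about 0.154: the evidence lifts the 1% prior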

22
Q

Explain the maximum likelihood principle.

A

Given a parameterized hypothesis function p(y|x; w), we choose as parameters the values that make the observed data y most likely under this assumption.

23
Q

What is the likelihood function ? What is maximum (log) likelihood ?

A

The likelihood function is the probability of the observed data viewed as a function of the parameters; maximum (log) likelihood chooses the parameters that maximize it. For Gaussian data, maximizing the log-likelihood function is equivalent to minimizing a quadratic error term.
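
A one-line sketch of why, assuming y_i = w^T x_i + Gaussian noise with fixed variance sigma^2:

log p(y|x; w) = sum_i [ -(y_i - w^T x_i)^2 / (2 sigma^2) ] + const,

so maximizing the log-likelihood over w is the same as minimizing the quadratic error sum_i (y_i - w^T x_i)^2.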

24
Q

Why is LMS regression equivalent to MLE for Gaussian data ?

A

Because of the linear dependence of the mean on the inputs and the constant variance: under those assumptions the Gaussian log-likelihood reduces to the (negative) squared error.

25
Q

What kind of estimate do MLE and MAP give us ?

A

A point estimate: a single value for the parameters, rather than a full distribution over them.

26
Q

What is the difference between MLE and MAP ? What are they used for ? How are they related ?

A

MAP maximizes the posterior; MLE maximizes the likelihood. The two are connected through Bayes' rule.

Both are used to obtain a point estimate of an unobserved quantity based on training data.

They are related: if the prior is constant (a uniform prior), MAP = MLE.
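
In symbols (notation as on card 21):

MLE: w* = argmax_w p(y|x; w)
MAP: w* = argmax_w p(y|x; w) p(w)

A constant prior p(w) does not change the argmax, which is why MAP = MLE in that case.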

27
Q

What is a generative model ? What is it used for ?

A

A generative model models how the data of each class is generated, so it can "generate" examples of the class objects. It can be used to solve classification tasks.

28
Q

What is a discriminative model ?

A

A model that directly discriminates between classes based on the feature values, without modeling how the data was generated.

29
Q

What is k-means clustering ? What is k-medoids ?

A

k-means represents each cluster by the mean of its points, while k-medoids represents each cluster by an actual central data point (a medoid).
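
A minimal Python sketch of k-means (illustrative; it omits empty-cluster handling and a convergence test):

import numpy as np

def k_means(X, k, n_iters=100):
    means = X[np.random.choice(len(X), k, replace=False)]  # init from data points
    for _ in range(n_iters):
        # assignment step: nearest mean for every point
        labels = np.argmin(((X[:, None] - means[None]) ** 2).sum(-1), axis=1)
        # update step: each mean becomes the average of its assigned points
        means = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return means, labels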

30
Q

What is a Gaussian Mixture Model ?

A

We have k Gaussian components, where each data point's component is chosen randomly from a multinomial distribution and the point is then drawn from that component's Gaussian.

31
Q

What is expectation-maximization (EM) algorithm ?

A

E-step: find (soft) labels for the data based on the current assumption about the distribution, i.e., we estimate the expected training labels from the current model (expectation step).

M-step: update the parameters of the model to maximize the probability of the observations given those labels (maximization step).

The two steps are alternated until convergence.
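
A compact Python sketch of EM for a 1-D Gaussian mixture (illustrative; initialization and convergence checks are simplified):

import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def em_gmm_1d(x, k, n_iters=50):
    mu = np.random.choice(x, k)          # initial means
    sigma = np.full(k, x.std())          # initial standard deviations
    pi = np.full(k, 1 / k)               # initial mixing weights
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point
        r = pi * gauss(x[:, None], mu, sigma)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters to maximize the expected log-likelihood
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return pi, mu, sigma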

32
Q

What is a causal model ?

A

A model that describes cause-and-effect relationships between variables (for example, the class causing the observed features), rather than mere correlations.

33
Q

What is naive Bayes good for, and why ?

A

Text classification. It is efficient because each word's probability is estimated independently and you do not have to calculate the probability of words that are not in the dictionary.
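
A minimal pure-Python sketch of naive Bayes for text, assuming tokenized documents and Laplace smoothing (all names are illustrative):

import numpy as np
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d}
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter(w for d, l in zip(docs, labels) if l == c for w in d)
              for c in classes}
    total = {c: sum(counts[c].values()) for c in classes}
    def log_p(w, c):  # smoothed log p(word | class)
        return np.log((counts[c][w] + alpha) / (total[c] + alpha * len(vocab)))
    def predict(doc):  # argmax over classes of log p(class) + sum log p(word|class)
        return max(classes, key=lambda c: np.log(prior[c]) +
                   sum(log_p(w, c) for w in doc if w in vocab))
    return predict

predict = train_nb([["cheap", "pills"], ["meeting", "agenda"]], ["spam", "ham"])
print(predict(["cheap", "pills", "now"]))   # -> 'spam'; 'now' is not in the vocabulary and is ignored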

34
Q

Graphical representation of Naive Bayes ?

A

   class
  /  |  \
x1   x2  x3

35
Q

Assumption in Naive Bayes ?

A

The features are conditionally independent given the class: p(x1, x2, x3 | class) = p(x1|class) p(x2|class) p(x3|class).