ML-Midterm Flashcards by tianye wang

What is Machine Learning?

Machine learning is about modeling data based on a specific hypothesis function, ideally in the form of a probability density function that describes the data.

What are supervised learning and unsupervised learning?

Supervised learning uses labeled data (each input comes with a target output); unsupervised learning uses unlabeled data.

What is a model ?

A model is an approximation of a system that can be used to predict the behaviour of the system.

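Gradient descent

Gradient descent finds parameters that minimize the loss on the training data by repeatedly moving the parameters a small step (set by the learning rate) in the direction of the negative gradient of the loss.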
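A minimal sketch of the update rule, assuming a hypothetical one-parameter toy loss L(w) = (w - 3)^2, chosen only because its minimum at w = 3 is known in advance, and a fixed learning rate:

```python
def gradient_descent(grad, w0, learning_rate=0.1, n_iters=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(n_iters):
        w = w - learning_rate * grad(w)
    return w

# Hypothetical toy loss L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
w_star = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_star, 4))  # close to 3.0
```

Each step shrinks the distance to the minimum by a factor of 1 - 0.1 * 2 = 0.8 here; a learning rate that is too large would make the steps overshoot and diverge.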

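Explain the k-fold cross-validation procedure and what it is used for.

Procedure: divide the data into k (roughly) equal-sized subsets (folds). Choose one fold as the validation set and the remaining k-1 folds as the training set, train the model and evaluate it. Then rotate the validation fold and repeat the operation k times, so that each fold is used for validation exactly once, and average the k results.

It is used to estimate how well (e.g., how accurately) a trained model generalizes to unseen data, and to compare models or hyperparameter settings.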
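The procedure can be sketched in plain Python. The `fit` and `score` callables and the trivial "predict the training mean" model below are hypothetical placeholders for illustration, not part of any particular library:

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of (nearly) equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, fit, score):
    """Use each fold once as the validation set and the other k-1 folds as the
    training set; return the average validation score over the k runs."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        scores.append(score(model, [xs[j] for j in val_idx], [ys[j] for j in val_idx]))
    return sum(scores) / k

# Hypothetical "model" for illustration: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def mse(model, val_x, val_y):
    return sum((y - model) ** 2 for y in val_y) / len(val_y)

xs = [float(x) for x in range(10)]
ys = [2.0 * x for x in xs]
print(cross_validate(xs, ys, k=5, fit=fit_mean, score=mse))
```

The folds here are contiguous; in practice the data is usually shuffled before splitting so that each fold is representative.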

What are the training set, validation set and test set?

Training set: this data set is used to adjust the parameters of the model (e.g., the weights of a neural network).

Validation set: this data set is used to detect and minimize overfitting. You can then use this information to tune your hyperparameters.

Test set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the network.

What is a hyperparameter?

A parameter of the training algorithm that is set before training rather than learned from the data, such as the learning rate, momentum, or maximum number of iterations.

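What is information leakage (information contamination)?

Information from the test data is used to train or tune the model, and the same data are then used to test the model, so the test result overestimates how well the model generalizes.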

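What is a support vector machine?

A maximum-margin classifier: it chooses the separating hyperplane that maximizes the margin (distance) to the closest training points of either class.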

What is the hyperplane in an SVM?

The hyperplane is the decision boundary that divides (separates) the two classes.
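A small sketch of how a linear SVM uses its hyperplane w·x + b = 0 to classify, and how the margin width follows from w. The values of `w` and `b` below are hypothetical and simply chosen by hand, not learned by an SVM solver:

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; the hyperplane is the set of points x where this is 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Predict class +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """With the usual scaling |w.x + b| = 1 at the closest points, the margin is 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = [1.0, 1.0], -3.0   # hypothetical hyperplane x1 + x2 = 3
print(classify(w, b, [3.0, 2.0]), classify(w, b, [0.5, 1.0]))  # 1 -1
print(margin_width(w))    # 2 / sqrt(2), about 1.414
```

Maximizing the margin 2/||w|| is the same as minimizing ||w||, which is how the SVM optimization problem is usually written.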

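Why is it called a support vector machine?

The training points closest to the hyperplane (on or inside the margin) are called support vectors. The position of the maximum-margin hyperplane depends only on these points, so they ‘support’ it.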

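What is a soft margin classifier?

An SVM that allows some overlap of the data points: some points may lie inside the margin or on the wrong side of the hyperplane (measured by slack variables and penalized in the objective), so the classifier can be used when the classes are not perfectly linearly separable.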

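Why use a kernel? What is the kernel trick?

A kernel implicitly transforms the data to a higher-dimensional feature space, where it may become linearly separable.

The kernel trick computes the dot product in that feature space directly from the original inputs, which avoids computing the mapping into the feature space explicitly.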
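A minimal numerical check of the kernel trick, assuming the degree-2 polynomial kernel k(x, z) = (x·z)^2 on 2-D inputs (one common kernel; the card does not name a specific one). The explicit feature map `phi` is written out only to show that both routes give the same number:

```python
import math

def phi(x):
    """Explicit map of a 2-D point into a 3-D feature space for the kernel (x.z)^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Kernel trick: (x.z)^2 equals phi(x).phi(z) without ever computing phi."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
print(dot(phi(x), phi(z)), poly_kernel(x, z))  # equal up to floating-point rounding
```

The kernel needs one 2-D dot product, while the explicit route builds two 3-D vectors first; for higher degrees and dimensions the gap grows quickly, which is why the trick matters.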

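What is the purpose of regularization? Give an example and explain why it helps against overfitting.

Purpose: prevent overfitting.
Example: in ridge regression we add an L2 penalty on the weights (λ times the sum of squared weights) to the loss.
Why it helps: the penalty controls the complexity of the model by shrinking the weights, so features with little influence on the model get small weights and the model fits less of the noise in the training data.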

What is batch gradient descent?

Batch gradient descent uses all of the training data to compute each parameter update.
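A sketch of batch gradient descent for a one-feature linear regression y ≈ w·x + b with mean squared error, assuming hypothetical noise-free data generated from y = 2x + 1. Note that every update averages the gradient over all examples:

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error.
    Every update uses the gradient averaged over ALL training examples."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # hypothetical data from y = 2x + 1
w_hat, b_hat = batch_gradient_descent(xs, ys)
print(w_hat, b_hat)         # approaches 2.0 and 1.0
```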

What is stochastic gradient descent?

Stochastic gradient descent uses only a single training example at a time to compute each parameter update.
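A sketch of stochastic gradient descent for the same kind of one-feature linear regression, assuming hypothetical noise-free data from y = 2x + 1 and a fixed random seed. Each update uses the gradient of the squared error on one example, visited in a shuffled order every epoch:

```python
import random

def sgd(xs, ys, learning_rate=0.02, n_epochs=500, seed=0):
    """Fit y ~ w*x + b; each update uses the gradient of ONE training example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(n_epochs):
        rng.shuffle(order)                  # new random order of examples every epoch
        for i in order:
            e = (w * xs[i] + b) - ys[i]     # error on a single example
            w -= learning_rate * 2.0 * e * xs[i]
            b -= learning_rate * 2.0 * e
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # hypothetical data from y = 2x + 1
w_sgd, b_sgd = sgd(xs, ys)
print(w_sgd, b_sgd)         # approaches 2.0 and 1.0
```

Because the data here fit a line exactly, single-example gradients all vanish at the solution; with noisy data SGD keeps fluctuating around the minimum unless the learning rate is decreased over time.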

How does momentum help in gradient descent (e.g., for linear regression)?

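Momentum adds a fraction of the previous update to the current one, so the parameters keep moving in consistent directions. This damps oscillations, speeds up convergence and can help overcome shallow local minima. For linear regression with squared error the loss is convex (no local minima other than the global one), so there the main benefit is faster convergence.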
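A minimal sketch of gradient descent with (heavy-ball) momentum, assuming a hypothetical toy quadratic loss L(w) = (w - 3)^2 and a hypothetical momentum coefficient `beta` = 0.9:

```python
def gd_momentum(grad, w0, learning_rate=0.1, beta=0.9, n_iters=500):
    """Gradient descent with momentum: the velocity v is a decaying sum of past steps."""
    w, v = w0, 0.0
    for _ in range(n_iters):
        v = beta * v - learning_rate * grad(w)   # keep a fraction beta of the previous update
        w = w + v                                # move by the velocity, not just the current gradient
    return w

# Hypothetical toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3).
w_mom = gd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_mom)  # close to 3.0
```

With beta = 0 this reduces to plain gradient descent; larger beta lets the accumulated velocity carry the parameters through flat regions and small bumps in the loss.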

What is ridge regression?

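We add an L2 penalty on the weights (λ times the sum of squared weights) to the loss. In the gradient update this term shrinks the weights a little at every step, which is why it is also called weight decay.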
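A sketch for a single weight and no bias, assuming the loss sum((y - w*x)^2) + lam * w^2 and hypothetical data with true slope 2. It compares the closed-form minimizer with gradient descent, where the penalty appears as a weight-decay factor:

```python
def ridge_closed_form(xs, ys, lam):
    """Minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w (no bias)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def ridge_gd(xs, ys, lam, learning_rate=0.01, n_iters=2000):
    """Same loss minimized by gradient descent; the L2 term acts as weight decay."""
    w = 0.0
    for _ in range(n_iters):
        data_grad = -2.0 * sum((y - w * x) * x for x, y in zip(xs, ys))
        # Equivalent to: w = (1 - 2*learning_rate*lam) * w - learning_rate * data_grad
        w = w - learning_rate * (data_grad + 2.0 * lam * w)
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # hypothetical data with true slope 2
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_closed_form(xs, ys, lam), ridge_gd(xs, ys, lam))
```

As λ grows, the fitted weight is pulled from 2.0 toward 0 (28/15 for λ = 1, 28/24 for λ = 10), which is the shrinkage that limits model complexity.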

What is a random variable?

Its value can be different every time it is observed (sampled).

It follows a specific probability distribution, described by its probability density function.

Different distributions: uniform, bimodal, multimodal?

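Uniform: constant density over its range (no peak). A distribution with one peak is unimodal, bimodal has two peaks, and multimodal has multiple peaks.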

What is Bayes' theorem?

p(x|y) = p(y|x) p(x) / p(y)

prior: p(x), the knowledge about x before observing y

likelihood: p(y|x), how probable the observed data y is given x

evidence: p(y), the normalizing constant

posterior distribution: p(x|y)
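A small worked example of the formula with made-up (hypothetical) numbers: x = "has a condition" with prior p(x) = 0.01, likelihood p(y|x) = 0.9 of a positive test y, and p(y|not x) = 0.05. The evidence p(y) is obtained by summing over both values of x:

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a binary x: p(x|y) = p(y|x) p(x) / p(y),
    with the evidence p(y) = p(y|x) p(x) + p(y|not x) p(not x)."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: x = "has condition", y = "test is positive".
p = posterior(prior=0.01, likelihood=0.90, likelihood_if_not=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the small prior keeps the posterior low: 0.009 / (0.009 + 0.0495) is about 0.154.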

Explain the maximum likelihood principle.

Given a parameterized hypothesis function p(y|x; w), we choose as parameters the values that make the observed data y most likely under this assumption.
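A sketch of the principle in the simplest case, assuming i.i.d. Gaussian data with no input x, so the parameters are just the mean and variance. The closed-form maximum likelihood estimates are the sample mean and the 1/n sample variance; the data values are hypothetical:

```python
import math

def gaussian_log_likelihood(data, mu, sigma2):
    """log p(data | mu, sigma2) for i.i.d. Gaussian samples."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

def gaussian_mle(data):
    """Closed-form MLE: sample mean and the (biased, 1/n) sample variance."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n
    return mu, sigma2

data = [2.1, 1.9, 2.4, 2.0, 1.6]   # hypothetical observations
mu, s2 = gaussian_mle(data)
# Moving away from the MLE lowers the log-likelihood:
print(gaussian_log_likelihood(data, mu, s2) > gaussian_log_likelihood(data, mu + 0.1, s2))  # True
```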

What is the likelihood function? Maximum (log-)likelihood?

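The likelihood function is p(y|x; w) seen as a function of the parameters w for the fixed training data. Because the logarithm is monotonic, maximizing the log-likelihood gives the same parameters, and for i.i.d. examples it turns the product over training examples into a sum. For Gaussian noise with constant variance, maximizing the log-likelihood function is equivalent to minimizing a quadratic (sum-of-squares) error term.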

Why is LMS regression equivalent to MLE for Gaussian data?

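Because the model assumes Gaussian noise whose mean depends linearly on x and whose variance is constant. The log-likelihood is then a constant minus the sum of squared errors divided by 2σ², so maximizing it is the same as minimizing the squared error that LMS minimizes.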
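A short derivation of this equivalence, assuming n i.i.d. training examples with y^(i) ~ N(w^T x^(i), σ²) and a fixed variance σ² (the notation is ours, not from the card):

```latex
\log L(w) = \sum_{i=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi\sigma^2}}
            \exp\!\left(-\frac{\bigl(y^{(i)} - w^\top x^{(i)}\bigr)^2}{2\sigma^2}\right)\right]
          = -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
            - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y^{(i)} - w^\top x^{(i)}\bigr)^2

\arg\max_{w} \log L(w) = \arg\min_{w} \sum_{i=1}^{n}\bigl(y^{(i)} - w^\top x^{(i)}\bigr)^2
```

The first term and the factor 1/(2σ²) do not depend on w, which is exactly where the constant-variance assumption is used.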

What kind of estimate do MLE and MAP give us?

A point estimate: a single value for the parameters rather than a full distribution over them.

What is the difference between MLE and MAP? What are they used for? How are they related?

MAP maximizes the posterior p(w|data) ∝ p(data|w) p(w), obtained with Bayes' rule; MLE maximizes the likelihood p(data|w). Both are used to obtain a point estimate of an unobserved quantity (e.g., the model parameters) based on training data. They are related: if the prior is constant (uniform prior), MAP = MLE.
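A sketch comparing the two estimates for a coin-flip (Bernoulli) parameter, assuming a Beta(a, b) prior, a standard conjugate choice that the card itself does not specify. The counts are hypothetical:

```python
def mle_bernoulli(heads, n):
    """Maximizes the likelihood theta^heads * (1 - theta)^(n - heads)."""
    return heads / n

def map_bernoulli(heads, n, a, b):
    """Maximizes the posterior under a Beta(a, b) prior: the mode of
    Beta(heads + a, n - heads + b). Assumes a, b >= 1."""
    return (heads + a - 1) / (n + a + b - 2)

heads, n = 9, 10                       # hypothetical: 9 heads in 10 flips
print(mle_bernoulli(heads, n))         # 0.9
print(map_bernoulli(heads, n, 5, 5))   # 13/18, pulled toward the prior mean 0.5
print(map_bernoulli(heads, n, 1, 1))   # 0.9: uniform prior, so MAP = MLE
```

Beta(1, 1) is the uniform distribution on [0, 1], so the last line illustrates the "constant prior" case of the card.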

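What is a generative model? What is it used for?

A generative model describes how the data of each class are produced, i.e. it models p(x|y) and p(y) (equivalently the joint p(x, y)), so it can ‘generate’ examples of the class objects. It is used to solve classification tasks via Bayes' rule.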

What is a discriminative model?

A model that discriminates between classes based on the feature values, e.g. by modeling p(y|x) directly, without modeling how the features are generated.

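What is k-means clustering? What is k-medoids?

K-means partitions the data into k clusters, each represented by the mean of its points. K-medoids is similar, but each cluster is represented by its medoid: the actual data point that is most central in the cluster (smallest total distance to the other points in it).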
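A minimal sketch of k-means only (Lloyd's algorithm: alternate nearest-centre assignment and mean update), assuming squared Euclidean distance, initialization from k random data points, and a hypothetical toy data set. K-medoids would replace the mean update with choosing the most central data point of each cluster:

```python
import random

def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centre, then move
    each centre to the mean of its assigned points, until nothing changes."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)      # initialize with k distinct data points
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - c) ** 2 for a, c in zip(p, centres[j])))
            clusters[nearest].append(p)
        new_centres = []
        for j, cluster in enumerate(clusters):
            if cluster:                  # mean of the points assigned to cluster j
                new_centres.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:                        # keep an empty cluster's centre where it is
                new_centres.append(centres[j])
        if new_centres == centres:       # converged: assignments no longer change
            break
        centres = new_centres
    return centres

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]   # hypothetical data
print(sorted(kmeans(points, 2)))
```

The result can depend on the initial centres, so k-means is often run several times with different seeds.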

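What is a Gaussian mixture model?

We have k Gaussian classes (components). For each data point, a class is chosen randomly from a multinomial (categorical) distribution given by the mixing weights, and the point is then drawn from that class's Gaussian.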
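The generative process described above can be sketched as a sampler for a 1-D mixture. The mixing weights, means and standard deviations are hypothetical values chosen for illustration:

```python
import random

def sample_gmm(n, weights, mus, sigmas, seed=0):
    """Draw n points: first a component j ~ Categorical(weights),
    then x ~ N(mus[j], sigmas[j] ** 2). Returns (component, value) pairs."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]   # which Gaussian generates this point
        samples.append((j, rng.gauss(mus[j], sigmas[j])))
    return samples

points = sample_gmm(5, weights=[0.3, 0.7], mus=[-2.0, 2.0], sigmas=[0.5, 1.0])   # hypothetical parameters
print(points)
```

In real data the component index j is not observed, which is what makes fitting the mixture a job for EM.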

What is the expectation-maximization (EM) algorithm?

E-step (expectation): using the current model, estimate the labels of the training data, i.e. compute for each data point the probability that it belongs to each class (component). M-step (maximization): update the parameters of the model to maximize the probability (likelihood) of the observations given these estimated labels. The two steps are repeated until the parameters stop changing.
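A sketch of EM for a 1-D Gaussian mixture, assuming soft labels (responsibilities) in the E-step, a fixed number of iterations instead of a convergence test, and hypothetical data and starting values. It does not guard against a component collapsing onto a single point, which a practical implementation would need:

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, mus, variances, weights, n_iters=100):
    """EM for a 1-D Gaussian mixture; len(mus) is the number of components."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    k = len(mus)
    for _ in range(n_iters):
        # E-step: resp[i][j] = p(component j | x_i) under the current parameters (soft labels).
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from all points, weighted by their soft labels.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            weights[j] = nj / len(data)
    return mus, variances, weights

data = [-2.2, -1.9, -2.0, -2.1, -1.8, 2.0, 2.3, 1.9, 2.1, 1.7]   # hypothetical 1-D data
mus, variances, weights = em_gmm_1d(data, mus=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(mus, weights)
```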

What is a causal model?

???

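What is naive Bayes good for, and why?

Text classification (e.g., spam filtering). It is efficient: because of the conditional independence assumption, you only need to estimate the probability of each dictionary word per class, not joint probabilities of word combinations, and words not in the dictionary are simply ignored.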
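A minimal sketch of a multinomial naive Bayes text classifier, assuming whitespace tokenization, add-one (Laplace) smoothing and a hypothetical four-document toy corpus. Scores are sums of log-probabilities, one term per word, and words outside the training vocabulary are skipped:

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = {w for d in docs for w in d.split()}
    log_prior, log_lik = {}, {}
    for c in set(labels):
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        total = sum(counts.values())
        log_lik[c] = {w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return log_prior, log_lik

def predict_nb(model, doc):
    """argmax_c log p(c) + sum_i log p(word_i | c); unknown words are skipped."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc.split() if w in log_lik[c])
              for c in log_prior}
    return max(scores, key=scores.get)

docs = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]   # hypothetical toy corpus
model = train_nb(docs, labels)
print(predict_nb(model, "money offer today"))   # spam
print(predict_nb(model, "noon meeting"))        # ham
```

Training only counts words per class, which is why the method scales to large vocabularies.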

Graphical representation of naive Bayes?

      class
    /   |   \
  x1   x2   x3

The class node is the parent of every feature node; there are no edges between the features.

Assumption in naive Bayes?

The features are conditionally independent given the class: p(x1, x2, x3 | class) = p(x1|class) p(x2|class) p(x3|class).

(37 cards)