ML-Midterm Flashcards
What is Machine Learning?
Machine learning is about building models of data from a parameterized hypothesis function, ideally in the form of a density function that describes the data.
What are supervised learning and unsupervised learning?
Supervised learning trains on data that comes with labels; unsupervised learning trains on data without labels.
What is a model?
A model is an approximation of a system that can be used to predict the system's behaviour.
What is gradient descent?
Gradient descent finds parameters that minimize the loss on the training data by repeatedly taking a small step in the direction of the negative gradient.
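A minimal sketch of gradient descent on a mean squared error loss for a one-dimensional linear model. The function name, the toy data, the learning rate and the number of steps are assumptions chosen for illustration only.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by minimizing the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (hypothetical example values).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
```

On this noise-free data the returned w and b should end up very close to 2 and 1.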
Explain the k-fold cross-validation procedure and explain what it is used for.
Procedure: divide the data into k equally sized sets, choose 1 set as the validation set and the other k-1 as the training set, then switch to a different validation set and repeat until each of the k sets has been used for validation once.
It is used to validate the accuracy of a trained model; the k validation scores are usually averaged.
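The splitting step of k-fold cross-validation can be sketched as follows. The helper name `k_fold_splits` is hypothetical, and the sketch works on indices only; in practice the data is usually shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(10, 5))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

Each index appears in exactly one validation set, so every data point is used for validation once and for training k-1 times.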
What are the training set, the testing set and the validation set?
Training set: this data set is used to adjust the weights of the neural network.
Validation set: this data set is used to detect and minimize overfitting. You can use this information to tune your hyperparameters.
Testing set: this data set is used only for testing the final solution, in order to confirm the actual predictive power of the network.
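A small sketch of how one data set might be divided into the three sets. The 60/20/20 proportions, the seed and the function name are assumptions, not values from the course.

```python
import random


def split_dataset(samples, train_fraction=0.6, validation_fraction=0.2, seed=0):
    """Shuffle the samples and cut them into train, validation and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]   # whatever is left over
    return train, validation, test


train, validation, test = split_dataset(range(100))
```

The three parts do not overlap, so the test set stays unseen until the final evaluation.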
What is a hyperparameter?
A parameter of the training algorithm that is set before training rather than learned from the data, such as the learning rate, momentum, or maximum number of iterations.
What is information leakage or information contamination?
Information from the test data is used to train the model, and the same data is then used to test it, so the test result overestimates how well the model generalizes.
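What is a support vector machine?
A maximum margin classifier: it picks the separating hyperplane with the largest distance (margin) to the nearest training points of each class.
What is the hyperplane in an SVM?
The hyperplane is the dividing or separating plane between the two classes.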
Why are they called support vectors in SVM?
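The support vectors are the training points closest to the separating hyperplane. They alone determine (support) the position of the hyperplane and the margin.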
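What is a soft margin classifier?
A classifier that allows some overlap of the data points: a few points may lie inside the margin or on the wrong side of the hyperplane, at a cost.
Why use a kernel? What is the kernel trick?
A kernel implicitly transforms the data to a higher-dimensional feature space in which it can become linearly separable.
The kernel trick computes the dot product in that feature space directly from the original inputs, which avoids calculating the transformation explicitly.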
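A small numeric check of the kernel idea, assuming a degree-2 polynomial kernel k(x, z) = (x . z)^2 on 2-D inputs. The feature map `phi` and the example points are chosen for illustration only.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit feature map into 3-D space for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Same value as dot(phi(x), phi(z)), computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))   # transform first, then take the dot product
implicit = poly_kernel(x, z)     # kernel trick: no transformation needed
```

Both values agree up to floating-point rounding, which is why an SVM can work with the kernel alone and never build the high-dimensional vectors.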
What is the purpose of regularization? Give an example and explain why it helps against overfitting.
Purpose: to prevent overfitting.
Example: in ridge regression we add L2 regularization.
It controls the complexity of the model by penalizing large weights, so features with less influence on the model are shrunk toward zero.
What is batch gradient descent?
Batch gradient descent uses all the training data to compute each update.
What is stochastic gradient descent?
Stochastic gradient descent uses only a single data point at a time to compute each update.
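The difference between the two update rules can be sketched on a one-parameter model y ~ w * x with squared error. The data, learning rate and step count are hypothetical.

```python
import random


def batch_step(w, data, learning_rate):
    # One update from the gradient averaged over ALL examples.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - learning_rate * grad


def stochastic_step(w, example, learning_rate):
    # One update from the gradient of a SINGLE example.
    x, y = example
    return w - learning_rate * 2 * (w * x - y) * x


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # noise-free samples of y = 3x
rng = random.Random(0)

w_batch = 0.0
for _ in range(200):
    w_batch = batch_step(w_batch, data, learning_rate=0.05)

w_sgd = 0.0
for _ in range(200):
    w_sgd = stochastic_step(w_sgd, rng.choice(data), learning_rate=0.05)
```

Here both variants approach w = 3. With noisy data the stochastic updates would keep fluctuating around the minimum, while each batch update costs a full pass over the data.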
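How does momentum help in linear regression?
Momentum adds a fraction of the previous update to the current one. This helps the search overcome shallow local minima and plateaus and speeds up convergence. (For linear regression with a squared-error loss the loss surface is convex, so the main benefit there is faster convergence.)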
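A minimal sketch of a momentum update on a simple quadratic loss. The loss, the learning rate and the momentum coefficient of 0.9 are assumptions for illustration; the example shows the update rule and the speed-up, not an escape from a local minimum.

```python
def momentum_descent(grad, w0, learning_rate, momentum, steps):
    """Gradient descent where each update keeps a fraction of the previous one."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity blends the previous update with the new gradient step.
        velocity = momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


def grad(w):
    # Gradient of the loss (w - 3)**2, which has its minimum at w = 3.
    return 2 * (w - 3.0)


w_plain = momentum_descent(grad, 0.0, learning_rate=0.01, momentum=0.0, steps=100)
w_momentum = momentum_descent(grad, 0.0, learning_rate=0.01, momentum=0.9, steps=100)
```

With the same small learning rate and the same number of steps, the run with momentum ends closer to the minimum than the plain run.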
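What is ridge regression?
Linear regression with L2 regularization: a penalty on the squared weights is added to the loss, which shows up as a shrinkage term when updating the weights. This is also known as weight decay.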
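A sketch of the L2 penalty inside a gradient descent update, assuming a one-parameter model y ~ w * x and the loss "mean squared error + lam * w**2". The name `lam` for the regularization strength and the toy data are hypothetical.

```python
def ridge_gradient_descent(xs, ys, lam, learning_rate=0.05, steps=5000):
    """Fit y ~ w * x (no bias) with loss = mean squared error + lam * w**2."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_error = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        grad_penalty = 2 * lam * w   # derivative of lam * w**2: pulls w toward 0
        w -= learning_rate * (grad_error + grad_penalty)
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # noise-free samples of y = 2x
w_plain = ridge_gradient_descent(xs, ys, lam=0.0)
w_ridge = ridge_gradient_descent(xs, ys, lam=1.0)
```

Without the penalty the weight converges to 2; with lam = 1 it is shrunk to 28/17, which is the "decay" in weight decay.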
What is a random variable?
A variable whose value can be different every time it is observed.
Its values follow a specific probability density function.
What are unimodal, bimodal and multimodal distributions?
Distributions with one peak, two peaks and multiple peaks, respectively.
What is Bayes' theorem?
p(x|y) = p(y|x)p(x)/p(y)
prior p(x): prior knowledge about x
likelihood p(y|x): how probable the observed evidence y is given x
posterior p(x|y): the distribution of x after observing y
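A worked numeric example of the formula for a binary x. All probabilities below are made-up values for illustration, and p(y) is expanded as p(y|x)p(x) + p(y|not x)p(not x).

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return p(x|y) from p(x), p(y|x) and p(y|not x) for a binary x."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)   # p(y)
    return likelihood * prior / evidence


# Hypothetical numbers: p(x) = 0.01, p(y|x) = 0.9, p(y|not x) = 0.05.
p = posterior(prior=0.01, likelihood=0.9, likelihood_given_not=0.05)
print(round(p, 4))
```

Even with a high likelihood, the small prior keeps the posterior low (about 0.15 here).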
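Explain the maximum likelihood principle.
Given a parameterized hypothesis function p(y|x; w), we choose as parameters w the values that make the observed data y most likely under this assumption.
What is the likelihood function? What is maximum (log-)likelihood?
The likelihood function is the probability of the observed data viewed as a function of the parameters w. Under Gaussian noise with constant variance, maximizing the log-likelihood function is equivalent to minimizing a quadratic error term.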
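Why is LMS regression equivalent to MLE for Gaussian data?
Because the mean depends linearly on the inputs and the variance is constant, so the Gaussian log-likelihood is, up to constants, the negative sum of squared errors.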
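The equivalence can be written out in a few lines. This is a sketch under the assumption of n independent data points with Gaussian noise of constant variance; the symbols \sigma^2, \epsilon_i and n are introduced here and are not part of the cards.

```latex
\begin{align}
y_i &= w^\top x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \\
\log L(w) &= \sum_{i=1}^{n} \log p(y_i \mid x_i; w)
           = -\frac{n}{2}\log(2\pi\sigma^2)
             - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2 \\
\hat{w}_{\mathrm{MLE}} &= \arg\max_w \log L(w)
           = \arg\min_w \sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2
\end{align}
```

The first term of the log-likelihood does not depend on w, so only the squared error term matters for the maximization.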
What is the difference between MLE and MAP? What are they used for? How are they related?
MAP maximizes the posterior; MLE maximizes the likelihood. The two are connected by Bayes' rule.
They are used to obtain a point estimate of an unobserved quantity based on training data.
They are related: if the prior is constant (uniform prior), MAP = MLE.
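A coin-toss sketch of the relation between the two estimates. It assumes a Beta(a, b) prior on the probability of heads, for which the MAP estimate is the mode of the Beta posterior; the counts and prior values are hypothetical.

```python
def mle_heads_probability(heads, tosses):
    """Maximum likelihood estimate of p(heads)."""
    return heads / tosses


def map_heads_probability(heads, tosses, a, b):
    """MAP estimate under a Beta(a, b) prior: the mode of the Beta posterior.

    Assumes a >= 1 and b >= 1 and that both heads and tails were observed.
    """
    return (heads + a - 1) / (tosses + a + b - 2)


heads, tosses = 7, 10
mle = mle_heads_probability(heads, tosses)
map_uniform = map_heads_probability(heads, tosses, a=1, b=1)   # Beta(1, 1) is uniform
map_informed = map_heads_probability(heads, tosses, a=5, b=5)  # prior belief: fair coin
```

With the uniform prior the MAP estimate equals the MLE; with the Beta(5, 5) prior it is pulled from 0.7 toward 0.5.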
What kind of estimate do MLE and MAP give us?
A point estimate of the parameters of a distribution.
What is a generative model? What is it used for?
A generative model models how the examples of each class are distributed, so it can ‘generate’ examples of the class objects. It can be used to solve classification tasks.
What is a discriminative model?
A model that discriminates between classes based on the feature values.
What is k-means clustering? What is k-medoids?
K-means represents each cluster by the mean of its data points; k-medoids represents each cluster by its most central actual data point (the medoid).
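A 1-D sketch of the k-means loop, plus a helper that picks a medoid for comparison. The starting centers, the data and the function names are hypothetical; real implementations work on vectors and choose the starting centers more carefully.

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to centers and recomputing the means."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the assigned points (an empty cluster keeps its center).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def medoid(cluster):
    """The actual data point with the smallest total distance to the others."""
    return min(cluster, key=lambda c: sum(abs(c - p) for p in cluster))


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers = k_means_1d(points, centers=[0.0, 5.0])

skewed = [1.0, 1.5, 8.0]
skewed_mean = sum(skewed) / len(skewed)
skewed_medoid = medoid(skewed)
```

For the skewed cluster the mean is 3.5, which is not a data point, while the medoid is 1.5, which is why k-medoids is less sensitive to outliers.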
What is a Gaussian mixture model?
We have k Gaussian classes, and the class of each data point is chosen randomly from a multinomial distribution; the point itself is then drawn from the Gaussian of that class.
What is the expectation-maximization (EM) algorithm?
E-step: estimate the labels of the data based on the currently assumed distribution.
That is, we make assumptions about the training labels from the current model (expectation step).
M-step: update the parameters of the model to maximize the probability of the observations (maximization step).
The two steps are repeated until the parameters stop changing.
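The two steps can be sketched for a 1-D mixture of Gaussians. This is a minimal illustration with hypothetical data and starting values; the small constant added to the variance is an assumption to keep it from collapsing to zero, not part of the algorithm itself.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: soft labels, the probability of each class for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate the parameters from the soft labels.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj + 1e-6
    return means, variances, weights


data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
means, variances, weights = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

On this well-separated data the estimated means settle near 1 and 5 with roughly equal weights.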
What is a causal model?
???
What is naive Bayes good for, and why?
Text classification. It is efficient because you do not have to calculate the probability of the words that are not in the dictionary.
Graphical representation of Naive Bayes?
      class
     /  |  \
   x1   x2   x3
What is the assumption in Naive Bayes?
The features are conditionally independent given the class: p(x1, x2, x3 | class) = p(x1 | class) p(x2 | class) p(x3 | class).
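A minimal sketch of a naive Bayes text classifier that relies on this independence assumption. The tiny training set, the labels and the use of add-one smoothing are assumptions for illustration; words outside the dictionary are skipped.

```python
import math
from collections import Counter


def train_naive_bayes(documents):
    """documents: list of (list_of_words, label) pairs."""
    class_counts = Counter(label for _, label in documents)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in documents:
        word_counts[label].update(words)
    vocabulary = {w for words, _ in documents for w in words}
    return class_counts, word_counts, vocabulary


def predict(words, class_counts, word_counts, vocabulary):
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # log p(class) + sum of log p(word | class), using the independence assumption.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocabulary:
                continue   # words not in the dictionary are ignored
            # Add-one smoothing so unseen (word, class) pairs never get probability 0.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


documents = [
    (["win", "money", "now"], "spam"),
    (["free", "money"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "meeting", "tomorrow"], "ham"),
]
model = train_naive_bayes(documents)
label_a = predict(["free", "money", "today"], *model)
label_b = predict(["lunch", "meeting"], *model)
```

Because each word contributes an independent factor, training only needs word counts per class, which keeps the method cheap for large vocabularies.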