Machine Learning Flashcards
What are the different types of machine learning?
Supervised learning (Classification and regression)
Unsupervised learning (Clustering and dimensionality reduction)
Reinforcement learning
(And semi-supervised learning)
What is machine learning?
The science of programming computers to learn from data
What are some of the benefits of machine learning compared to “old school” programming?
Take e.g. a spam filter: with old-school programming you have to hand-write rules listing all the “spam” words for the program to detect. With ML the model finds these patterns by itself, and when presented with new data it can automatically find new patterns (spam mails keep changing to evade spam filters and get more clicks/views).
ML is also very good at finding patterns in large data sets with many different features (high complexity)
What is the difference between online and batch learning?
Online learning is when the model can learn “on the go” by being continuously fed new data (individually or in small mini-batches).
Batch learning is the opposite: the model is trained offline on the full dataset, and to learn from new data it must be retrained on a new batch that includes it.
What is the difference between a classifier and a regressor?
A classifier predicts which class a sample belongs to, whereas a regressor predicts a target numeric value.
Is unsupervised learning important to read up on?
hmmm, look at the slides
What is semi supervised learning?
When only a small portion of the data is labeled. An ML model then clusters the data, and if at least one instance in a cluster is labeled, the model can propagate that label to the rest of the cluster (e.g. an online photo album labeling faces).
What is reinforcement learning?
A learning system that is rewarded/penalized for the actions it takes in the environment it is put into. It thereby learns the best possible strategy, called a policy, for all the different situations it can be presented with.
Example:
AlphaGO
Robots learning to walk
What is learning rate?
The degree of adaptation to new data. High learning rate = high degree of adaptation.
A high learning rate also means that the model will quickly forget old data (and be more sensitive to noisy data)
Why is high generalization more important than good performance measures on the training data?
Good performance on the training data gives an indication of how well the model fits, but in the end what matters is that the model performs well on new instances (a high level of generalization)
What is instance based learning when talking about approaches to generalization?
When the model uses a “measure of similarity”. So if a known spam mail and a new email have a lot of words in common, the new mail will be predicted as spam (e.g. kNN).
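A minimal sketch of the idea behind kNN, the instance-based example above (the toy feature vectors and function name are made up for illustration, not from the slides):

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training instances (Euclidean distance as the "measure of
    similarity"). `train` is a list of (features, label) pairs."""
    by_dist = sorted(train, key=lambda fl: math.dist(fl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Toy spam/ham vectors: (count of spam words, count of links)
train = [((5, 3), "spam"), ((4, 4), "spam"), ((0, 1), "ham"), ((1, 0), "ham")]
print(knn_predict(train, (4, 3)))  # near the spam cluster → spam
```

No model is "built" here: prediction just compares the new instance to stored training instances, which is what makes it instance-based.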
What is model based learning when talking about approaches to generalization?
When building a model from a set of training samples (tuning its parameters to fit them) and then using that model to make predictions on new instances.
What is the difference between a fitness/utility function and a cost function?
A utility/fitness function measures how well the model is performing, while a cost function measures how badly it is performing.
A cost function measures the distance between the model’s predictions and the training examples. The objective is then to minimize this distance.
What is sampling bias?
It is when the data is not representative of the whole population.
Do we have sampling bias?
What is overfitting? And when does it occur?
When the model fits the training data too closely and therefore isn’t capable of generalizing to new data (high variance)
It happens when the model is too complex relative to the amount and noisiness of the training data.
What is regularization?
Constraining the model to make it simpler (and thereby reduce the risk of overfitting). This is done by using a simpler model (fewer degrees of freedom) and/or by constraining it through hyperparameters.
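A tiny worked example of L2 regularization on a one-parameter linear model y ≈ w·x (the closed form below is specific to this toy case; the function name and numbers are made up):

```python
def ridge_slope(xs, ys, alpha=0.0):
    """Fit y ≈ w·x while penalizing large weights with alpha·w².
    For this one-parameter model the minimizer has the closed form
    w = Σxy / (Σx² + alpha): alpha=0 is plain least squares, and a
    larger alpha shrinks w toward 0, i.e. a more constrained model."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs, ys = [1, 2, 3], [1.1, 1.9, 3.2]
print(ridge_slope(xs, ys, alpha=0.0))   # unregularized slope
print(ridge_slope(xs, ys, alpha=10.0))  # shrunk toward zero
```

The regularization strength alpha is itself a hyperparameter: it is set before training, not learned from the data.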
What is a hyperparameter?
It is a parameter of the learning algorithm (not of the model) that is held constant during training. It can help create a better model, e.g. by constraining the model so it doesn’t overfit.
What is low training error and high generalization error?
Low training error = the model performs very well on the training data
High generalization error = the model performs poorly on new instances (the test set). The combination of the two indicates overfitting.
Why should the test set only be used in the end?
Because if you test multiple times on the test set, you will adapt the model and its hyperparameters to it, and the model will not perform as well when presented with truly new data.
What is the difference between model parameters and hyperparameters?
In summary, model parameters are estimated from the data automatically, while hyperparameters are set manually and are used to help control how the model parameters are estimated.
What is RMSE and MAE, and what are the differences?
They are performance measures. RMSE (root mean square error) measures the standard deviation of the errors the system makes in its predictions, while MAE (mean absolute error) averages the absolute errors.
RMSE is more sensitive to outliers, and therefore MAE is sometimes preferred when there are many of them.
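A quick check with toy numbers (made up for illustration) showing why RMSE reacts more strongly to a single outlier than MAE:

```python
import math

def rmse(y_true, y_pred):
    """Root of the mean of squared errors — squaring amplifies large errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean of absolute errors — each error counts in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 3.0, 3.0, 3.0]
y_pred = [3.0, 3.0, 3.0, 11.0]  # three perfect predictions, one outlier error of 8
print(rmse(y_true, y_pred))  # 4.0 — inflated by the squared outlier
print(mae(y_true, y_pred))   # 2.0
```

The same total error, concentrated in one prediction, doubles RMSE relative to MAE.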
What is an ensemble and ensemble learning?
An aggregate of a group of predictors (Such as classifiers or regressors). A Random forest model is typically an ensemble of decision trees.
What is stochastic gradient descent?
Instead of carefully computing the best next step from the full dataset, stochastic gradient descent computes each step from a single random instance (or a small mini-batch, in mini-batch gradient descent). This speeds up the process significantly. Imagine gradient descent as a man who carefully calculates each step, but it takes a long time, and stochastic gradient descent as a semi-drunk man who stumbles a bit more but is way faster.
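A minimal sketch of SGD fitting the slope w in y ≈ w·x, one random instance per step (the data, learning rate, and function name are invented for the example):

```python
import random

def sgd_slope(data, lr=0.05, epochs=100, seed=0):
    """Stochastic gradient descent for y ≈ w·x.
    Per-instance loss: (w·x − y)²; its gradient w.r.t. w: 2·x·(w·x − y).
    Each update uses ONE shuffled instance, not the full dataset."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffle each epoch
            w -= lr * 2 * x * (w * x - y)         # noisy step downhill
    return w

data = [(1, 2.0), (2, 4.1), (3, 5.9)]  # roughly y = 2x
print(sgd_slope(data))  # bounces around, ends close to 2
```

Note how the learning rate `lr` plays exactly the role from the learning-rate card above: larger steps adapt faster but bounce (and forget) more.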
What is a perceptron (Or neuron)?
The Perceptron is one of the simplest ANN architectures. It is based on a slightly different artificial neuron called a linear threshold unit (LTU): the inputs and output are now numbers (instead of binary on/off values) and each input connection is associated with a weight. The LTU computes a weighted sum of its inputs (z = w1 x1 + w2 x2 + ⋯ + wn xn = wT · x), then applies a step function to that sum and outputs the result: hw(x) = step (z) = step (wT · x). A single LTU can be used for simple linear binary classification. It computes a linear combination of the inputs and if the result exceeds a threshold, it outputs the positive class or else outputs the negative class.
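The LTU computation above (weighted sum, then step function) is small enough to write out directly; the AND example and the hand-picked weights below are illustrative, not learned:

```python
def ltu(weights, inputs, threshold=0.0):
    """Linear threshold unit: z = w·x, then a step function —
    output 1 (positive class) if z exceeds the threshold, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs))  # z = wᵀ·x
    return 1 if z > threshold else 0

# A single LTU doing linear binary classification: logical AND
w = [1.0, 1.0]
print(ltu(w, [1, 1], threshold=1.5))  # 1: weighted sum 2.0 > 1.5
print(ltu(w, [1, 0], threshold=1.5))  # 0: weighted sum 1.0 ≤ 1.5
```

Training a perceptron means finding the weights (and threshold/bias); here they are simply chosen by hand to show the forward pass.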
What are weights in neural networks?
Weights represent the strength of a connection between two units. If the weight from node 1 to node 2 has greater magnitude, it means that neuron 1 has greater influence over neuron 2. A weight scales the importance of an input value: the more an input influences the cost function, the more weight it will be given during training.
What is a cost function?
When you define a cost function, it is a way of telling the computer how wrong it was. You add up the squares of the differences between each of the output activations and the value you want them to have; this sum is small if the model confidently classifies correctly. The average cost over the training data is a measure of how well a neural network performs. Gradient descent can then be used to find the lowest cost: find the steepest downhill slope, take a step down, and repeat until a minimum has been found.
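The "average of squared differences, then step downhill" loop can be sketched on the simplest possible model, y ≈ w·x with a single weight (the data and names are invented; a real network does the same thing over many weights via backpropagation):

```python
def mse_cost(w, data):
    """Average of squared differences between predictions w·x and
    targets y — small when the model fits the data well."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient_descent(data, lr=0.05, steps=200):
    """Repeatedly step downhill on the cost surface:
    w ← w − lr · dCost/dw, until (near) a minimum."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

data = [(1, 2.0), (2, 4.0), (3, 6.0)]  # exactly y = 2x
w = gradient_descent(data)
print(round(w, 3), round(mse_cost(w, data), 6))  # → 2.0 0.0
```

Because the data here is exactly linear, the cost can be driven to (numerically) zero; with noisy data the minimum cost would be positive.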