Course Session 1 (Introduction) Flashcards
What is machine learning?
Machine learning is concerned with the design and
development of algorithms that allow computers to optimize a
performance criterion using examples or experiences.
What: Get the computer to learn density, discriminant or
regression functions by showing examples of inputs (and
outputs).
How: We write a “parameterized” program and let the
learning algorithm find the set of parameters that best
approximates the desired function or behavior.
What are the different types of machine learning and their applications?
Supervised learning: given inputs along with corresponding outputs (labeled data), find the ‘correct’ outputs for test inputs
* Classification: 1-of-N discrete output (pattern recognition)
* Regression: real-valued output (prediction)
Unsupervised learning: given only inputs without outputs (unlabeled data) as training, find structure in the space
* density estimation
* clustering
* dimensionality reduction
Reinforcement learning: given inputs from the environment,
take actions that affect the environment, producing action
sequences that maximize the expected scalar reward (or
minimize punishment). This is similar to animal learning.
What is classification?
It is the act of assigning each input to one of a finite number of discrete categories. Learning a decision boundary that separates one class from the other.
What is regression?
Learning a continuous input-output mapping from a limited number of examples. Used when the desired output consists of one or more continuous variables.
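As a minimal illustration of regression (hypothetical data, not from the course material), a straight line can be fitted to noisy samples by least squares:

```python
import numpy as np

# Hypothetical example: fit y = w1*x + w0 to noisy samples of y = 2x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 2 * x + 1 + rng.normal(scale=0.05, size=x.shape)

# Design matrix with a bias column; solve the least-squares problem.
X = np.column_stack([x, np.ones_like(x)])
w, *_ = np.linalg.lstsq(X, t, rcond=None)
```

The learned `w` approximates the slope and intercept of the underlying function, giving a continuous input-output mapping from a limited number of examples.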
What is density estimation?
Density estimation models the probability distribution p(x) of a random variable x, given a finite set of observations. It attempts to determine the probability density distribution of data within the input space, i.e. discover the unknown structure of the inputs.
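A minimal sketch of parametric density estimation, fitting a 1-D Gaussian to hypothetical observations by maximum likelihood:

```python
import numpy as np

# Illustrative data drawn from an unknown (to the model) Gaussian.
rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.5, size=10_000)

mu_hat = data.mean()      # maximum-likelihood estimate of the mean
sigma_hat = data.std()    # maximum-likelihood estimate of the std

def p(x):
    """Estimated density p(x) under the fitted Gaussian."""
    return np.exp(-0.5 * ((x - mu_hat) / sigma_hat) ** 2) / (
        sigma_hat * np.sqrt(2 * np.pi)
    )
```

The fitted `p(x)` approximates the probability density that generated the observations.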
What is clustering?
Discover groups of similar examples (clumps) within the data, e.g. k-means.
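The k-means idea can be sketched as follows (illustrative data; the initialization is a deliberately simple deterministic choice, not a robust one):

```python
import numpy as np

# Two well-separated 2-D blobs as toy data.
rng = np.random.default_rng(2)
blob_a = rng.normal([0, 0], 0.3, size=(50, 2))
blob_b = rng.normal([5, 5], 0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])

k = 2
centers = X[[0, -1]].copy()   # simple init: one point from each end of the data
for _ in range(20):
    # Assignment step: nearest center for every point.
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    # Update step: move each center to the mean of its cluster.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

After convergence the centers sit at the blob means, i.e. the discovered "clumps".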
What is dimensionality reduction?
Project the data from a high dimensional space down to low dimensions.
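A standard example is PCA; a minimal sketch via the SVD, on hypothetical 3-D data that actually lies in a 2-D subspace:

```python
import numpy as np

# Hypothetical data: 2-D latent structure embedded in 3-D plus small noise.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
X = latent @ A.T + rng.normal(scale=0.01, size=(200, 3))

Xc = X - X.mean(axis=0)          # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                # project onto the top-2 principal directions

# Fraction of variance captured by the first two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()
```

Here nearly all of the variance survives the projection from 3-D to 2-D, which is the point of the method.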
How to apply parametric methods for classification?
Assume a parametric form for each class-conditional density p(x|C_i), e.g., a Gaussian. Estimate the parameters (e.g., mean and covariance) from the labeled training data by maximum likelihood, and estimate the priors P(C_i) from the class frequencies. Classify a new input x by Bayes' rule: choose the class with the highest posterior P(C_i|x) ∝ p(x|C_i)P(C_i).
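Parametric classification typically assumes a form such as a Gaussian for each class-conditional density, fits it by maximum likelihood, and classifies with Bayes' rule. A minimal 1-D sketch (illustrative data, equal priors assumed):

```python
import numpy as np

# Toy data: two classes with Gaussian class-conditional densities.
rng = np.random.default_rng(4)
class0 = rng.normal(0.0, 1.0, size=500)
class1 = rng.normal(4.0, 1.0, size=500)

# Fit a Gaussian p(x|Ci) to each class by maximum likelihood.
params = [(c.mean(), c.std()) for c in (class0, class1)]

def log_likelihood(x, mu, sigma):
    # Log of the Gaussian density, dropping constants shared by both classes.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

def classify(x):
    # Bayes' rule with equal priors: pick the class with the larger likelihood.
    scores = [log_likelihood(x, mu, s) for mu, s in params]
    return int(np.argmax(scores))
```

With equal priors the posterior comparison reduces to comparing the fitted likelihoods.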
What is k-nearest neighbor search classification?
Compute the “distances” between the input and all the stored prototypes; an exact match with a prototype is not required. Then choose the class that holds the majority among the K nearest prototypes.
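The distance-plus-majority-vote procedure can be sketched as follows (Euclidean distance and toy prototypes chosen for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(x, prototypes, labels, k=3):
    dists = np.linalg.norm(prototypes - x, axis=1)  # distance to every prototype
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]               # majority class

# Toy prototype set: three points per class.
prototypes = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
labels = np.array([0, 0, 0, 1, 1, 1])
```

Note that every stored prototype is touched per query, which is exactly the computational-cost challenge mentioned below.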
What are the challenges of k-nearest neighbor search classification?
The choice of distance measure can influence the predicted class of the input.
High computational cost for a large number of stored prototypes.
The curse of dimensionality and data sparsity.
What is underfitting and what can fix it?
The model is unable to capture the relationship between the input and output variables accurately, generating a high error rate on both the training set and unseen data. Underfitting occurs when a model is too simple, which can result from too little training time, too few input features, or too much regularization. High bias and low variance are good indicators of underfitting.
What is overfitting and what can fix it?
Overfitting occurs when an algorithm fits too closely, or even exactly, to its training data, resulting in a model that cannot make accurate predictions on data other than the training data. A low error rate on the training set combined with high variance (poor performance on unseen data) is a good indicator of overfitting.
Overfitting can be fixed by regularization, reducing model complexity by eliminating less relevant inputs, or training on more data.
What is L2-regularization and what does it do? Why does it help with overfitting?
Regularization is a set of methods for reducing overfitting in machine learning models. Typically, regularization trades a marginal decrease in training accuracy for an increase in generalizability by penalizing big coefficients.
E = \sum_{n=1}^{N} [y(x_n, \mathbf{w}) - t_n]^2 + \lambda \|\mathbf{w}\|^2
where \mathbf{w} is the vector of model weights and \lambda controls the strength of the penalty.
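For the sum-of-squares error with an L2 penalty, the minimizer has a closed form. A minimal ridge-regression sketch on hypothetical data (the degree-7 polynomial features and the lambda value are illustrative choices):

```python
import numpy as np

# Toy data: noisy samples of a sine curve.
rng = np.random.default_rng(5)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

X = np.vander(x, 8, increasing=True)   # degree-7 polynomial features

def ridge(X, t, lam):
    # Closed form: w = (X^T X + lam*I)^{-1} X^T t
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

w_free = ridge(X, t, 0.0)    # unregularized: larger coefficients
w_reg = ridge(X, t, 1e-3)    # penalized: coefficients are shrunk
```

The penalty shrinks the weight vector, which tames the wildly oscillating fits that large coefficients produce, i.e. it reduces overfitting.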
What is the function view of machine learning?
In essence, a machine learning problem is a mathematical modelling problem, i.e., coming up with a mathematical function to solve a problem, e.g., predicting a label or a value, or calculating a likelihood, given a data input.
- Given an input x, we define a parametric function f_𝜃(x), where 𝜃 is the set of parameters of f, to predict the output y for a given input x.
- To learn 𝜃, we define an objective/loss function:
* Regression (continuous numbers, e.g., mean squared error)
* Classification (discrete categories, e.g., cross-entropy)
* Density (continuous numbers, e.g., likelihood)
- Training: for given training data, an optimizer minimizes the loss to find 𝜃, e.g., stochastic gradient descent (SGD)
- Inference: apply the learned function to unknown data
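The function view can be made concrete with the simplest possible model: f_𝜃(x) = 𝜃·x, a squared-error loss, and plain SGD (all data and hyperparameters here are illustrative):

```python
import numpy as np

# Toy data generated with the true parameter theta = 3.
rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(scale=0.05, size=x.shape)

theta = 0.0   # initial parameter
lr = 0.1      # learning rate
for epoch in range(20):
    for xi, yi in zip(x, y):
        grad = 2 * (theta * xi - yi) * xi   # gradient of (f(x) - y)^2 w.r.t. theta
        theta -= lr * grad                  # SGD update

y_new = theta * 0.5   # inference: apply the learned function to unseen input
```

Training finds 𝜃 by minimizing the loss over the training data; inference then just evaluates f_𝜃 on new inputs.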
What is the bias-variance trade-off?
There is a trade-off between bias and variance, with very flexible models having low bias and high variance, and relatively rigid models having high bias and low variance.