Intro Flashcards

Question 1

Q

What is Classification?

Answer

A

Assign a category to each item. For example, document classification may assign items to categories such as politics, business, sports, or weather, while image classification may assign items to categories such as landscape, portrait, or animal. The number of categories in such tasks is often relatively small, but it can be large in some difficult tasks and even unbounded, as in OCR (Optical Character Recognition), text classification, or speech recognition.

Question 2

Q

What is Regression?

Answer

A

Predict a real value for each item. Examples of regression include prediction of stock values or variations of economic variables. In this problem, the penalty for an incorrect prediction depends on the magnitude of the difference between the true and predicted values, in contrast with the classification problem, where there is typically no notion of closeness between various categories.

Question 3

Q

What is Ranking?

Answer

A

Order items according to some criterion. Web search, e.g., returning web pages relevant to a search query, is the canonical ranking example. Many other similar ranking problems arise in the context of the design of information extraction or natural language processing systems.
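
As a minimal sketch of ordering items by a criterion, the snippet below ranks documents for a query. The scoring rule (counting query terms in each document) and the names `rank` and `score` are assumptions made for illustration; real search engines use far richer criteria.

```python
def rank(query, documents):
    """Order documents by a hypothetical relevance score, highest first."""
    terms = set(query.lower().split())

    def score(doc):
        # Assumed criterion: number of words in the document that are query terms.
        return sum(1 for word in doc.lower().split() if word in terms)

    # sorted() is stable, so tied documents keep their original order.
    return sorted(documents, key=score, reverse=True)


docs = [
    "cheap flights to paris",
    "weather in paris today",
    "paris weather forecast paris",
]
print(rank("paris weather", docs)[0])  # prints "paris weather forecast paris"
```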

Question 4

Q

What is Clustering?

Answer

A

Partition items into homogeneous regions. Clustering is often performed to analyze very large data sets. For example, in the context of social network analysis, clustering algorithms attempt to identify “communities” within large groups of people.
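
The card does not name an algorithm. As one possible illustration, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional points. The function name `kmeans_1d`, the initial centers, and the fixed iteration count are all assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Partition 1-D points around the given initial centers (k-means sketch)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
print(clusters)  # two groups: points near 1 and points near 5
```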

Question 5

Q

What is Dimensionality Reduction or Manifold Learning?

Answer

A

Transform an initial representation of items into a lower-dimensional representation of these items while preserving some properties of the initial representation. A common example involves preprocessing digital images in computer vision tasks.
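
As a rough sketch, the code below maps 2-D points to 1-D by projecting them onto their direction of largest variance, which is the idea behind principal component analysis. The card does not prescribe this technique; it is one assumed example, and `project_to_1d` is a hypothetical name. The property being preserved here is as much of the variance as possible.

```python
import math


def project_to_1d(points):
    """Project 2-D points onto their principal axis (a tiny PCA sketch)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / n
    b = sum(x * y for x, y in centered) / n
    c = sum(y * y for _, y in centered) / n
    # Angle of the axis of largest variance for a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    direction = (math.cos(theta), math.sin(theta))
    return [x * direction[0] + y * direction[1] for x, y in centered]


# Points lying on the line y = 2x lose no information when reduced to 1-D.
print(project_to_1d([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]))
```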

Question 6

Q

What are Examples?

Answer

A

Items or instances of data used for learning or evaluation. In our spam problem, these examples correspond to the collection of email messages we will use for learning and testing.

Question 7

Q

What are Features?

Answer

A

The set of attributes, often represented as a vector, associated with an example. In the case of email messages, some relevant features may include the length of the message, the name of the sender, various characteristics of the header, the presence of certain keywords in the body of the message, and so on.
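
A minimal sketch of turning an email into a feature vector is shown below. The specific features, the keyword list, and the names `extract_features` and `known_senders` are hypothetical choices for illustration only.

```python
def extract_features(sender, body, known_senders,
                     keywords=("free", "winner", "urgent")):
    """Map one email to a list of numeric features (illustrative choices)."""
    words = set(body.lower().split())
    features = [float(len(body))]                               # message length
    features.append(1.0 if sender in known_senders else 0.0)    # familiar sender?
    features.extend(1.0 if k in words else 0.0 for k in keywords)  # keyword flags
    return features


vector = extract_features(
    "alice@example.com",
    "You are a WINNER claim your free prize",
    known_senders={"alice@example.com"},
)
print(vector)  # [length, known-sender flag, one flag per keyword]
```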

Question 8

Q

What are Labels?

Answer

A

Values or categories assigned to examples. In classification problems, examples are assigned specific categories, for instance, the spam and non-spam categories in our binary classification problem. In regression, items are assigned real-valued labels.

Question 9

Q

What are Training Samples?

Answer

A

Examples used to train a learning algorithm. In our spam problem, the training sample consists of a set of email examples along with their associated labels.

Question 10

Q

What are Validation Samples?

Answer

A

Examples used to tune the parameters of a learning algorithm when working with labeled data. Learning algorithms typically have one or more free parameters, and the validation sample is used to select appropriate values for these model parameters.

Question 11

Q

What are Test Samples?

Answer

A

Examples used to evaluate the performance of a learning algorithm. The test sample is separate from the training and validation data and is not made available in the learning stage. In the spam problem, the test sample consists of a collection of email examples for which the learning algorithm must predict labels based on features. These predictions are then compared with the labels of the test sample to measure the performance of the algorithm.
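
The sketch below shows one way the training, validation, and test samples might be kept separate: shuffle the labeled examples once, then slice them into three disjoint parts. The 60/20/20 proportions, the fixed seed, and the name `split_sample` are assumptions, not something the card specifies.

```python
import random


def split_sample(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle examples and split them into train, validation, and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # never seen during learning or tuning
    return train, validation, test


train, validation, test = split_sample(range(100))
print(len(train), len(validation), len(test))  # prints "60 20 20"
```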

Question 12

Q

What is a Loss Function?

Answer

A

A function that measures the difference, or loss, between a predicted label and a true label. Denoting the set of all labels as Y and the set of possible predictions as Y’, a loss function L is a mapping L : Y × Y’ → R_+. In most cases, Y’ = Y and the loss function is bounded, but these conditions do not always hold. Common examples of loss functions include the zero-one (or misclassification) loss defined over {−1, +1} × {−1, +1} by L(y, y’) = 1_{y’ ≠ y}, which equals 1 when y’ ≠ y and 0 otherwise, and the squared loss defined over I × I by L(y, y’) = (y’ − y)^2, where I ⊆ R is typically a bounded interval.
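
The two losses named above can be sketched directly in code. The function names are illustrative; the first argument plays the role of the true label y and the second the prediction y’.

```python
def zero_one_loss(y_true, y_pred):
    """Zero-one (misclassification) loss: 1 if the prediction is wrong, else 0."""
    return 1 if y_pred != y_true else 0


def squared_loss(y_true, y_pred):
    """Squared loss: grows with the distance between prediction and truth."""
    return (y_pred - y_true) ** 2


print(zero_one_loss(+1, -1))   # prints 1
print(zero_one_loss(-1, -1))   # prints 0
print(squared_loss(3.0, 1.0))  # prints 4.0
```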

Question 13

Q

What is the Hypothesis Set?

Answer

A

A set of functions mapping features (feature vectors) to the set of labels Y. In our example, these may be a set of functions mapping email features to Y = {spam, non-spam}. More generally, hypotheses may be functions mapping features to a different set Y’ . They could be linear functions mapping email feature vectors to real numbers interpreted as scores (Y’ = R), with higher score values more indicative of spam than lower ones.
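
As an illustrative sketch, a hypothesis set of linear scoring functions can be represented as a family of functions indexed by a weight vector. The helper names, the example weights, and the threshold used to turn a score into a label are assumptions.

```python
def make_linear_hypothesis(weights, bias=0.0):
    """Return the hypothesis h(x) = w . x + bias, mapping features to a score."""
    def h(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return h


def to_label(score, threshold=0.0):
    """Interpret a real-valued score: higher means more indicative of spam."""
    return "spam" if score > threshold else "non-spam"


# A tiny, hypothetical hypothesis set: one function per candidate weight vector.
hypothesis_set = [
    make_linear_hypothesis(w) for w in ([1.0, 0.0], [0.0, 1.0], [2.0, -1.0])
]

x = [1.0, 3.0]
print([h(x) for h in hypothesis_set])  # prints [1.0, 3.0, -1.0]
```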

Question 14

Q

What is Supervised Learning?

Answer

A

The learner receives a set of labeled examples as training data and makes predictions for all unseen points. This is the most common scenario associated with classification, regression, and ranking problems. The spam detection problem discussed in the previous section is an instance of supervised learning.

Question 15

Q

What is Unsupervised Learning?

Answer

A

The learner exclusively receives unlabeled training data, and makes predictions for all unseen points. Since in general no labeled example is available in that setting, it can be difficult to quantitatively evaluate the performance of a learner. Clustering and dimensionality reduction are examples of unsupervised learning problems.

Question 16

Q

What is Semi-Supervised Learning?

Answer

A

The learner receives a training sample consisting of both labeled and unlabeled data, and makes predictions for all unseen points. Semi-supervised learning is common in settings where unlabeled data is easily accessible but labels are expensive to obtain. Various types of problems arising in applications, including classification, regression, or ranking tasks, can be framed as instances of semi-supervised learning. The hope is that the distribution of unlabeled data accessible to the learner can help it achieve better performance than in the supervised setting.

Question 17

Q

What is Transductive Inference?

Answer

A

As in the semi-supervised scenario, the learner receives a labeled training sample along with a set of unlabeled test points. However, the objective of transductive inference is to predict labels only for these particular test points. Transductive inference appears to be an easier task and matches the scenario encountered in a variety of modern applications.

Question 18

Q

What is On-Line Learning?

Answer

A

The on-line scenario involves multiple rounds, and the training and testing phases are intermixed. At each round, the learner receives an unlabeled training point, makes a prediction, receives the true label, and incurs a loss. The objective in the on-line setting is to minimize the cumulative loss over all rounds. Unlike the previous settings, no distributional assumption is made in on-line learning. In fact, instances and their labels may be chosen adversarially within this scenario.
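
The round structure described above can be sketched as a loop. The perceptron update used here is one assumed choice of learner, not something the card specifies, and the loss is the zero-one loss.

```python
def online_perceptron(stream, dim):
    """Run one pass over (x, y) rounds, returning weights and cumulative loss."""
    w = [0.0] * dim
    cumulative_loss = 0
    for x, y in stream:
        # 1. Receive the point x and make a prediction in {-1, +1}.
        score = sum(wi * xi for wi, xi in zip(w, x))
        prediction = 1 if score >= 0 else -1
        # 2. Receive the true label y and incur a zero-one loss.
        if prediction != y:
            cumulative_loss += 1
            # 3. Perceptron update (assumed learner): move w toward y * x.
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, cumulative_loss


rounds = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 1.0], 1), ([0.0, 2.0], -1)]
print(online_perceptron(rounds, dim=2))  # prints ([1.0, -2.0], 3)
```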

Question 19

Q

What is Reinforcement Learning?

Answer

A

The training and testing phases are also intermixed in reinforcement learning. To collect information, the learner actively interacts with the environment, in some cases affecting it, and receives an immediate reward for each action. The objective of the learner is to maximize its reward over a course of actions and iterations with the environment. However, no long-term reward feedback is provided by the environment, and the learner is faced with the exploration versus exploitation dilemma, since it must choose between exploring unknown actions to gain more information and exploiting the information already collected.
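
The exploration versus exploitation dilemma can be illustrated with a deliberately simplified setting: a multi-armed bandit in which each action yields a fixed immediate reward. The epsilon-greedy rule, the reward values, and the name `epsilon_greedy` are assumptions; full reinforcement learning problems also involve states that the actions may change.

```python
import random


def epsilon_greedy(arm_rewards, steps=1000, epsilon=0.1, seed=0):
    """Choose among actions with fixed (hypothetical) rewards, epsilon-greedily."""
    rng = random.Random(seed)
    n_arms = len(arm_rewards)
    counts = [0] * n_arms        # how often each action was tried
    estimates = [0.0] * n_arms   # running mean reward per action
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random action
        else:
            # Exploit: pick the action with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = arm_rewards[arm]        # immediate reward from the environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy([0.2, 0.8])
print(estimates, counts)
```

With epsilon set to 0, this learner would keep pulling the first action it tried and never discover the better one, which is the cost of pure exploitation.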

Question 20

Q

What is Active Learning?

Answer

A

The learner adaptively or interactively collects training examples, typically by querying an oracle to request labels for new points. The goal in active learning is to achieve a performance comparable to the standard supervised learning scenario, but with fewer labeled examples. Active learning is often used in applications where labels are expensive to obtain, for example, in computational biology.
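
A classic toy illustration of needing fewer labels is learning a threshold on the line: by choosing which point to query next, the learner can locate the boundary with a number of labels logarithmic in the pool size. The sketch below assumes a noiseless oracle and labels of the form "+1 if x is at or above a threshold, else -1"; the names `active_threshold` and `oracle` are hypothetical.

```python
def active_threshold(pool, oracle):
    """Find the first positively labeled point using binary-search queries."""
    pool = sorted(pool)
    lo, hi = 0, len(pool)  # index of the first positive point lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1                 # each oracle call costs one label
        if oracle(pool[mid]) == 1:
            hi = mid                 # boundary is at mid or to the left
        else:
            lo = mid + 1             # boundary is to the right of mid
    return lo, queries


def oracle(x):
    # Hypothetical labeler: +1 for points at or above 37, -1 otherwise.
    return 1 if x >= 37 else -1


print(active_threshold(range(100), oracle))  # prints (37, 7)
```

Labeling the whole pool would take 100 queries here, while the adaptive learner used 7.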