Lecture 1 - Supervised and Unsupervised Learning Flashcards

Question 1

Q

What is Machine Learning?

Answer

A

Machine Learning is the science (and art) of programming computers so they can learn from data. It is also known as inductive learning.

Question 2

Q

What are Machine Learning Algorithms?

Answer

A

ML algorithms are tools for the automatic acquisition of knowledge.

Question 3

Q

What is Inductive Learning?

Answer

A

A form of logical inference that draws general conclusions from a particular set of examples.

Question 4

Q

What is Tom Mitchell's definition of Machine Learning?

Answer

A

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.” - Refer to the email example to explain it

Question 5

Q

What is Supervised Learning?

Answer

A

Learning that uses labelled datasets to train algorithms to predict outcomes.
Typical tasks: classification and regression.

Question 6

Q

Define Attribute

Answer

A

Also called a feature, descriptive feature, predictor, or independent variable. An attribute describes a characteristic or aspect of an example.

Question 7

Q

Define Example

Answer

A

Also called an instance, register, or data point. It is a tuple of attribute values that describes an object of interest (e.g., a regular email, a patient, or a company’s customer history). In other words, it holds one value for each attribute (descriptive feature).

Question 8

Q

Define Label

Answer

A

Also called the target or dependent variable. It is a special attribute that describes the phenomenon of interest: the value the model is trained to predict for each example.

Question 9

Q

Define Dataset

Answer

A

A dataset is composed of examples with their respective attribute values and the associated label.
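As a rough illustration of how attributes, examples, labels and a dataset fit together, here is a minimal Python sketch. The attribute names (`num_links`, `contains_offer`), the label name (`is_spam`) and all the values are made up for illustration; they do not come from the lecture.

```python
from dataclasses import dataclass

@dataclass
class Example:
    # Attributes (descriptive features); hypothetical names.
    num_links: int
    contains_offer: bool
    # Label (target): the phenomenon of interest.
    is_spam: bool

# A dataset is a collection of examples (toy values, invented here).
dataset = [
    Example(num_links=7, contains_offer=True, is_spam=True),
    Example(num_links=0, contains_offer=False, is_spam=False),
    Example(num_links=3, contains_offer=True, is_spam=True),
]

# Separate the attribute values (X) from the labels (y).
X = [(e.num_links, e.contains_offer) for e in dataset]
y = [e.is_spam for e in dataset]
```

Each row of `X` is the attribute tuple of one example, and the entry of `y` at the same position is its label.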

Question 10

Q

Supervised Learning Example - Credit Approval

Answer

A

REFER TO ONENOTE FOR MORE DETAILS

Question 11

Q

What are some examples of Supervised Learning Algorithms?

Answer

A

Linear Regression
Logistic Regression
k-Nearest Neighbors
Support Vector Machines (SVMs)
Decision Trees and Random Forests

Question 12

Q

What is Unsupervised Learning?

Answer

A

Learning in which the algorithm learns from unlabelled data, i.e. examples that have attribute values but no label.

Question 13

Q

What is Clustering in Unsupervised Learning?

Answer

A

An unsupervised machine learning technique designed to group unlabelled examples based on their similarity to each other
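A minimal sketch of clustering, assuming one-dimensional points and k-means (Lloyd's algorithm) with hand-picked starting centroids. The function name `kmeans_1d`, the data and the initial centroids are illustrative assumptions, not taken from the lecture.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points; `centroids` are the initial guesses."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]  # toy, unlabelled data
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 10.0]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

No labels are used anywhere: the two groups emerge purely from the distances between points.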

Question 14

Q

What is Anomaly Detection in Unsupervised Learning?

Answer

A

An unsupervised ML technique that identifies data points, events, or observations that deviate significantly from the expected pattern within a dataset.
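One simple way to sketch this idea is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The function name `find_anomalies`, the threshold of 2.0 and the readings are assumptions for illustration; real detectors are usually more sophisticated.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]  # toy data with one outlier
print(find_anomalies(readings))  # [50]
```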

Question 15

Q

What are some examples of Unsupervised Learning Algorithms?

Answer

A

Clustering: k-means clustering, Density-based spatial clustering of applications with noise (DBSCAN), Hierarchical Cluster Analysis (HCA)

Visualization and dimensionality reduction: Principal Component Analysis (PCA), Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)
Association rule learning: Apriori algorithm, Eclat algorithm

Question 16

Q

What is Instance-Based Learning?

Answer

A

Instance-based Learning:
- Memorises the training data and uses it directly to make predictions
- Compares new examples to stored instances using a similarity measure
Refer to the example on the slides
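A minimal sketch of instance-based learning, assuming k-nearest neighbours with Euclidean distance as the similarity measure. The function name `knn_predict`, the two-attribute points and the "ham"/"spam" labels are invented for illustration.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest stored examples."""
    # "Training" is just storage: every prediction scans the stored instances.
    neighbours = sorted(training, key=lambda ex: dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Stored instances: (attribute tuple, label). Toy values.
training = [
    ((1.0, 1.0), "ham"),
    ((1.5, 2.0), "ham"),
    ((2.0, 1.0), "ham"),
    ((8.0, 8.0), "spam"),
    ((9.0, 7.5), "spam"),
    ((8.5, 9.0), "spam"),
]

print(knn_predict(training, (1.2, 1.4)))  # ham
print(knn_predict(training, (8.8, 8.1)))  # spam
```

Note that no general rule is ever built; the stored examples themselves act as the model.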

Question 17

Q

What is Model-Based Learning?

Answer

A

Model-based Learning:
- Builds a model from the training data to make predictions
- The model learns a general rule that applies to all data, not just memorising individual instances
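A minimal sketch of model-based learning, assuming simple linear regression fitted by ordinary least squares. The function name `fit_line` and the data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1

slope, intercept = fit_line(xs, ys)
# The learned rule is two numbers; the training data is no longer needed.
prediction = slope * 10.0 + intercept
print(slope, intercept, prediction)  # 2.0 1.0 21.0
```

Once the parameters are learned, predictions for unseen inputs use only the rule, which is the contrast with memorising individual instances.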

Question 18

Q

What are some ways to classify Machine Learning Systems?

Answer

A

Whether or not they can learn incrementally on the fly (online learning vs. batch learning).
How they are supervised during training (supervised, semi-supervised, unsupervised, reinforcement learning).
How they generalise (instance-based, model-based).

Question 19

Q

What are the main challenges of Machine Learning?

Answer

A

Insufficient quantity of training data
Non-representative or poor-quality data
Irrelevant features
Overfitting or underfitting

Question 20

Q

What is Overfitting and Underfitting?

Answer

A

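Overfitting: the model fits the training data too closely, including its noise, so it performs well on the training set but generalises poorly to new data.
Underfitting: the model is too simple to capture the underlying structure of the data, so it performs poorly even on the training set.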
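A rough sketch of the two failure modes on toy data, comparing training error with test error. The underfitting model always predicts the mean label; the overfitting model is a polynomial forced through every training point (Lagrange interpolation). All names and data values are illustrative assumptions.

```python
def mse(model, xs, ys):
    """Mean squared error of `model` on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mean_model(xs, ys):
    """Underfits: ignores x and always predicts the average label."""
    avg = sum(ys) / len(ys)
    return lambda x: avg

def interpolating_model(xs, ys):
    """Overfits: a degree n-1 polynomial through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0  # Lagrange basis polynomial for point i
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.1, 1.9, 4.2, 5.8, 8.1]  # roughly y = 2x plus small noise
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [1.0, 3.0, 5.0, 7.0]        # noise-free y = 2x

underfit = mean_model(train_x, train_y)
overfit = interpolating_model(train_x, train_y)

for name, model in [("underfit", underfit), ("overfit", overfit)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

The interpolating model reaches zero training error because it reproduces the noise, yet its test error is not zero; the mean model has a large error even on the data it was trained on. The gap between training and test error is the usual signal of overfitting.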

Question 21

Q

How can you test and validate the performance of Machine Learning?

Answer

A

Split your data into two sets: the training set and the test set. Train your model using the training set, and test it using the test set.
It is common to use 80% of the data for training and hold out 20% for testing.
The error rate on the test set is called the generalisation error or out-of-sample error.
NOTE: Using the test set to pick the best hyperparameter values tunes the model to that particular test set, so it tends to perform worse on other new data. For this reason we also need a separate validation set for choosing hyperparameters.
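A minimal sketch of the splitting procedure, assuming a shuffled 80/20 train/test split followed by carving a validation set out of the training portion. The function name `split_dataset`, the seed and the 75/25 split used for validation are assumptions; the lecture only states the 80/20 figure.

```python
import random

def split_dataset(examples, first_frac=0.8, seed=0):
    """Shuffle a copy of `examples` and split it into two parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * first_frac)
    return shuffled[:cut], shuffled[cut:]

examples = list(range(100))  # stand-in for 100 labelled examples

# Hold out 20% as the test set; it is only used for the final evaluation.
train_val, test = split_dataset(examples, first_frac=0.8)

# Split the rest again (assumed 75/25) to get a validation set
# for comparing hyperparameter values.
train, validation = split_dataset(train_val, first_frac=0.75)

print(len(train), len(validation), len(test))  # 60 20 20
```

Hyperparameters are compared on `validation`, and `test` is touched once at the end to estimate the generalisation error.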