Lecture 1 - Supervised and Unsupervised Learning Flashcards

1
Q

What is Machine Learning?

A

Machine Learning is the science (and art) of programming computers so they can learn from data. It is also known as inductive learning.

2
Q

What are Machine Learning Algorithms?

A

ML algorithms are tools for the automatic acquisition of knowledge.

3
Q

What is Inductive Learning?

A

A form of logical inference that draws general conclusions from a particular set of examples.

4
Q

What is Tom Mitchell's definition of Machine Learning?

A

“A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.”
Note: refer to the email example to explain it.

5
Q

What is Supervised Learning?

A

Uses labelled datasets to train algorithms to predict outcomes.
Examples: classification and regression.

6
Q

Define Attribute

A

An attribute describes a characteristic or aspect of an example. Also called a feature, descriptive feature, predictor, or independent variable.

7
Q

Define Example

A

A tuple of attribute values that describes an object of interest (e.g., a regular email, a patient, or a company’s customer history). In other words, it holds one value for each attribute (descriptive feature). Also called an instance, record, or data point.

8
Q

Define Label

A

A special attribute that describes the phenomenon of interest: the value to be predicted for an example. Also called the target or dependent variable.

9
Q

Define Dataset

A

A dataset is composed of examples, each with its attribute values and its associated label.
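
As a rough illustration of how attributes, examples, labels, and a dataset fit together, here is a minimal Python sketch. The attribute names and values are made up for illustration and do not come from the lecture.

```python
# Hypothetical toy dataset: every name and value here is illustrative only.
ATTRIBUTES = ["income", "age", "has_existing_loan"]  # descriptive features

# Each example is a tuple of attribute values, paired with its label.
dataset = [
    ((52000, 34, False), "approve"),
    ((18000, 22, True), "reject"),
    ((75000, 45, False), "approve"),
]

# Separate the examples (X) from the labels (y).
X = [example for example, label in dataset]
y = [label for example, label in dataset]

for example, label in dataset:
    described = dict(zip(ATTRIBUTES, example))
    print(described, "->", label)
```

Each row of `X` holds one value per attribute, and `y` holds the matching label in the same position.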

10
Q

Supervised Learning Example - Credit Approval

A

REFER TO ONENOTE FOR MORE DETAILS

11
Q

What are some examples of Supervised Learning Algorithms?

A

Linear Regression
Logistic Regression
k-Nearest Neighbors
Support Vector Machines (SVMs)
Decision Trees and Random Forests

12
Q

What is Unsupervised Learning?

A

A type of machine learning in which the algorithm learns from unlabelled data.

13
Q

What is Clustering in Unsupervised Learning?

A

An unsupervised machine learning technique designed to group unlabelled examples based on their similarity to each other
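
Grouping by similarity can be sketched with a very small k-means loop, one of the clustering algorithms named in this deck. This is a simplified sketch, not a production implementation: the points, the choice of initial centroids, and the helper names `nearest` and `kmeans` are assumptions made for illustration.

```python
import math

def nearest(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, initial_centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return [nearest(p, centroids) for p in points], centroids

# Hypothetical unlabelled 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

No labels are supplied anywhere: the cluster indices come only from distances between the points.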

14
Q

What is Anomaly Detection in Unsupervised Learning?

A

An unsupervised ML technique that identifies data points, events, or observations that deviate significantly from the expected pattern within a dataset.
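
One simple way to make "deviates significantly" concrete is a z-score rule: flag values that lie more than some number of standard deviations from the mean. This is only one possible approach, chosen here for brevity; the readings, the threshold of 2.0, and the function name `find_anomalies` are assumptions for illustration.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` in absolute value."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1]
print(find_anomalies(readings))
```

On this data only the reading of 25.0 is flagged; the rest sit well within one standard deviation of the mean.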

15
Q

What are some examples of Unsupervised Learning Algorithms?

A

Clustering: k-means clustering, Density-based spatial clustering of applications with noise (DBSCAN), Hierarchical Cluster Analysis (HCA)

Visualization and dimensionality reduction: Principal Component Analysis (PCA), Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)

Association rule learning: Apriori algorithm, Eclat algorithm

16
Q

What is Instance-Based Learning?

A

Instance-based Learning:
- Memorises the training data and uses it directly to make predictions
- Compares new examples to stored instances using a similarity measure
Refer to the example on the slides
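
A k-nearest-neighbours classifier is a common instance-based learner and can be sketched in a few lines. This is a minimal sketch rather than the example from the slides: the data, the use of Euclidean distance as the similarity measure, and the function name `knn_predict` are assumptions.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest stored examples."""
    # "Training" is just storing `train`; all the work happens at prediction time.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical 2-D examples from two classes.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.1, 5.0)))
```

Nothing is fitted in advance: each prediction compares the new example against every stored instance.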

17
Q

What is Model-Based Learning?

A

Model-based Learning:
- Builds a model from the training data to make predictions
- The model learns a general rule that applies to all data, not just memorising individual instances
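
By contrast with memorising instances, a model-based learner compresses the training data into a few parameters. The sketch below fits a straight line by ordinary least squares; the data and the names `fit_line` and `predict` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Apply the learned rule; the training data is no longer needed."""
    return slope * x + intercept

print(slope, intercept)
print(predict(10.0))
```

Once `slope` and `intercept` are learned, predictions for unseen inputs such as `x = 10.0` use only those two numbers.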

18
Q

What are some ways to classify Machine Learning Systems?

A
  1. Whether or not they can learn incrementally on the fly (online learning vs. batch learning).
  2. How they are supervised during training (supervised, semi-supervised, unsupervised, reinforcement learning).
  3. How they generalise (instance-based, model-based).
19
Q

What are the main challenges of Machine Learning?

A
  • Insufficient quantity of training data
  • Non-representative or poor-quality data
  • Irrelevant features
  • Overfitting or underfitting
20
Q

What is Overfitting and Underfitting?

A

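Overfitting: the model fits the training data too closely, including its noise, so it generalises poorly to new data.
Underfitting: the model is too simple to capture the underlying structure of the data, so it performs poorly even on the training data.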
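
The difference shows up when training error is compared with test error. The sketch below uses two deliberately extreme models on made-up data: one that always predicts the training mean (too simple) and one that memorises the training points (too closely fitted). All data and names are assumptions for illustration.

```python
# Hypothetical noisy training data (roughly y = 2x) and separate test data.
train = [(0, 0.5), (1, 1.8), (2, 4.6), (3, 5.7), (4, 8.4), (5, 9.9)]
test = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0), (3.5, 7.0), (4.5, 9.0)]

def mse(model, data):
    """Mean squared error of `model` on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train_mean = sum(y for _, y in train) / len(train)

def underfit_model(x):
    """Ignores x entirely and always predicts the training mean."""
    return train_mean

def overfit_model(x):
    """Memorises the training set: returns the y of the nearest training x."""
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

for name, model in [("underfit", underfit_model), ("overfit", overfit_model)]:
    print(f"{name}: train MSE = {mse(model, train):.2f}, test MSE = {mse(model, test):.2f}")
```

The memorising model reaches zero error on the training set but not on the test set, while the constant model has a large error on both.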

21
Q

How can you test and validate the performance of a Machine Learning model?

A
  • Split your data into two sets: the training set and the test set. Train your model on the training set and test it on the test set.
  • It is common to use 80% of the data for training and hold out 20% for testing.
  • The error rate on the test set is called the generalisation error or out-of-sample error.
    NOTE: Using the test set to pick the best hyperparameter values tends to produce a model that does not perform well on new data other than the test set, so a separate validation set is also needed.
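
The splitting procedure above can be sketched with the standard library alone. The function name `split_dataset`, the fixed seed, and the 60/20/20 proportions in the usage lines are assumptions for illustration; libraries such as scikit-learn provide their own splitting utilities.

```python
import random

def split_dataset(examples, test_fraction=0.2, validation_fraction=0.0, seed=42):
    """Shuffle the examples and split them into train, validation, and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

# Hypothetical dataset of 100 examples, represented here by their indices.
examples = list(range(100))

# 80% training, 20% test.
train, _, test = split_dataset(examples, test_fraction=0.2)
print(len(train), len(test))

# Holding out a validation set as well, for choosing hyperparameters.
train2, validation2, test2 = split_dataset(examples, test_fraction=0.2, validation_fraction=0.2)
print(len(train2), len(validation2), len(test2))
```

Hyperparameters would be chosen using the validation set only, leaving the test set untouched until the final estimate of the generalisation error.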