Lecture #2 - ML Basics Flashcards by Rogelio Ramel

Describe the differences between AI/ML/DL in your own words.

AI:

Aims to create intelligent machines capable of performing tasks that typically require human intelligence.
Focuses on developing systems that can exhibit intelligent behaviour and adapt to different situations.

Machine Learning:

Subset of AI; the development of algorithms and models that allow computers to learn and make predictions or decisions without being explicitly programmed.

Deep Learning:

Specialised subfield of ML that utilises artificial neural networks inspired by the structure and functioning of the human brain.
Learns hierarchical representations of data by building complex computational models called deep neural networks.

Describe the concept of a rule-based system

An approach that operates on a set of predefined rules, typically created by human experts from domain knowledge. The rules are not learned from data.

Used to make decisions or predictions.

Input
Hand-designed program
Output

Describe the concept of a hand-designed program.

A hand-designed program refers to a software program that is explicitly created and written by human programmers, specifying the precise steps and instructions to accomplish a particular task or solve a specific problem. In a hand-designed program, every aspect of the program’s behavior and functionality is carefully crafted and coded by human experts.

What’s the disadvantage of the rule-based system?

It relies heavily on pre-defined rules and may struggle with complex or ambiguous situations where the rules do not cover all possible scenarios.

Give an example of a rule-based system.

If the ambient temperature is larger than 31 degrees Celsius then

If the relative humidity is larger than 65 % then

Thunderstorms are likely.
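
The nested rule above can be written as a small hand-designed program. This is only a sketch: the function name `thunderstorm_likely` and its parameter names are made up for illustration, and the relative humidity is assumed to be given in percent.

```python
def thunderstorm_likely(temperature_c: float, relative_humidity_pct: float) -> bool:
    """Hand-designed rule: both thresholds are fixed by a human, nothing is learned."""
    if temperature_c > 31.0:
        if relative_humidity_pct > 65.0:
            return True
    return False


print(thunderstorm_likely(33.0, 70.0))  # True: hot and humid
print(thunderstorm_likely(33.0, 40.0))  # False: hot but dry
```

Any situation the two thresholds do not cover (for example a cold front at 28 degrees) is simply answered with "not likely", which illustrates why such systems struggle outside the scenarios their rules anticipate.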

Give the definition of classical ML in the lecture.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Talk about task T

How should the ML system process an input/example?

We have experience from example.

An example is a collection of features that have somehow been obtained.
Represent the example as a vector x ∈ ℝ^n (a vector of n real numbers); each entry x_i is a feature.
n denotes the input dimensionality.

When talking about the task T, explain the classification tasks.

The task of classification involves assigning input data to predefined categories or classes.

e.g. classifying emails as spam or non-spam, or classifying images as containing cats or dogs.

Specify to which class (out of k possible classes) x belongs.
The system implements a function f: ℝ^n → {1, ..., k}, i.e. it maps a real-valued vector to one of k class labels.

When talking about the task T, explain the clustering tasks.

The goal is to find a suitable set of k classes (clusters) that group together input vectors that are the same or similar, and separate them from dissimilar ones.

e.g. quantizer codebook optimization, grouping words by similarity.
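
As an illustration of clustering, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. The card does not name a specific algorithm, so k-means is an assumption chosen because it matches the quantizer codebook example; the function name `kmeans_1d` and the fixed iteration count are also illustrative choices.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to p (squared distance).
            nearest = min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
    return centroids


codebook = kmeans_1d([1.0, 1.1, 0.9, 5.0, 5.1, 4.9], centroids=[0.0, 6.0])
print(codebook)
```

No labels are used anywhere: the k = 2 groups emerge from the data alone, and the returned centroids play the role of the quantizer codebook.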

When talking about the task T, explain the regression tasks.

Regression tasks involve predicting a continuous numerical value based on input features.

Predict a numerical value from x

e.g. estimation of position, channel estimation, house price estimation
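
A minimal sketch of a regression task with a single input feature, assuming a straight-line model fitted by ordinary least squares. The model choice and the names `fit_line` and `predict_line` are assumptions for illustration, not something the card specifies.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_line(x, slope, intercept):
    """Predict a continuous numerical value for a new input x."""
    return slope * x + intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(predict_line(4.0, slope, intercept))  # 9.0
```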

What is the target function f?

The target function f, which should be implemented by the ML system, is unknown.

The goal of the ML system is to learn an approximation f̂ ≈ f from experience E.

Explain what the Experience E is.

ML algorithms experience a training dataset, i.e. a collection of examples

Explain the difference between unsupervised and supervised learning.

Supervised learning involves training a machine learning model using labeled data. Labeled data consists of input examples (features) and their corresponding correct output labels. The goal is to learn a mapping or relationship between the input features and the output labels, enabling the model to make accurate predictions or classifications on new, unseen data.

Unsupervised learning, on the other hand, deals with unlabeled data. The objective is to discover patterns, structures, or relationships within the data without the guidance of predefined output labels. The algorithm explores the inherent structure in the data to identify meaningful patterns or groupings.

Explain reinforcement learning

In reinforcement learning, the learning system is referred to as an “agent,” which interacts with an “environment.” The agent takes actions in the environment, and the environment responds by providing feedback in the form of rewards or penalties. The agent’s objective is to learn the optimal strategy or policy that maximizes the long-term cumulative reward.

Explain the Performance Measure P

The performance measure P quantifies how well the ML algorithm performs.

Classification: accuracy or error rate
Regression: how well the function is approximated, e.g. mean squared error (MSE)
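
The measures named above can be sketched in a few lines. The function names are illustrative, and the inputs are assumed to be equal-length, non-empty sequences of true values and predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of misclassified examples, i.e. 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))          # 0.5
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))    # 2.0
```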

What’s important about the examples used to evaluate the performance P?

That the examples used for evaluation were not used for training.

The true performance P is typically only known after the model has been trained and then tested on a separate validation set.

Evaluating on the training examples hides overfitting: the model may have memorised the training data rather than generalising well to new, unseen data.

What’s the difficulty when defining P?

Defining P to obtain the desired system behavior is often difficult.

Sometimes, we cannot measure the quantity we are interested in. An alternative quantity that still achieves the design objectives needs to be found.

What’s the difference between training set and testing set?

A training set is used to train the machine learning model.

The training set is usually larger than the test set, so that there is sufficient data for the model to learn from.
The model's performance on the training set is not a reliable indicator of its ability to generalise to unseen data, because the model has already seen the training set.

A testing set is only used to evaluate the performance of the ML system.

Serves as an unbiased benchmark to assess how well the model performs on new, unseen data.

Give the parametric model of ML and explain the different variables within this model.

The model is written as f̂(x, θ)

with input x and parameters θ

Explain the learning algorithm

The objective function to measure the performance P:

J(θ, X[test], Y[test])

J is called the loss (or reward) and it should be minimised (or maximised).

What is particular to classical machine learning when it comes to features?

Generating new features from existing ones is particular to classical machine learning.

Give examples of hand-crafted features.

Transformation to polar coordinates.
Transformation in higher dimensional space.
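
A minimal sketch of the first hand-crafted feature, the transformation to polar coordinates. The function name `to_polar` is an illustrative assumption; the angle is returned in radians.

```python
import math


def to_polar(x1, x2):
    """Map Cartesian features (x1, x2) to hand-crafted features (radius, angle)."""
    radius = math.hypot(x1, x2)
    angle = math.atan2(x2, x1)
    return radius, angle


print(to_polar(3.0, 4.0)[0])  # 5.0
```

If two classes lie on concentric rings around the origin, they cannot be separated by a straight line in (x1, x2), but a simple threshold on the radius separates them after this transformation.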

What does feature engineering require?

It requires domain knowledge and expertise, and many educated guesses about which features could be useful.

Give the assumptions used for capacity, overfitting and underfitting.

The examples in each dataset are independent from each other.
The training and test set are identically distributed.
The underlying distribution *p_data* is called the data-generating distribution.

Why is the expected training error equal to the expected test error?

For a fixed choice of parameters, this is because both training and test data stem from the same underlying distribution *p_data*.

Describe the process for which the expected test error is greater than or equal to the expected training error.

Sample a training set of fixed size, adapt model parameters to the training set such that training error is reduced, and then sample the test set.

Name the factors determining how well an ML algorithm will perform.

1. Make the training error small.
2. Make the gap between training and test error small.

Name the two factors that correspond to the two central challenges in ML in terms of fitting

1. Underfitting: occurs when the model is not able to obtain a sufficiently low error value on the training set.
2. Overfitting: occurs when the gap between the training error and the test error is large.

Define the capacity of a model.

It determines whether the model is more likely to overfit or to underfit. It is the model's ability to fit a wide variety of functions:
1. Think of the maximum degree of a polynomial as its capacity.
2. Models with low capacity may struggle to fit the training set.
3. Models with high capacity can overfit by *memorising* properties of the training set that do not appear in the test set.

What's the "No Free Lunch" Theorem.

Averaged over all possible data-generating distributions *p_data*, every classification algorithm has the same error rate when classifying previously unobserved examples, i.e. the most sophisticated classifier has (on average) the same performance as just guessing. This means that no ML algorithm is universal; it needs to be adapted to the real-world problem at hand.

Describe what Regularisation is.

Regularisation is a modification to a learning algorithm that is intended to reduce its generalisation error but not its training error.
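
One common instance of this idea, not named in the card and used here only as an assumed example, is an L2 penalty (weight decay) added to the training objective. The sketch below fits a single-weight model y ≈ w·x; the names `fit_slope`, `training_error` and `lam` are illustrative.

```python
def fit_slope(xs, ys, lam=0.0):
    """Minimise sum((y - w*x)^2) + lam * w^2 over the single weight w (closed form)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def training_error(xs, ys, w):
    """Mean squared error of the model y = w * x on the given examples."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = [1.0, 2.0], [2.0, 4.0]
w_plain = fit_slope(xs, ys)           # no penalty
w_reg = fit_slope(xs, ys, lam=5.0)    # penalised, weight shrinks towards zero
print(w_plain, training_error(xs, ys, w_plain))  # 2.0 0.0
print(w_reg, training_error(xs, ys, w_reg))      # 1.0 2.5
```

The penalised fit has a higher training error. The hope is that the smaller weight generalises better on noisy data, which this toy example does not attempt to demonstrate.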

What are hyperparameters?

The settings that we have at our disposal to control our ML algorithm are called hyperparameters. Hyperparameters are not learned by the ML algorithm itself on the training set; if they were, the algorithm would tend to pick the highest-capacity setting, which would cause overfitting.

What is the usual split of the available data into training and validation sets?

- 80% training set.
- 20% validation set.
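
A minimal sketch of such a split, assuming the examples are shuffled first so that both parts come from the same distribution. The function name, the `train_fraction` default and the fixed `seed` are illustrative assumptions.

```python
import random


def train_validation_split(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples and cut it into training and validation parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


train, validation = train_validation_split(range(10))
print(len(train), len(validation))  # 8 2
```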

What is the KL divergence

The Kullback-Leibler (KL) divergence, also known as relative entropy, is a measure of how one probability distribution differs from another.
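
For two discrete distributions P and Q over the same outcomes, the KL divergence is D_KL(P || Q) = Σ_x P(x) · log(P(x) / Q(x)). The sketch below computes it in nats (natural logarithm); the function name `kl_divergence` and the list-based representation of the distributions are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with P(x) = 0 contribute nothing by convention
        if q_i == 0.0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


p = [0.9, 0.1]
q = [0.5, 0.5]
print(kl_divergence(p, p))  # 0.0
print(kl_divergence(p, q), kl_divergence(q, p))
```

The last line prints two different values: the KL divergence is not symmetric, so it is not a distance in the strict sense.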

What is the assumption in the Naive Bayesian Classifier?

That x_i are conditionally independent given y

How can we estimate P(y) and P(x_i|y)?

1. Using histograms
2. By fitting a distribution (e.g. Gaussian or Gaussian mixture models) and using this distribution
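
The second option can be sketched as follows, assuming one Gaussian per feature and class and using relative class frequencies for P(y). The function names, the small variance floor `1e-9` and the toy data are illustrative assumptions, not part of the lecture.

```python
import math
from collections import defaultdict


def fit_gaussian_naive_bayes(X, y):
    """Estimate P(y) by class frequency and P(x_i|y) by a per-feature Gaussian."""
    rows_by_class = defaultdict(list)
    for x, label in zip(X, y):
        rows_by_class[label].append(x)
    model = {}
    for label, rows in rows_by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for column in zip(*rows):  # one column per feature x_i
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # floor avoids division by zero
        model[label] = (prior, stats)
    return model


def predict_class(model, x):
    """Pick the class maximising log P(y) + sum_i log P(x_i|y)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x_i, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x_i - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_naive_bayes(X, y)
print(predict_class(model, [1.1, 2.0]))  # 0
print(predict_class(model, [5.1, 6.1]))  # 1
```

The sum over features in `predict_class` is exactly where the conditional independence assumption enters: the joint likelihood is taken to be the product of the per-feature likelihoods.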

In practice, for the Naive Bayesian Classifier, the x_i are usually not independent. How can we make them independent?

They can be made independent by pre-processing with independent component analysis, or approximately independent using the (simpler) principal component analysis.