1 Flashcards

1
Q

Define Machine Learning

A

A system that learns from a set of data to perform a given task

2
Q

What are 4 problems suited to machine learning

A

When the problem is complex and has no known algorithmic solution
When long lists of hand tuned rules are required
When the environment fluctuates over time
To help humans learn

3
Q

Name 2 common supervised tasks

A

Classification – Putting instances into different classes

Regression – Predicting a Target Numeric Value

4
Q

Name 4 common unsupervised tasks

A

Clustering – Categorising instances into approximate groups
Visualization – Producing a 2D or 3D representation of data
Dimensionality Reduction – Simplifying data without losing too much information e.g. by feature extraction (combining correlated features into one)
Association – Discover relations between attributes

5
Q

What type of ML algorithm would you use to allow a robot to walk in unknown locations

A

Reinforcement Learning

6
Q

What type of algorithm would you use to segment customers into multiple groups

A

An unsupervised clustering algorithm if you don’t know how to define the groups; a supervised classification algorithm if you already know the groups and have labelled examples of each

7
Q

Is spam detection supervised or unsupervised

A

Supervised, as the training data is labelled (spam or not spam)

8
Q

Main challenges of machine learning

A

Lack of Data
Poor data quality
Non-representative data
Uninformative features
Excessively simple models underfitting data
Excessively complex models overfitting the data

9
Q

What’s a test set and why would you use it

A

Data is often split into 2 sets:
One is for training the model (usually 80%)
The other is for testing the model and estimating the generalization error
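
The split described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `train_test_split`, the fixed seed and the placeholder data are assumptions made for the example, not part of the original card.

```python
import random


def train_test_split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(data)          # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10))          # placeholder data, assumed for illustration
train, test = train_test_split(samples)
```

The model is fitted on `train` only; its error on `test`, which it has never seen, is the estimate of the generalisation error.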

10
Q

What algorithm relies on similarity

A

Instance-based learning systems memorise the training data, then use a similarity measure to compare new instances with the stored ones and make predictions
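
A minimal sketch of the idea, assuming Euclidean distance as the similarity measure and a 1-nearest-neighbour rule (one common instance-based method, chosen here for illustration). The toy data and the function name are hypothetical.

```python
import math


def nearest_neighbour_predict(training_data, new_point):
    """Return the label of the stored instance closest to new_point."""
    # "Training" is just keeping the examples; all the work happens at
    # prediction time, when the new point is compared with every stored one.
    _, closest_label = min(
        training_data,
        key=lambda pair: math.dist(pair[0], new_point),
    )
    return closest_label


# Hypothetical labelled instances: (features, label)
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
prediction = nearest_neighbour_predict(training_data, (8.5, 8.0))
```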

11
Q

What do model based algorithms search for and what is the most common strategy they use to succeed and how do they make predictions

A

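The optimal values of the model parameters so the model generalises well to new instances

Trained by minimising a cost function that measures how badly the model fits the training data

Predictions are made by feeding a new instance’s features into the model’s prediction function with the learned parameter values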
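
As a rough sketch of this, assume a linear model y = theta0 + theta1 * x, mean squared error as the cost function, and plain gradient descent as the minimiser. All three choices, along with the toy data and the learning rate, are assumptions made for the example.

```python
def mse(theta0, theta1, xs, ys):
    """Cost function: mean squared error of the linear model on the data."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Search for the parameters that minimise the MSE by gradient descent."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = 2 * sum(errors) / n                               # dMSE/dtheta0
        grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n    # dMSE/dtheta1
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


def predict(theta0, theta1, x):
    """Apply the model's prediction function with the learned parameters."""
    return theta0 + theta1 * x


# Toy data generated from y = 2x + 1 (assumed for illustration)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_linear(xs, ys)
```

After fitting, `predict(theta0, theta1, 5.0)` should be close to 11, since the learned parameters approach the values used to generate the data.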

12
Q

If your model performs well on training data but poorly on test data what is happening and name 3 solutions

A

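The model is overfitting the training data. Solutions:
Getting more training data
Simplifying the model (or regularising it)
Reducing noise in the training data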

13
Q

What’s a validation set

A

A held-out part of the training data used to compare models and tune hyperparameters

14
Q

What can go wrong if you use the test set for hyperparameter tuning

A

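Overfitting to the test set: the measured generalisation error becomes too optimistic and the model generalises worse than expected on new data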

15
Q

What is cross validation and why is it better than using a validation set

A

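The training set is split into several folds; each model is trained on all but one fold and evaluated on the remaining one, rotating through the folds and averaging the scores

Allows comparing models without the need for a separate validation set, which saves training data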
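
A small sketch of k-fold cross validation in plain Python. To keep it short, the "model" here is a hypothetical baseline that always predicts the mean of its training targets, and the folds are taken in order without shuffling; both are simplifying assumptions for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any leftover samples over the first `remainder` folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield indices[:start] + indices[stop:], indices[start:stop]
        start = stop


def cross_val_mse(ys, k):
    """Average validation MSE of a mean-predictor baseline over k folds."""
    scores = []
    for train, validation in k_fold_indices(len(ys), k):
        prediction = sum(ys[i] for i in train) / len(train)   # "fit" on train folds
        fold_mse = sum((ys[i] - prediction) ** 2 for i in validation) / len(validation)
        scores.append(fold_mse)
    return sum(scores) / len(scores)


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy targets, assumed for illustration
average_mse = cross_val_mse(targets, k=3)
```

Every sample is used for validation exactly once and for training in the other folds, so no data has to be permanently set aside as a validation set.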

16
Q

What is reinforcement learning

A

A learning agent observes an environment, selects and performs actions, and gets rewards or penalties in return, so that over time it learns the best strategy (policy)

17
Q

What is semi supervised learning

A

When some data is labelled but the majority isn’t

An example is a photo service that recognises that the same person appears in several photos, then needs only one label per person to name them in every photo