Machine Learning Flashcards
Machine Learning
Programs that can improve their performance with training data via a learning algorithm.
Benefits of learning over hand-coded deterministic programs
- Designer cannot anticipate all possible situations.
- Designer cannot anticipate changes.
- Designer does not know the answer, or does not know how to program it.
Applications
- Facial detection
- Speech recognition
- Stock prediction
- Digit recognition
Representing Instances (Feature vectors)
For example, mushrooms:
x^(1) = <bell, fibrous, …>, y^(1) = edible
x^(2) = <convex, scaly, …>, y^(2) = non-edible
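A minimal Python sketch of how the two mushroom instances above could be stored as (x, y) pairs. Only the two feature values shown in the card are included; the feature names `cap_shape` and `cap_surface` are hypothetical labels chosen for this illustration, and the remaining features (the "…") are left out.

```python
from typing import NamedTuple

# Hypothetical names for the two feature positions shown in the card.
FEATURE_NAMES = ("cap_shape", "cap_surface")

class Instance(NamedTuple):
    """One labelled example: a feature vector x and its class label y."""
    x: tuple[str, ...]
    y: str

# The remaining features ("…") are omitted from this sketch.
training_set = [
    Instance(x=("bell", "fibrous"), y="edible"),
    Instance(x=("convex", "scaly"), y="non-edible"),
]

for i, inst in enumerate(training_set, start=1):
    features = dict(zip(FEATURE_NAMES, inst.x))
    print(f"x^({i}) = {features}, y^({i}) = {inst.y}")
```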
Feature Types:
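Nominal/Boolean -> possible values have no ordering
Ordinal -> possible values are totally ordered
Numeric -> real-valued, e.g. weight, height
Hierarchical -> possible values are partially ordered in a hierarchy (e.g. a square is a kind of polygon, which is a kind of shape)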
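Many learning algorithms expect numeric inputs, so each feature type is usually encoded differently. The sketch below assumes a hypothetical schema (cap shape as a nominal feature, size as an ordinal one, weight in grams as a numeric one). One-hot encoding for nominal features and integer ranks for ordinal ones are common choices, not the only ones; hierarchical features are not covered here.

```python
# Hypothetical feature schema used only for this illustration.
CAP_SHAPES = ["bell", "convex", "flat"]             # nominal: no order
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}  # ordinal: totally ordered

def encode(cap_shape: str, size: str, weight_g: float) -> list[float]:
    """Turn one raw instance into a numeric feature vector."""
    # Nominal -> one-hot, so no artificial ordering is implied.
    one_hot = [1.0 if cap_shape == s else 0.0 for s in CAP_SHAPES]
    # Ordinal -> integer rank, which preserves the ordering.
    rank = float(SIZE_ORDER[size])
    # Numeric -> used as-is.
    return one_hot + [rank, weight_g]

print(encode("convex", "large", 12.5))  # [0.0, 1.0, 0.0, 2.0, 12.5]
```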
Feature Space
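The space of all possible feature vectors; each instance is a point in it (e.g. with d numeric features, a point in d-dimensional space). Instances can be visualised as points in a 2D/3D plot or stored as rows of a table or database.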
Data Preprocessing
Techniques that make it easier for data to be used for analysis by machine learning models.
Mean Normalisation
Subtract each feature's mean from every data sample, so that every feature has mean zero.
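A minimal NumPy sketch of mean normalisation, assuming the data are stored as a matrix with one sample per row and one feature per column (the values are made up).

```python
import numpy as np

def mean_normalise(X: np.ndarray) -> np.ndarray:
    """Subtract each feature's (column's) mean from every sample (row)."""
    return X - X.mean(axis=0)

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
print(mean_normalise(X))  # each column now has mean 0
```

In practice the means are usually computed on the training data only and then reused to transform test data.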
Standardisation / Normalisation
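Rescales all features to a common scale, so that no feature dominates just because of its units.
Standardisation: subtract each feature's mean and divide by its standard deviation (zero mean, unit variance).
Normalisation (min-max): rescale each feature to the range [0, 1].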
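Both rescalings sketched in NumPy, under the same rows-are-samples assumption: standardisation computes (x - μ) / σ per feature, and min-max normalisation computes (x - min) / (max - min). The guards that leave constant features unscaled are a design choice of this sketch, not part of the definitions.

```python
import numpy as np

def standardise(X: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance per feature: (x - mean) / std."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid dividing by zero
    return (X - X.mean(axis=0)) / std

def min_max_normalise(X: np.ndarray) -> np.ndarray:
    """Rescale each feature to [0, 1]: (x - min) / (max - min)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant features: avoid dividing by zero
    return (X - lo) / span

# Two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
print(standardise(X))
print(min_max_normalise(X))
```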
I.I.D
Independent and identically distributed.
We assume each data point is sampled independently from the same (unknown) distribution.
Supervised Learning
- Set of instances X
- Unknown target function f : X -> Y
- Set of models (hypothesis space) H = {h | h : X -> Y}
- Set of training instances (x^(1), y^(1)), (x^(2), y^(2)), …, (x^(z), y^(z))
- Goal: choose the model h ∈ H that best approximates the target function f : X -> Y
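A toy illustration of this setup, assuming a one-feature problem where H contains a handful of threshold classifiers h_t(x) = 1 if x ≥ t, else 0. Here the model is chosen by lowest training error (empirical risk minimisation), which is one common criterion; the data and candidate thresholds are made up.

```python
# Toy training set: (x^(i), y^(i)) pairs with a single numeric feature.
train = [(1.0, 0), (2.0, 0), (3.5, 1), (4.0, 1), (5.0, 1)]

def make_threshold_model(t: float):
    """Return a hypothesis h: X -> Y that predicts 1 when x >= t."""
    return lambda x: 1 if x >= t else 0

# Hypothesis space H: one threshold model per candidate threshold.
H = {t: make_threshold_model(t) for t in [0.5, 1.5, 2.5, 3.0, 4.5]}

def training_error(h) -> float:
    """Fraction of training instances that h misclassifies."""
    return sum(h(x) != y for x, y in train) / len(train)

# Pick the hypothesis with the lowest training error (first one on ties).
best_t = min(H, key=lambda t: training_error(H[t]))
print(best_t, training_error(H[best_t]))  # 2.5 0.0
```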
Regression
A supervised learning task in which the model predicts a continuous value (used when Y is continuous, e.g. a price or a temperature).
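One simple regression method is least-squares linear regression: fitting estimates the parameters w and b, and the fitted model then outputs a real number for any input. A NumPy sketch on made-up, noise-free data generated from y = 2x + 1:

```python
import numpy as np

# Toy data from y = 2x + 1 (no noise), so the fit should recover w = 2, b = 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones for the intercept b.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(w, b)         # approximately 2.0 and 1.0
print(w * 5.0 + b)  # continuous prediction for a new input x = 5
```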
Classification
A supervised learning task in which the model predicts a class label (used when Y is discrete, e.g. edible vs. non-edible).
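As one example of a classifier (many others exist), a 1-nearest-neighbour sketch in NumPy: it predicts the discrete label of the closest training instance. The 2-D points and the labels "A" and "B" are made up for illustration.

```python
import numpy as np

# Toy labelled data: 2-D feature vectors with discrete class labels.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = np.array(["A", "A", "B", "B"])

def predict_1nn(x: np.ndarray) -> str:
    """Predict the label of the closest training instance (Euclidean distance)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return str(y_train[np.argmin(distances)])

print(predict_1nn(np.array([1.2, 1.4])))  # A
print(predict_1nn(np.array([5.5, 4.8])))  # B
```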
Unsupervised Learning
- Set of instances without labels (no y's): x^(1), x^(2), …, x^(z).
- Goal is to discover patterns and regularities.
Clustering
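An unsupervised learning technique that groups instances into clusters, so that instances in the same cluster are more similar to each other than to instances in other clusters.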
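k-means is one widely used clustering algorithm (not the only one). A minimal NumPy sketch, assuming the number of clusters k is chosen by hand and the made-up data form well-separated groups:

```python
import numpy as np

def k_means(X: np.ndarray, k: int, n_iters: int = 100, seed: int = 0):
    """Plain k-means: returns (cluster assignment per row, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct random rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.6],
              [5.0, 5.0], [5.3, 4.8], [4.9, 5.4]])
labels, centroids = k_means(X, k=2)
print(labels)
```

The cluster IDs themselves are arbitrary (0 and 1 could be swapped); only which points share a cluster is meaningful.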