1 Flashcards
Define Machine Learning
The science of building systems that learn from a set of data to perform a given task, without being explicitly programmed
What are 4 problems suited to machine learning
When the problem is complex and has no known algorithmic solution
When long lists of hand-tuned rules would otherwise be required
When the environment fluctuates over time
To help humans learn, e.g. by inspecting a trained model to gain insight into the data (data mining)
Name 2 common supervised tasks
Classification – Putting instances into different classes
Regression – Predicting a target numeric value
Name 4 common unsupervised tasks
Clustering – Categorising instances into approximate groups
Visualization – Producing a 2D or 3D representation of data
Dimensionality Reduction – Simplifying data without losing too much information, e.g. by feature extraction (combining correlated features into one)
Association Rule Learning – Discovering relations between attributes
What type of ML algorithm would you use to allow a robot to walk in unknown locations
Reinforcement Learning
What type of algorithm would you use to segment customers into multiple groups
An unsupervised clustering algorithm if you don't know how to define the groups; a supervised classification algorithm if you already know the groups and have labelled examples of each
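When the groups are not known in advance, a clustering algorithm can discover them from the data. The sketch below is a minimal one-dimensional k-means, assuming customers are described by a single made-up feature (annual spend). The function name `k_means_1d`, the data and the starting centroids are all illustrative assumptions.

```python
def k_means_1d(values, centroids, n_iterations=10):
    """Minimal 1-D k-means: returns (final centroids, clusters of values)."""
    for _ in range(n_iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical annual spend per customer (unlabelled data).
annual_spend = [10, 12, 11, 50, 52, 49, 100, 98, 103]
centroids, clusters = k_means_1d(annual_spend, centroids=[0, 50, 120])
print(clusters)
```

No labels are supplied anywhere: the three segments emerge only from how close the values are to each other.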
Is spam detection supervised or unsupervised
Supervised, as the training data is labelled (spam or not spam)
Main challenges of machine learning
Lack of Data
Poor data quality
Non-representative data
Uninformative features
Excessively simple models that underfit the data
Excessively complex models that overfit the data
What's a test set and why would you use it
Data is often split into 2 sets:
One is for training the model (usually 80%)
The other is for testing the model and estimating the generalization error
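The 80/20 split can be sketched in plain Python. The helper name `train_test_split_simple`, the `test_ratio` parameter and the fixed seed are assumptions made for illustration only.

```python
import random

def train_test_split_simple(data, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(data)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

instances = list(range(100))       # stand-in for 100 labelled instances
train_set, test_set = train_test_split_simple(instances)
print(len(train_set), len(test_set))
```

The model is fitted on `train_set` only. Its error on `test_set`, which it has never seen, is the estimate of the generalization error.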
What algorithm relies on similarity
Instance-based learning systems memorise the raw training data, then use a similarity measure to compare new instances with the stored ones and make predictions
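A common example of instance-based learning is k-nearest neighbours. The sketch below assumes Euclidean distance as the similarity measure and a majority vote among the `k` closest stored instances. The function name `knn_predict` and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(training_data, new_instance, k=3):
    """Predict a label by majority vote among the k nearest stored instances."""
    # Rank the memorised instances by Euclidean distance to the new instance.
    neighbours = sorted(
        training_data,
        key=lambda item: math.dist(item[0], new_instance),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): (features, label) pairs.
training_data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(training_data, (1.1, 0.9)))
print(knn_predict(training_data, (5.1, 5.0)))
```

There is no training step beyond storing the data; all the work happens at prediction time.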
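What do model-based algorithms search for, what is the most common strategy they use to succeed, and how do they make predictions
Optimal values for the model parameters, so that the model generalises well to new instances
Trained by minimising a cost function that measures how badly the model fits the training data
Predictions are made by feeding the new instance's features into the model's prediction function with the learned parameter values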
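As a minimal sketch of model-based learning, the code below fits a straight line by minimising the mean squared error with gradient descent. The choice of linear model, the cost function, the learning rate and the toy data are all assumptions made for illustration.

```python
def mse(theta0, theta1, xs, ys):
    """Cost function: mean squared error of the line theta0 + theta1 * x."""
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_linear(xs, ys, learning_rate=0.05, n_steps=2000):
    """Search for the parameters that minimise the MSE via gradient descent."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each parameter.
        grad0 = 2 / n * sum(errors)
        grad1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Toy data generated from y = 1 + 2x (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = fit_linear(xs, ys)

# Prediction: plug a new instance into the model with the learned parameters.
prediction = theta0 + theta1 * 5.0
print(theta0, theta1, prediction, mse(theta0, theta1, xs, ys))
```

Unlike instance-based learning, the training data can be discarded after fitting: only the two parameters are needed to make predictions.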
If your model performs well on training data but poorly on test data what is happening and name 3 solutions
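The model is overfitting the training data. Possible solutions:
Getting more training data
Simplifying the model (a simpler algorithm, fewer parameters or features, or regularisation)
Reducing noise in the training data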
What's a validation set
A set held out from the training data and used to compare models and tune hyperparameters
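What can go wrong if you use the test set for hyperparameter tuning
You risk overfitting the test set: the measured generalisation error becomes too optimistic, so the model may perform worse than expected on new data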
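What is cross-validation and why is it better than using a validation set
The training set is split into complementary subsets; each model is trained on a different combination of these subsets and validated on the remaining part
This allows comparing models without the need for a separate validation set, which saves training data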
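A minimal sketch of k-fold cross-validation in plain Python follows. The helper names (`k_fold_indices`, `cross_val_mse`), the two candidate models and the toy data are hypothetical and only illustrate how models can be compared without setting aside a separate validation set.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_instances))
    fold_size = n_instances // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        end = n_instances if fold == k - 1 else start + fold_size
        yield indices[:start] + indices[end:], indices[start:end]

def cross_val_mse(fit, predict, xs, ys, k=5):
    """Average validation MSE of a model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in val_idx]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / len(scores)

# Candidate 1: always predicts the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

# Candidate 2: straight line fitted by closed-form least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x

xs = [float(i) for i in range(20)]
ys = [3.0 * x + 2.0 for x in xs]       # toy data from y = 2 + 3x
mean_score = cross_val_mse(fit_mean, predict_mean, xs, ys)
line_score = cross_val_mse(fit_line, predict_line, xs, ys)
print(mean_score, line_score)
```

Every instance is used for validation exactly once and for training in the other folds, so the model with the lower average score can be selected before the test set is touched.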