Machine learning Flashcards
methods, training/testing sets, code
What are the method categories in ML?
What do they do?
Classification: determine which group a datapoint belongs in.
Clustering: split a dataset into groupings.
Regression: find a curve that describes the relationship between the predictors and the response.
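A minimal sketch of regression, assuming the simplest "curve" (a straight line) fitted by ordinary least squares. The function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```

The fitted line can then predict the response for a new predictor value: `slope * x_new + intercept`.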
What is the basic meaning of “learning”?
To use observations to find an underlying process.
What are the types of learning in ML?
What are their distinct features?
supervised learning:
- fits a model that relates the response to the
predictors.
- used in classification algorithms such as SVMs
unsupervised learning:
- seeks to understand relationships between the
observations.
- used in clustering algorithms such as k-means.
- No output / no response.
reinforcement learning:
- receives a reward for accomplishing a task.
What are training sets and testing sets?
And how do they relate?
A model learns from the training set; its performance on the testing set then shows how well it learned.
They should always be different sets because we want to know how the model performs on unseen data.
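One way to get two different sets is to shuffle the data and cut it in two. The sketch below assumes a simple random split; the function name `train_test_split`, the 25% test fraction and the fixed seed are illustrative choices, not something the cards prescribe.

```python
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of data and split it into disjoint training and testing lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(data)          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test items become the testing set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 datapoints.
points = list(range(20))
train, test = train_test_split(points)
```

No datapoint ends up in both lists, so the testing set really is unseen data for the model.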
Problems in Unsupervised Learning?
No response output to direct analysis.
The goal is to analyse clusters that (might) form, but overlapping groups can be hard to detect.
What are Unsupervised Learning’s different clustering approaches?
Hierarchical.
Centroid.
Density.
Distribution.
Clustering method examples?
K-means:
- centroid based.
- partitions data into K distinct clusters.
- centroids can be artificial (they need not be actual datapoints).
K-medoids:
- same as K-means except the cluster centers (medoids) are actual datapoints.
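To illustrate the difference for a single cluster, here is a small sketch. It assumes Euclidean distance and takes the medoid to be the member with the smallest total distance to all other members; the sample cluster is made up.

```python
import math

def centroid(points):
    """Mean of the points; usually not one of the points itself."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dim))

def medoid(points):
    """The actual datapoint with the smallest total distance to all the others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical cluster of four 2-D datapoints.
cluster = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
center_mean = centroid(cluster)   # (1.25, 1.25), an artificial point
center_medoid = medoid(cluster)   # (1.0, 1.0), a member of the cluster
```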
What is the K-means algorithm?
Step 1:
- randomly assign a number from 1 to K to each datapoint.
Step 2:
Iterate until cluster assignments stop changing:
- a) For each cluster: compute the cluster centroid.
- b) Assign each datapoint to the cluster whose centroid is closest.
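The steps above can be sketched in plain Python as follows. Assumptions that are not on the card: clusters are numbered 0 to K-1 instead of 1 to K, distance is squared Euclidean, an empty cluster is re-seeded at a random datapoint, and `max_iter` caps the loop. The function names and sample data are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: randomly assign a cluster number to each datapoint.
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Step 2a: compute the centroid (mean) of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dim = len(members[0])
                centroids.append(tuple(
                    sum(p[d] for p in members) / len(members) for d in range(dim)
                ))
            else:
                # Assumption: an empty cluster is re-seeded at a random datapoint.
                centroids.append(tuple(rng.choice(points)))
        # Step 2b: assign each datapoint to the cluster with the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:   # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

Which group receives which cluster number depends on the random start, so only the grouping itself is meaningful, not the label values.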