ML Fundamentals Flashcards
What is machine learning?
Machine Learning is the science (and art) of programming computers so they can learn from data.
Supervised/Unsupervised learning - what are they?
Supervised: the training set you feed to the algorithm includes the desired solutions, called labels. Typical tasks are classification and regression.
Unsupervised: the training data is unlabeled, so the system has to find structure in the data without being told the desired solutions.
Supervised learning algorithms
k-nearest neighbors, linear regression, logistic regression, support vector machines (SVMs), decision trees and random forests, neural networks
Unsupervised learning algorithms
Clustering (k-means, DBSCAN, hierarchical cluster analysis)
Anomaly and novelty detection (one-class SVM, isolation forest)
Visualization and dimensionality reduction (principal component analysis [PCA], kernel PCA, locally linear embedding [LLE], t-distributed stochastic neighbor embedding [t-SNE])
Association rule learning (Apriori, Eclat)
Dimensionality reduction
Simplifies the data without losing too much information. One way to do this is to merge several correlated features into one, for example merging a car's age and mileage into a single ‘wear and tear’ feature. It is often a good idea to reduce the dimension of your training data before feeding it to another learning algorithm, since training usually runs faster and the data takes up less space.
Feature extraction
A form of dimensionality reduction in which you merge correlated features into one. For example, merging a car's age and mileage into one ‘wear and tear’ feature.
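A minimal sketch of the ‘wear and tear’ idea, using only the Python standard library. The function names and the sample numbers are made up for illustration. Each feature is standardized and the two z-scores are averaged; for two positively correlated standardized features this is proportional to the first principal component, but it is a simplification, not a full PCA.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def wear_and_tear(ages, mileages):
    """Merge two correlated features into one by averaging their z-scores."""
    z_age = standardize(ages)
    z_mileage = standardize(mileages)
    return [(a + m) / 2 for a, m in zip(z_age, z_mileage)]

# Hypothetical cars: age in years and mileage in km (illustrative values only).
ages = [1, 3, 5, 8, 10]
mileages = [12_000, 40_000, 61_000, 95_000, 130_000]

scores = wear_and_tear(ages, mileages)
for age, km, score in zip(ages, mileages, scores):
    print(f"age={age:>2} mileage={km:>7} wear_and_tear={score:+.2f}")
```

The two input columns collapse into a single column that still orders the cars from least to most worn.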
Semi-supervised learning
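Only some of the training data is labeled; usually most of it is unlabeled. Example: a photo app groups the faces in your pictures on its own, and once you name a person in one photo it can name that person in the others.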
Reinforcement learning
The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
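A toy sketch of an agent, rewards, and a policy, assuming a made-up corridor of five cells where reaching the last cell pays +1 and every other move costs 0.1. The environment, the constants, and the function names are all hypothetical. To stay deterministic, the code sweeps over every state-action pair instead of learning from sampled experience, so it is closer to Q-value iteration than to a real trial-and-error agent.

```python
N_STATES = 5        # cells 0..4; cell 4 is the goal and ends the episode
ACTIONS = (-1, +1)  # move left or move right
GAMMA = 0.9         # discount factor for future rewards

def step(state, action):
    """Environment: return (next_state, reward) for taking action in state."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else -0.1  # penalty = negative reward
    return nxt, reward

def learn_policy(sweeps=50):
    """Estimate action values, then return the greedy policy {state: action}."""
    q = {(s, a): 0.0 for s in range(N_STATES - 1) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            for a in ACTIONS:
                nxt, reward = step(s, a)
                if nxt == N_STATES - 1:
                    future = 0.0  # nothing more to earn after the goal
                else:
                    future = max(q[(nxt, b)] for b in ACTIONS)
                q[(s, a)] = reward + GAMMA * future
    # The policy maps each situation (state) to the action with the best value.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}

policy = learn_policy()
print(policy)
```

In this corridor the resulting policy is to move right from every cell, which collects the most reward over time.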
Batch learning
The system is incapable of learning incrementally: it must be trained using all the available data. Training is typically done offline, and learning from new data means retraining on the full dataset.
Online learning
You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
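Learning rate
How fast an online learning system adapts to changing data. With a high learning rate the system adapts rapidly to new data but also tends to forget old data quickly; with a low learning rate it learns more slowly but is less sensitive to noise in the new data.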
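A small sketch of online learning with two learning rates, assuming a one-weight linear model y = w * x updated by a gradient step on the squared error after each instance. The function name and the drifting data stream are invented for illustration.

```python
def online_fit(stream, learning_rate):
    """Update the single weight w one instance at a time."""
    w = 0.0
    for x, y in stream:
        error = w * x - y
        w -= learning_rate * error * x  # gradient step on 0.5 * error**2
    return w

# The data drifts: 20 instances follow y = 2x, then 5 instances follow y = 5x.
stream = [(1.0, 2.0)] * 20 + [(1.0, 5.0)] * 5

fast = online_fit(stream, learning_rate=0.5)
slow = online_fit(stream, learning_rate=0.05)
print(f"high learning rate: w = {fast:.3f}")
print(f"low learning rate:  w = {slow:.3f}")
```

After the drift, the high-rate model ends close to the new slope of 5, while the low-rate model has barely moved away from what it learned on the old data.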
Generalizing
Given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before
Instance-based learning
The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples.
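One way to sketch this is a k-nearest neighbors classifier, assuming Euclidean distance as the similarity measure and a majority vote among the k closest stored examples. The data points and labels below are made up.

```python
import math
from collections import Counter

def knn_predict(examples, query, k=3):
    """Predict the majority label among the k stored examples nearest to query."""
    neighbors = sorted(examples, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# "Learning by heart": the training step is just storing the labeled examples.
examples = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.0), "small"),
    ((8.0, 8.0), "large"), ((9.0, 7.5), "large"), ((8.5, 9.0), "large"),
]

print(knn_predict(examples, (2.0, 2.0)))
print(knn_predict(examples, (8.0, 7.0)))
```

There are no fitted parameters here; all the work happens at prediction time, when the new case is compared with the stored ones.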
Model-based learning
The system builds a model from the training examples, then uses that model to make predictions.
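As a rough illustration, the sketch below assumes the model is a straight line fitted by ordinary least squares; the function names and the four data points are hypothetical. Once the two parameters are fitted, predictions use only the model and no longer need the training examples.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Use the fitted parameters to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # these points lie exactly on y = 2x + 1

model = fit_line(xs, ys)
print(model, predict(model, 5.0))
```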
Utility function (fitness function)
Measures how good your model is. The counterpart is a cost function, which measures how bad it is.
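A brief sketch of the idea, assuming mean squared error as the cost and its negation as the utility. That pairing is an illustrative choice, not the only option, and the prediction lists are invented.

```python
def mse_cost(predictions, targets):
    """Cost: mean squared error, lower is better."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def utility(predictions, targets):
    """Utility: negated cost, higher is better."""
    return -mse_cost(predictions, targets)

targets = [3.0, 5.0, 7.0]
good_predictions = [3.0, 5.0, 7.0]
poor_predictions = [4.0, 4.0, 9.0]

print(mse_cost(good_predictions, targets), mse_cost(poor_predictions, targets))
print(utility(good_predictions, targets) > utility(poor_predictions, targets))
```

Training a model then amounts to searching for the parameters that maximize the utility or, equivalently here, minimize the cost.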