Fundamentals Flashcards
Keeps the fundamentals of machine learning at your fingertips.
Training Set
The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample).
Why Machine Learning?
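- ML programs are often much shorter, easier to maintain, and more accurate than programs built from long lists of hand-written rules.
- An ML program learns automatically from data, so it can adapt when the data changes instead of having its rules rewritten by hand.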
- ML solves problems that are either too complex for traditional approaches or have no known algorithm.
- ML can help humans learn: inspecting what a trained model has learned can reveal unsuspected correlations or trends.
What is Machine Learning great for?
- Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
- Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
- Fluctuating environments: a Machine Learning system can adapt to new data.
- Getting insights and finding patterns in complex problems and large amounts of data.
What are the types of Machine Learning systems?
They can be classified according to:
- Whether or not they are trained with human supervision (supervised, unsupervised, semi-supervised, and Reinforcement Learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)
What are typical supervised learning tasks?
- Classification (to assign each instance to one of a set of predefined classes)
- Regression (to predict a numeric target value)
Anomaly detection (to detect outliers) and association rule learning (to discover interesting relationships between attributes) are usually unsupervised tasks rather than supervised ones.
What is the difference between attribute and feature?
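An attribute is a data type (e.g., "Mileage"), while a feature generally means an attribute plus its value (e.g., "Mileage = 15,000"). Many people use the two words interchangeably.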
Which are the most important supervised learning algorithms?
- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural networks
Which are the most important unsupervised learning algorithms?
- Clustering (groups similar data)
  - k-Means
  - Hierarchical Cluster Analysis (HCA)
  - Expectation Maximization
- Visualization (plots 2D or 3D representations) and dimensionality reduction (data simplification)
  - Principal Component Analysis (PCA)
  - Kernel PCA
  - Locally-Linear Embedding (LLE)
  - t-distributed Stochastic Neighbor Embedding (t-SNE)
- Association rule learning
  - Apriori
  - Eclat
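As a rough illustration of clustering, here is a minimal k-Means sketch in plain Python (3.8+ for math.dist). The function name k_means, the sample points, and the choice to initialize the centroids with the first k points are assumptions made to keep the example short and deterministic; library implementations usually use random or k-means++ initialization.

```python
import math

def k_means(points, k, n_iters=10):
    """Minimal k-Means (Lloyd's algorithm) on 2-D points.

    Sketch assumption: centroids start at the first k points, which keeps
    the result deterministic for this example.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical data: two obvious groups of points.
points = [(1, 1), (8, 8), (1.5, 2), (8.5, 9), (0.5, 1.5), (9, 8)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels are given to the algorithm: it discovers the two groups purely from the distances between points, which is what makes it unsupervised.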
What is feature extraction?
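Merging several correlated features into one, e.g., a car's mileage is strongly correlated with its age, so they can be merged into a single feature representing the car's wear and tear. It is a form of dimensionality reduction: the goal is to simplify the data without losing too much information.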
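One way to merge two correlated features is to project them onto their first principal component (PCA restricted to two dimensions). The sketch below assumes hypothetical age and mileage values and a hypothetical helper merge_two_features; it standardizes both features, finds the direction of greatest variance with the closed-form eigenvector of a 2x2 covariance matrix, and returns each car's coordinate along that direction as the single merged feature.

```python
import math
from statistics import mean, pstdev

def merge_two_features(a, b):
    """Merge two correlated features into one by projecting the standardized
    (a, b) pairs onto their first principal component (2-D PCA)."""
    # Standardize so neither feature dominates just because of its units.
    ma, sa = mean(a), pstdev(a)
    mb, sb = mean(b), pstdev(b)
    za = [(x - ma) / sa for x in a]
    zb = [(y - mb) / sb for y in b]
    n = len(a)
    # Covariance matrix [[saa, sab], [sab, sbb]] of the standardized data.
    saa = sum(x * x for x in za) / n
    sbb = sum(y * y for y in zb) / n
    sab = sum(x * y for x, y in zip(za, zb)) / n
    # Largest eigenvalue of the symmetric 2x2 matrix and its eigenvector.
    lam = (saa + sbb) / 2 + math.sqrt(((saa - sbb) / 2) ** 2 + sab ** 2)
    vx, vy = (sab, lam - saa) if sab != 0 else (1.0, 0.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Each instance's coordinate along that axis is the new, merged feature.
    return [x * vx + y * vy for x, y in zip(za, zb)]

# Hypothetical cars: age in years and mileage in km.
age = [1, 2, 3, 4, 5]
mileage = [15000, 29000, 46000, 61000, 74000]
wear_and_tear = merge_two_features(age, mileage)
print(wear_and_tear)
```

Two columns become one, yet the ordering of the cars from least to most worn is preserved, which is the kind of "simplify without losing too much information" trade-off feature extraction aims for.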
When should you use online learning algorithms?
- When you need a reactive system that adapts rapidly to change, e.g., a stock price predictor.
- When the system must learn autonomously, e.g., a rover on Mars.
- When computing resources are limited, e.g., a smartphone app.
What is the learning rate?
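One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. A high learning rate makes the system adapt quickly to new data, but it also tends to quickly forget the old data. A low learning rate gives the system more inertia: it learns more slowly, but it is less sensitive to noise in the new data or to outliers.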
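The trade-off can be seen with the simplest possible online learner: an estimate that moves a fraction learning_rate of the way toward each new observation (an exponential moving average). The track helper and the two streams below are hypothetical, chosen only to show an abrupt change in the data and a single outlier.

```python
def track(stream, learning_rate):
    """Online estimate of a stream's level: move a fraction `learning_rate`
    of the way toward each new observation."""
    estimate = 0.0
    history = []
    for value in stream:
        estimate += learning_rate * (value - estimate)
        history.append(estimate)
    return history

# The data changes abruptly: 50 readings of 0.0, then 50 readings of 10.0.
stream = [0.0] * 50 + [10.0] * 50
fast = track(stream, learning_rate=0.5)
slow = track(stream, learning_rate=0.05)
# Five readings after the change, the fast learner is already near 10,
# while the slow learner is still far below it.
print(fast[54], slow[54])

# A single outlier in otherwise constant data.
spike = [0.0] * 10 + [100.0] + [0.0] * 10
spike_fast = max(track(spike, learning_rate=0.5))
spike_slow = max(track(spike, learning_rate=0.05))
# The fast learner is thrown much further off by the one noisy reading.
print(spike_fast, spike_slow)
```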
What is instance-based learning?
The system learns the examples by heart, then generalizes to new cases by comparing them to the stored examples using a similarity measure.
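k-Nearest Neighbors is a classic instance-based learner. In the minimal sketch below, "training" is just keeping the list of labeled examples, and the similarity measure is assumed to be Euclidean distance (closer means more similar). The function name knn_predict and the spam/ham data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(training_set, new_point, k=3):
    """Instance-based prediction: keep every training instance, then label a
    new point by majority vote among its k most similar (closest) instances."""
    # Similarity measure assumed here: Euclidean distance.
    neighbors = sorted(training_set, key=lambda inst: math.dist(inst[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set of (features, label) pairs, stored as-is.
training_set = [
    ((1.0, 1.0), "spam"), ((1.2, 0.8), "spam"), ((0.9, 1.1), "spam"),
    ((5.0, 5.0), "ham"), ((5.2, 4.9), "ham"), ((4.8, 5.1), "ham"),
]
print(knn_predict(training_set, (1.1, 0.9)))  # spam
```

Note that all the work happens at prediction time, and the whole training set must be kept around.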
What is model-based learning?
Instead of memorizing the examples, the system generalizes from them by building a model of these examples, then uses that model to make predictions.
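By contrast with instance-based learning, a model-based learner compresses the examples into a few parameters. The sketch below assumes a linear model (slope and intercept) fitted with ordinary least squares on hypothetical data; the helper names fit_linear_model and predict are illustrative only.

```python
from statistics import mean

def fit_linear_model(xs, ys):
    """Model-based learning: summarize the examples as two parameters
    (slope, intercept) using ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training examples.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
model = fit_linear_model(xs, ys)  # after this, the examples are no longer needed
print(model, predict(model, 6))   # slope about 1.97, prediction about 12.91
```

Once fitted, the model is just two numbers, so predictions are cheap and the training data can be discarded.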
How do you define the performance measure of your algorithms?
You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is.
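Two common choices illustrate the difference: mean squared error is a cost function (lower is better), and accuracy is a utility function (higher is better). These are only examples; which measure fits depends on the task. The function names and sample values below are hypothetical.

```python
def mse_cost(predictions, targets):
    """Cost function: mean squared error. Lower means the model is better."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def accuracy_utility(predictions, targets):
    """Utility (fitness) function: fraction of correct predictions. Higher is better."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)

# Regression example: squared errors 0.25, 0.25 and 0.0, averaged.
print(mse_cost([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
# Classification example: 2 of 3 predictions are correct.
print(accuracy_utility(["cat", "dog", "dog"], ["cat", "dog", "cat"]))
```

Training then means searching for the parameters that minimize the cost (or, equivalently, maximize the utility).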
What is the lifecycle of a typical ML project?
- You study the data.
- You select a model.
- You train it on the training data (i.e., the learning algorithm searches for the model parameter values that minimize a cost function).
- Finally, you apply the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
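The four steps can be sketched end to end in plain Python. Everything here is an assumption for illustration: the tiny dataset, the choice of a linear model, and training with batch gradient descent on the mean squared error (one of several ways to minimize a cost function).

```python
from statistics import mean

# Hypothetical training data: (feature, target) pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.9, 5.1, 7.0, 9.1, 10.9]

# 1. Study the data (here, just a quick summary).
print("mean x:", mean(xs), "mean y:", mean(ys))

# 2. Select a model: y_hat = theta0 + theta1 * x (a linear model).
def predict(theta, x):
    return theta[0] + theta[1] * x

# 3. Train: batch gradient descent searches for the parameter values
#    that minimize the mean squared error cost function.
def train(xs, ys, learning_rate=0.05, n_epochs=2000):
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(n_epochs):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = 2 / m * sum(errors)
        grad1 = 2 / m * sum(e * x for e, x in zip(errors, xs))
        theta = [theta[0] - learning_rate * grad0, theta[1] - learning_rate * grad1]
    return theta

theta = train(xs, ys)

# 4. Inference: apply the trained model to a new case.
print("prediction for x = 6:", predict(theta, 6.0))  # close to 13
```

Whether the prediction for x = 6 is any good depends on how well the model generalizes beyond the five training examples, which is exactly the hope stated in the last step.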