ML Fundamentals Flashcards by Zach M

What is machine learning?

Machine Learning is the science (and art) of programming computers so they can learn from data.

Supervised/Unsupervised learning - what are they?

Supervised - the training set you feed to the algorithm includes the desired solutions (labels). Typical tasks are classification and regression.

Unsupervised - the training data is unlabeled, so the system has to find structure in the data without being told the answers.

Supervised learning algorithms

k-nearest neighbors, linear regression, logistic regression, support vector machines (SVMs), decision trees and random forests, neural networks

Unsupervised learning algorithms

Clustering (k-means, DBSCAN, Hierarchical cluster analysis)

Anomaly and novelty detection (one-class SVM, isolation forest)

Visualization and dimensionality reduction (principal component analysis (PCA), kernel PCA, locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE))

Association rule learning (Apriori, Eclat)

Dimensionality reduction

The goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one, for example merging a car’s age and mileage into one ‘wear and tear’ feature. It is often a good idea to reduce the dimensionality of your training data before feeding it to another algorithm.

Feature extraction

A form of dimensionality reduction where you merge correlated features into one. For example, merging a car’s age and mileage into one ‘wear and tear’ feature.
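
A rough sketch of the merge described on this card: standardize each feature, then average the two z-scores into one value. The car data and the function name `wear_and_tear` are made up for illustration, and averaging z-scores is only a simple stand-in, not PCA or any library's method.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def wear_and_tear(ages, mileages):
    """Merge two correlated features into one by averaging their z-scores."""
    return [(a + m) / 2 for a, m in zip(zscores(ages), zscores(mileages))]

# Hypothetical cars: age in years, mileage in thousands of km.
ages = [1, 3, 5, 10]
mileages = [15, 40, 80, 160]
merged = wear_and_tear(ages, mileages)  # one feature instead of two
```

Standardizing first keeps the feature with the larger numeric range (mileage here) from dominating the merged value.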

Semi-supervised learning

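Only some of the data is labeled, usually a lot of unlabeled data and a little labeled data. Example: a photos app groups the faces it finds on its own, then needs only one name per person to label that person in every photo.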

Reinforcement learning

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

Batch learning

The system is incapable of learning incrementally: it must be trained using all the available data. Training is done offline, and the trained system is then deployed without learning any further.

Online learning

You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.

learning rate

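How fast a learning system adapts to changing data. With a high learning rate, the system adapts rapidly to new data but also tends to forget old data quickly. With a low learning rate, it learns more slowly but is less sensitive to noise in the new data.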
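
To make the effect of the learning rate concrete, here is a minimal sketch of an online update rule that nudges a running estimate toward each new instance. The function name `online_mean` and the stream values are assumptions chosen for illustration.

```python
def online_mean(stream, learning_rate):
    """Update a running estimate one instance at a time."""
    estimate = 0.0
    history = []
    for x in stream:
        # Move the estimate toward the new instance by a fraction of the error.
        estimate += learning_rate * (x - estimate)
        history.append(estimate)
    return history

# Hypothetical stream whose underlying value jumps from 0 to 10.
stream = [0.0] * 5 + [10.0] * 5
fast = online_mean(stream, learning_rate=0.9)  # adapts almost immediately
slow = online_mean(stream, learning_rate=0.1)  # adapts gradually
```

After the jump, the high-rate estimate is close to 10 within a few instances, while the low-rate estimate is still catching up.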

Generalizing

Given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before

Instance-based learning

The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples.
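
As an illustration, a 1-nearest-neighbor classifier is about the simplest instance-based learner: it stores the examples and uses Euclidean distance as the similarity measure. The example points and labels below are invented.

```python
import math

def nearest_neighbor_predict(examples, query):
    """Return the label of the stored example closest to the query."""
    closest = min(examples, key=lambda ex: math.dist(ex[0], query))
    return closest[1]

# "Training" is just memorizing (features, label) pairs.
examples = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
prediction = nearest_neighbor_predict(examples, (7.5, 8.0))
```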

model-based learning

The system builds a model from the training examples and then uses that model to make predictions.
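
A small sketch of the idea, assuming a straight-line model fitted by ordinary least squares. The data is synthetic and `fit_line` is a name chosen here for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Synthetic training examples that follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Once fitted, only the two parameters are needed to predict a new case.
prediction = slope * 5.0 + intercept
```

Unlike an instance-based learner, the fitted model does not need the training examples at prediction time.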

utility function (fitness function)

measures how good your model is

cost function

measures how bad your model is
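
Mean squared error is one common cost function for regression. A minimal sketch with made-up target values and predictions:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
good = mean_squared_error(y_true, [3.0, 5.0, 8.0])  # predictions close to the targets
bad = mean_squared_error(y_true, [0.0, 0.0, 0.0])   # predictions far from the targets
```

The lower the cost, the better the model fits the data, so training usually means searching for the parameters that minimize it.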

Steps of ML

1. Study the data 2. Select a model 3. Train the model on the training data 4. Apply the model to make predictions on new cases

Challenges for ML

1. Insufficient quantity of data 2. Non-representative training data 3. Poor data quality 4. Irrelevant features 5. Overfitting the training data

regularization

constraining a model to make it simpler and reduce the risk of overfitting
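
One way to see the constraint at work is ridge (L2) regularization on a one-parameter model y ≈ w * x, where a penalty alpha * w**2 is added to the squared error. The data and the name `ridge_slope` are assumptions for this sketch.

```python
def ridge_slope(xs, ys, alpha):
    """Slope w of y ≈ w * x that minimizes squared error + alpha * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
unregularized = ridge_slope(xs, ys, alpha=0.0)   # plain least squares
regularized = ridge_slope(xs, ys, alpha=10.0)    # slope shrunk toward zero
```

A larger alpha pulls the slope closer to zero, giving a simpler model that fits the training data less tightly.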

hyperparameter

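A parameter of the learning algorithm (not of the model) that is set before training and stays constant during training. For example, the amount of regularization applied to avoid overfitting is controlled by a hyperparameter.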

out of core learning

Using online learning algorithms to train systems on huge datasets that cannot fit in one machine’s main memory. The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
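
The load, train, repeat loop can be sketched as below. This assumes a one-parameter model y ≈ w * x trained with one gradient step per chunk; an in-memory list stands in for data that would really be read from disk piece by piece, and the names `iter_chunks` and `train_out_of_core` are hypothetical.

```python
def iter_chunks(data, chunk_size):
    """Yield the data one chunk at a time (stand-in for reading from disk)."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]

def train_out_of_core(chunks, learning_rate=0.01):
    """Fit y ≈ w * x using one gradient step per chunk."""
    w = 0.0
    for chunk in chunks:
        # Gradient of the mean squared error on this chunk only.
        grad = sum(2 * (w * x - y) * x for x, y in chunk) / len(chunk)
        w -= learning_rate * grad
    return w

# Synthetic dataset following y = 3x, pretending it is too big for memory.
data = [(float(x), 3.0 * x) for x in range(1, 11)] * 20
w = train_out_of_core(iter_chunks(data, chunk_size=10))
```

Only one chunk is needed in memory at any time, which is what makes the approach workable for very large datasets.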

fitting or training

capturing patterns from data

training data

data used to fit or train the model

leaf

point of a decision tree where you make a prediction
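
A tiny hand-built tree shows where leaves sit. The nested-dict layout, feature names, and thresholds below are invented for illustration and do not follow any library's format.

```python
# Internal nodes test a feature against a threshold; leaves hold a prediction.
tree = {
    "feature": "age", "threshold": 5,
    "left": {"leaf": "low wear"},
    "right": {
        "feature": "mileage", "threshold": 100,
        "left": {"leaf": "medium wear"},
        "right": {"leaf": "high wear"},
    },
}

def predict(node, sample):
    """Walk down from the root until a leaf is reached, then return its value."""
    while "leaf" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

result = predict(tree, {"age": 8, "mileage": 150})
```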

standard deviation

measures how spread out values are from the mean
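
For reference, a short sketch of the population standard deviation (dividing by n, not n - 1) on a made-up list of values:

```python
import math

def standard_deviation(values):
    """Population standard deviation: root of the mean squared deviation."""
    mu = sum(values) / len(values)
    variance = sum((v - mu) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
sd = standard_deviation(values)
```

The standard library's `statistics.pstdev` computes the same quantity, and `statistics.stdev` gives the sample version that divides by n - 1.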

feature