Chapter 1 Flashcards by Andrew J Forbes

What is Supervised Learning?

Learning in which the training data fed to the algorithm includes labels that indicate the desired solutions (the training set contains y).

Name some Supervised Learning Algorithms.

KNN
Linear Regression
Logistic Regression
Support Vector Machines
Decision Trees and Random Forests
Neural Networks (sometimes)

What is Unsupervised Learning?

The training data is unlabeled (no y); the system tries to learn without a teacher.

Name some Unsupervised Learning Algorithms.

Clustering:
K-Means
DBSCAN
Hierarchical Cluster Analysis (HCA)
Anomaly detection:
One-class SVM
Isolation Forest
Visualization and dimensionality reduction:
Principal Component Analysis (PCA)
Kernel PCA
Locally Linear Embedding (LLE)
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Association rule learning:
Apriori
Eclat

What is classification?

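A supervised learning task in which the model is trained on examples along with their class (for example, emails labeled spam or not spam) so that it can classify new examples, such as new emails.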
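
A minimal sketch of classification, assuming scikit-learn is available. The two features (link count and ALL-CAPS word count per email) and the toy values are hypothetical, chosen only to illustrate training on labeled examples and classifying new ones.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: [number of links, number of ALL-CAPS words] per email.
X_train = [[0, 0], [1, 0], [0, 1], [8, 9], [9, 7], [7, 8]]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]  # the class of each example

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)  # learn from examples together with their class

# Classify two new, unlabeled emails.
predictions = clf.predict([[1, 1], [8, 8]])
print(predictions)
```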

What is Regression?

Predicting a target numeric value given a set of features called predictors. Training a model requires both predictors and labels.

What is a clustering algorithm?

An algorithm that detects groups of similar data points based on their features.
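
A minimal sketch of clustering with scikit-learn's KMeans. The six 2D points are made up for illustration; no labels are given, and the algorithm groups the points purely by feature similarity.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two visibly separate groups of points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index assigned to each point
print(labels)
```

The cluster indices themselves are arbitrary; only the grouping matters.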

What is hierarchical clustering?

A clustering algorithm that further subdivides each cluster into smaller groups.

What is a visualization algorithm?

An algorithm that outputs a 2D or 3D representation of data that can be plotted easily.

What is Dimensionality Reduction?

Simplifying the data without losing too much information, for example by merging several correlated features into one.
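
A minimal sketch of dimensionality reduction using PCA, one of the algorithms listed earlier in this deck. The two synthetic features (`feature_a`, `feature_b`) are hypothetical and deliberately correlated, so a single component can stand in for both with little information lost.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
feature_a = rng.uniform(0.0, 10.0, size=100)
feature_b = 2.0 * feature_a + rng.normal(0.0, 0.5, size=100)  # strongly correlated with feature_a
X = np.column_stack([feature_a, feature_b])

# Scale first so neither feature dominates just because of its units.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=1)  # merge the two correlated features into one
X_reduced = pca.fit_transform(X_scaled)
kept_variance = pca.explained_variance_ratio_[0]  # share of variance preserved
print(X_reduced.shape, kept_variance)
```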

What is feature extraction?

Merging multiple features into one

Should you reduce the dimensions of data before feeding it into a Supervised ML algorithm?

Often, yes: training will run faster and the data will take up less storage and memory, and in some cases the model may also perform better.

What is Anomaly detection?

A task in which the system is shown mostly normal data during training and learns to flag instances that look very different; it is often used to remove outliers from a dataset.
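
A minimal sketch of anomaly detection using Isolation Forest, one of the algorithms listed earlier in this deck. The data is synthetic: the "normal" points are drawn around the origin, and the point far from them is a made-up outlier.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(200, 2))  # hypothetical "normal" instances

detector = IsolationForest(random_state=0)
detector.fit(X_normal)

X_new = np.array([[0.0, 0.0],     # looks like the training data
                  [10.0, 10.0]])  # very different from the training data
predictions = detector.predict(X_new)  # 1 = normal, -1 = anomaly
scores = detector.score_samples(X_new)  # lower score = more abnormal
print(predictions, scores)
```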

What is novelty detection?

Similar to anomaly detection, but the algorithm is trained only on normal data (no outliers) and then flags new instances that differ from it.

What is association rule learning?

Digging through large amounts of data to discover relations between attributes that only become apparent with enough data.

What is Machine Learning?

The science and art of programming computers so they can learn from data

What is semisupervised learning?

Algorithms that work with a lot of unlabeled data and a little labeled data, for example by grouping the unlabeled data first and then using the few labels to classify the whole collection.

Describe a Deep Belief Network (DBN)

Unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of each other. The whole system is trained unsupervised and then fine-tuned using supervised techniques.

What is an Agent in Reinforcement learning?

The learning system that can observe the environment, select and perform actions, and get rewards or penalties in return. It must then learn a policy.

What is a policy in Reinforcement learning?

A policy defines what action the agent should choose when it is in a given situation.

What is batch learning?

Batch learning is when a system cannot learn incrementally and must be trained on all the available data at once.

Describe the process of offline learning.

The system is first trained offline on all the available data (batch learning) and is then launched into production, where it runs without learning anymore.

For predicting stock prices, which would be better and why: offline learning or online learning?

Online learning: because training is incremental, the system can be fed stock data in small amounts and react quickly to changes in the data.

What is out of core learning?

Using online learning algorithms to train systems on huge datasets that cannot fit in one machine's main memory.
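
A minimal sketch of incremental (online or out-of-core style) training, assuming scikit-learn's `SGDRegressor` and its `partial_fit` method. Each loop iteration stands in for one chunk loaded from disk; here the chunks are generated synthetically, and `eta0` plays the role of the learning rate.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(42)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=42)

for _ in range(200):  # each iteration stands in for one chunk read from disk
    X_chunk = rng.uniform(-1.0, 1.0, size=(50, 1))
    # Hypothetical underlying relation: y = 3x + 2, plus a little noise.
    y_chunk = 3.0 * X_chunk[:, 0] + 2.0 + rng.normal(0.0, 0.1, size=50)
    model.partial_fit(X_chunk, y_chunk)  # update the model with this chunk only

print(model.coef_, model.intercept_)
```

Only one chunk is held in memory at a time, which is what lets this pattern scale to datasets larger than main memory.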

What is a learning rate?

The learning rate controls how fast a system adapts to new data. The lower the learning rate, the more slowly the system learns, but the more resilient it is to noise and sudden changes in the data.

What is a utility function?

A function that measures how good a model is.

What is a cost function?

A function that measures how bad a model is.
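
As an illustration, mean squared error is one common cost function for regression. The sketch below is a hand-written version with a hypothetical name (`mse_cost`), not a library API: a perfect model scores 0, and worse predictions score higher.

```python
import numpy as np

def mse_cost(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Squared errors are 0, 0.25 and 1, so the cost is 1.25 / 3.
cost = mse_cost([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(cost)
```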

Train and predict using a linear regression model in scikit-learn

Must have:
import sklearn.linear_model
X = data.drop(columns="target")  # features (data is a pandas DataFrame)
y = data["target"]  # labels
model = sklearn.linear_model.LinearRegression()
model.fit(X, y)  # train
X_new = [[...]]  # placeholder: new data with the same columns as X
model.predict(X_new)  # predict
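
A self-contained version of the same steps, as a sketch. The DataFrame, its `feature` column, and the values are hypothetical; the target is constructed as 2 * feature + 1 so the result is easy to check by hand.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data where target = 2 * feature + 1.
data = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "target": [3.0, 5.0, 7.0, 9.0],
})

X = data.drop(columns="target")  # predictors
y = data["target"]               # labels

model = LinearRegression()
model.fit(X, y)

X_new = pd.DataFrame({"feature": [5.0]})  # same column layout as X
prediction = model.predict(X_new)
print(prediction)
```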

What is inference?

Using a trained model to make predictions on new cases.

What is the issue with nonrepresentative training data?

A model trained on data that does not reflect the cases you want to generalize to is unlikely to make accurate predictions for those cases.

What is sampling bias?

When the method of sampling is flawed and biases the data

What are some options when dealing with significant missing values?

Ignore the feature altogether, ignore the instances with missing values, fill in (impute) the missing values, or train one model with the feature and one without it.
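
A minimal sketch of the imputation option, assuming scikit-learn's `SimpleImputer` with the median strategy. The small array is made up; its second column has one missing value.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: the second feature is missing for one instance.
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0],
              [4.0, 50.0]])

imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)  # NaN replaced by the column median (30.0)
print(X_filled)
```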

What is feature extraction?

Feature extraction is combining existing features to produce more useful ones.

What is overfitting?

Overfitting is when a model fits the training set too closely: it performs well on the training data but does not generalize well to new data.

How to solve overfitting?

Simplify the model, gather more training data, or reduce the noise in the training data.

What is regularization?

Constraining a model to make it simpler to avoid overfitting
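
A minimal sketch of regularization, using Ridge regression as one example of a constrained linear model. The data is synthetic and the value of `alpha` is arbitrary; the point is only that the penalty shrinks the learned coefficients compared with plain linear regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                     # few instances, several features
y = X[:, 0] + rng.normal(0.0, 0.5, size=20)      # only the first feature matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha sets the strength of the constraint

plain_norm = np.linalg.norm(plain.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)  # smaller: the coefficients are shrunk
print(plain_norm, ridge_norm)
```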

What is underfitting?

The opposite of overfitting: the model is too simple to learn the underlying structure of the data.

How to solve underfitting?

Select a more powerful model, feed better features to the learning algorithm, or reduce the regularization.

What is holdout validation?

Hold out part of the training set as a validation set, train multiple models with different hyperparameters on the reduced training set, and select the model that performs best on the validation set. Then retrain that model on the full training set (including the validation set) and evaluate it on the test set.
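
The holdout procedure can be sketched as follows, assuming scikit-learn. The synthetic data, the Ridge model, and the candidate `alpha` values are illustrative choices, not part of the original card.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0.0, 0.3, size=200)

# Set the test set aside, then hold out part of the training set for validation.
X_full_train, X_test, y_full_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_full_train, y_full_train, test_size=0.25, random_state=0)

# Train one candidate per hyperparameter value and score it on the validation set.
candidate_alphas = [0.01, 1.0, 100.0]
val_errors = {}
for alpha in candidate_alphas:
    candidate = Ridge(alpha=alpha).fit(X_train, y_train)
    val_errors[alpha] = mean_squared_error(y_val, candidate.predict(X_val))

# Retrain the best candidate on training + validation data, then test once.
best_alpha = min(val_errors, key=val_errors.get)
final_model = Ridge(alpha=best_alpha).fit(X_full_train, y_full_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print(best_alpha, test_mse)
```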

What is cross-validation?

The model is evaluated on several small validation sets, and the average of those evaluations is taken as its score.
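
A minimal sketch using scikit-learn's `cross_val_score` with 5 folds. The data is synthetic; for a regressor the default score is R², so values close to 1 indicate a good fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0.0, 0.1, size=100)

# Each of the 5 folds serves once as the validation set.
scores = cross_val_score(LinearRegression(), X, y, cv=5)
mean_score = scores.mean()  # the averaged score reported for the model
print(scores, mean_score)
```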

What is the No Free Lunch theorem?

If you make no assumptions about the data, there is no reason to prefer one model over any other.

Chapter 1 Flashcards

(42 cards)