Chapter 1 Flashcards
What is Supervised Learning?
The training data fed to the algorithm includes labels that indicate the desired solutions (the training set contains y).
Name some Supervised Learning Algorithms.
KNN
Linear Regression
Logistic Regression
Support Vector Machines
Decision Trees and Random Forests
Neural Networks (sometimes)
What is Unsupervised Learning?
The training data is unlabeled (no y); the system tries to learn without a teacher.
Name some Unsupervised Learning Algorithms.
Through clustering:
K-Means
DBSCAN
Hierarchical Cluster Analysis
Through Anomaly detection:
One-class SVM
Isolation Forest
Visualization and Dimensionality Reduction:
Principal Component Analysis
Kernel PCA
Locally Linear Embedding
t-distributed Stochastic Neighbor Embedding (t-SNE)
Association rule learning:
Apriori
Eclat
What is classification?
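A supervised learning task in which the system is trained on many examples along with their class (e.g., emails labeled spam or not spam) so that it learns how to classify new examples (new emails).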
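A minimal sketch of classification with KNN in scikit-learn. The two features (link count and ALL-CAPS word count per email) and all the numbers are made up for illustration only.

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up training set: [number of links, number of ALL-CAPS words] per email
X_train = [[0, 1], [1, 0], [0, 0], [8, 9], [9, 7], [7, 8]]
y_train = ["ham", "ham", "ham", "spam", "spam", "spam"]  # the labels (classes)

# Each new email gets the majority class of its 3 nearest training emails
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

new_emails = [[1, 1], [8, 8]]
predictions = clf.predict(new_emails)
print(predictions)
```

The first new email sits next to the three "ham" examples and the second next to the three "spam" examples, so they are classified accordingly.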
What is Regression?
Predicting a target numeric value given a set of features called predictors. Training a model requires both predictors and labels.
What is a clustering algorithm?
An algorithm that groups similar data points together based on their features, without being told which group each point belongs to.
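A small sketch of clustering with K-Means in scikit-learn. The six points are invented, and the choice of two clusters is an assumption that happens to fit this toy data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented, unlabeled data: two obvious groups of points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# n_clusters=2 is an assumption; in practice it has to be chosen
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index for each point, no y needed
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which points share a number.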
What is hierarchical clustering?
A clustering algorithm that can subdivide each group (cluster) into smaller groups.
What is a visualization algorithm?
An algorithm that outputs a 2D or 3D representation of data that can be plotted easily.
What is Dimensionality Reduction?
Simplifying the data without losing too much information, for example by merging several correlated features into one.
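A sketch of merging two correlated features into one with PCA. The car mileage and age data are synthetic, and standardizing before PCA is a choice made here so that the feature with the larger scale does not dominate.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: a car's age is strongly correlated with its mileage
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 200_000, size=100)
age = mileage / 15_000 + rng.normal(0, 0.5, size=100)
X = np.column_stack([mileage, age])

# Put both features on the same scale, then keep a single component
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # one merged feature per car
print(pca.explained_variance_ratio_)  # share of the variance that was kept
```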
What is feature extraction?
Merging multiple correlated features into one (a form of dimensionality reduction).
Should you reduce the dimensions of data before feeding it into a Supervised ML algorithm?
Often it is a good idea to try: the algorithm will run much faster and the data will take up less disk and memory space, and in some cases the model may also perform better.
What is Anomaly detection?
A system that is shown mostly normal instances during training and then flags new instances that look very different from them. It is used to detect unusual cases and to remove outliers from a dataset.
What is novelty detection?
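Similar to anomaly detection, but the algorithm is trained only on normal data (a clean training set with no outliers) and flags new instances that look different from everything it has seen.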
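A sketch of novelty detection with a one-class SVM, trained only on "normal" points. The data is synthetic, and nu=0.05 is an assumed setting rather than a recommended one.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic "normal" data: a cloud of points around the origin
rng = np.random.default_rng(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# nu roughly bounds the fraction of training points treated as outliers
detector = OneClassSVM(nu=0.05, gamma="scale")
detector.fit(X_normal)

# predict returns 1 for "looks normal" and -1 for "looks like a novelty"
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
flags = detector.predict(X_new)
print(flags)
```

The second new point lies far from everything seen in training, so it is the one expected to be flagged.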
What is association rule learning?
Digging into large amounts of data to discover interesting relations between attributes.
What is Machine Learning?
The science and art of programming computers so they can learn from data
What is semisupervised learning?
Algorithms that work with partially labeled data: usually a lot of unlabeled data, which can be grouped, plus a little labeled data that is then used to label the whole collection.
Describe a Deep Belief Network (DBN)
Unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of each other. The RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised techniques.
What is an Agent in Reinforcement learning?
The learning system that can observe the environment, select and perform actions, and get rewards or penalties in return. It must then learn by itself the best strategy, called a policy, to get the most reward over time.
What is a policy in Reinforcement learning?
A policy defines what action the agent should choose when it is in a given situation.
What is batch learning?
Batch learning is when a system cannot learn incrementally and must be trained using all the available data.
Describe the process of offline learning.
The system is first trained offline on all the available data (batch learning). It is then launched into production, where it runs without learning anymore and just applies what it has learned.
For predicting stock prices, which would be better and why: offline learning or online learning?
Online learning. Because it is done incrementally, the system can be trained on new stock data in small amounts as it arrives and react quickly to changes in the data.
What is out of core learning?
Using online learning algorithms to train systems on huge datasets that cannot fit in one machine's main memory: the data is loaded and learned from one part at a time.
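A sketch of out-of-core style training with scikit-learn's SGDRegressor and partial_fit. The read_chunks generator is a hypothetical stand-in for reading a huge file piece by piece; here it just produces synthetic data following y = 3x + 2 plus noise.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def read_chunks(n_chunks=100, chunk_size=100, seed=0):
    """Hypothetical stand-in for loading one part of a huge dataset at a time."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 1))
        y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=chunk_size)
        yield X, y

model = SGDRegressor(random_state=0)
for X_chunk, y_chunk in read_chunks():
    # Only the current chunk is in memory; the model is updated incrementally
    model.partial_fit(X_chunk, y_chunk)

print(model.coef_, model.intercept_)  # should end up near 3 and 2
```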
What is a learning rate?
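The learning rate controls how fast a system adapts to new data. With a high learning rate the system adapts quickly but also tends to forget old data quickly. With a low learning rate it learns more slowly but is less sensitive to noise and outliers in the new data.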
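A toy illustration of the effect of the learning rate. It assumes a very simple "system" that keeps a single running estimate and nudges it toward each new value; real learning algorithms are more involved, but the trade-off is the same. The stream of values is made up.

```python
def run_stream(values, learning_rate, start):
    """Move the estimate toward each new value by a fraction learning_rate."""
    estimate = start
    history = []
    for x in values:
        estimate += learning_rate * (x - estimate)
        history.append(estimate)
    return history

# Made-up stream: steady values with a single outlier (100.0)
stream = [10.0, 10.0, 10.0, 100.0, 10.0]

slow = run_stream(stream, learning_rate=0.1, start=10.0)
fast = run_stream(stream, learning_rate=0.8, start=10.0)

# Estimate right after the outlier: the high rate jumps much further
print(slow[3], fast[3])
```

With the low rate the outlier barely moves the estimate, while the high rate follows it almost all the way.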
What is a utility function?
A function that measures how good a model is (also called a fitness function).
What is a cost function?
A function that measures how bad a model is. Training typically searches for the parameters that minimize it.
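A sketch of one common cost function, the mean squared error (MSE), applied to two candidate linear models. The data and the two candidate slopes are invented for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average squared distance between predictions and targets."""
    return float(np.mean((y_true - y_pred) ** 2))

# Invented data that follows y = 2x exactly
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

cost_good = mse(y, 2.0 * x)  # candidate model y = 2x
cost_bad = mse(y, 1.0 * x)   # candidate model y = x

print(cost_good, cost_bad)  # the lower cost indicates the better fit
```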
Train and predict using a linear regression model in scikit-learn
Must have:
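import sklearn.linear_model
# data: a pandas DataFrame holding the feature columns plus a "target" column
X = data.drop(columns="target")  # predictors
y = data["target"]  # labels
model = sklearn.linear_model.LinearRegression()
model.fit(X, y)  # train
X_new = X.iloc[:1]  # stand-in for new data; it must have the same columns as X
model.predict(X_new)  # predict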
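A self-contained version of the same steps that can be run as is. The DataFrame, its column names (x1, x2, target), and the values are made up; the target is built as 2*x1 + 3*x2 + 1 so the result is easy to check.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up data where target = 2*x1 + 3*x2 + 1 exactly
data = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 1, 4, 3, 6],
    "target": [9, 8, 19, 18, 29],
})

X = data.drop(columns="target")  # predictors
y = data["target"]               # labels

model = LinearRegression()
model.fit(X, y)  # train

# New data must have the same columns as X
X_new = pd.DataFrame({"x1": [6], "x2": [5]})
prediction = model.predict(X_new)
print(prediction)  # 2*6 + 3*5 + 1 = 28, up to floating-point error
```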
What is inference?
Using a trained model to make predictions on new cases.
What is the issue with nonrepresentative training data?
A model trained on data that does not represent the cases you want to generalize to is unlikely to make accurate predictions for those cases.
What is sampling bias?
When a flawed sampling method makes the training set nonrepresentative, which can happen even with a very large sample.
What are some options when dealing with significant missing values?
Ignore the feature altogether, ignore the instances with missing values, fill in the missing values (e.g., with the median), or train one model with the feature and one model without it.
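A sketch of three of these options using pandas and scikit-learn's SimpleImputer. The tiny DataFrame and its column names (rooms, income) are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Invented data with one missing value in "income"
data = pd.DataFrame({
    "rooms": [1.0, 2.0, 3.0, 4.0],
    "income": [10.0, np.nan, 30.0, 40.0],
})

# Option 1: ignore the feature altogether
without_feature = data.drop(columns="income")

# Option 2: ignore the instances that have missing values
without_rows = data.dropna(subset=["income"])

# Option 3: impute, here with the median of the known values (10, 30, 40)
imputer = SimpleImputer(strategy="median")
imputed = imputer.fit_transform(data)

print(imputed)
```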
What is feature extraction?
Feature extraction is combining existing features to produce a more useful one (dimensionality reduction algorithms can help).
What is overfitting?
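Overfitting is when a model performs well on the training data but does not generalize well to new data, typically because the model is too complex relative to the amount and noisiness of the training data.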
How to solve overfitting?
Simplify the model (fewer parameters, fewer features, or more regularization), gather more training data, or reduce the noise in the training data (fix data errors and remove outliers).
What is regularization?
Constraining a model to make it simpler to avoid overfitting
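A sketch of regularization using Ridge regression, which penalizes large coefficients, next to an unconstrained linear regression. The data is synthetic, and alpha=10.0 is an arbitrary choice for the demonstration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: few samples, many features, only the first one matters
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))
y = X[:, 0] + rng.normal(scale=1.0, size=20)

plain = LinearRegression().fit(X, y)
regularized = Ridge(alpha=10.0).fit(X, y)  # alpha sets the constraint strength

# The constrained model ends up with smaller coefficients overall
plain_norm = np.linalg.norm(plain.coef_)
ridge_norm = np.linalg.norm(regularized.coef_)
print(plain_norm, ridge_norm)
```

A larger alpha constrains the model more, which reduces the risk of overfitting but can cause underfitting if pushed too far.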
What is underfitting?
The opposite of overfitting: the model is too simple to learn the underlying structure of the data.
How to solve underfitting?
Select a more powerful model, feed better features to the learning algorithm, or reduce the constraints on the model (e.g., lower the regularization).
What is holdout validation?
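A way to compare several candidate models: hold out part of the training set as a validation set, train the models (with different hyperparameters) on the reduced training set, and select the one that performs best on the validation set. Then retrain that best model on the full training set (including the validation set) and evaluate it on the test set to estimate the generalization error.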
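A sketch of holdout validation with scikit-learn. The data comes from make_regression, and the model (Ridge), the candidate alpha values, and the split sizes are all assumptions chosen for the illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Set the test set aside first, then hold out a validation set
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train_full, y_train_full, test_size=0.25, random_state=0)

# Train one model per candidate hyperparameter and score it on the validation set
alphas = [0.01, 1.0, 100.0]
val_scores = {}
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_scores[alpha] = model.score(X_val, y_val)

# Retrain the best candidate on training + validation data, then test once
best_alpha = max(val_scores, key=val_scores.get)
final_model = Ridge(alpha=best_alpha).fit(X_train_full, y_train_full)
test_score = final_model.score(X_test, y_test)
print(best_alpha, test_score)
```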
What is cross-validation?
The model is evaluated on many small validation sets, each time after being trained on the rest of the data. Averaging the evaluations gives a more accurate measure of its performance.
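A sketch of cross-validation with scikit-learn's cross_val_score. The synthetic data and the choice of 5 folds are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# 5 folds: each fold is the validation set once, the other 4 are used for training
scores = cross_val_score(LinearRegression(), X, y, cv=5)
mean_score = scores.mean()

print(scores)      # one R^2 score per fold
print(mean_score)  # the averaged score used to judge the model
```

The price is training time: the model is trained once per fold.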
What is the No Free Lunch theorem?
If you make absolutely no assumptions about the data, there is no reason to prefer one model over any other.