Chapter 1 Flashcards

1
Q

What is Supervised Learning?

A

Learning in which the training data fed to the algorithm includes labels that indicate the desired solutions (the training set contains y).

2
Q

Name some Supervised Learning Algorithms.

A

KNN
Linear Regression
Logistic Regression
Support Vector Machines
Decision Trees and Random Forests
Neural Networks (some architectures; others are unsupervised or semisupervised)

3
Q

What is Unsupervised Learning?

A

The training data is unlabeled (no y); the system tries to learn without a teacher.

4
Q

Name some Unsupervised Learning Algorithms.

A

Through clustering:
KMeans
DBSCAN
Hierarchical Cluster Analysis
Through anomaly detection and novelty detection:
One-class SVM
Isolation Forest
Visualization and Dimensionality Reduction:
Principal Component Analysis
Kernel PCA
Locally Linear Embedding
t-distributed Stochastic Neighbor Embedding (t-SNE)
Association rule learning:
Apriori
Eclat

5
Q

What is classification?

A

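A supervised task in which the model is trained on examples labeled with their class (e.g., emails marked spam or not spam) so that it can classify new examples, such as new emails.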
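
As a minimal sketch of classification, the snippet below implements a 1-nearest-neighbor classifier in plain Python. The toy features (number of links, number of exclamation marks), their values, and the function name `predict_1nn` are assumptions made for illustration only.

```python
# Toy labeled examples: (num_links, num_exclamation_marks) -> class.
# Features and values are hypothetical.
train = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((7, 5), "spam"),
    ((9, 8), "spam"),
]

def predict_1nn(features):
    """Return the class of the closest training example (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda example: dist2(example[0], features))
    return nearest[1]

new_email = (8, 6)
predicted = predict_1nn(new_email)
```

The labels in `train` play the role of y; a new email is given the class of its most similar labeled example.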

6
Q

What is Regression?

A

Predicting a target numeric value given a set of features called predictors. Training a model requires both predictors and labels.
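
A minimal sketch of regression with a single predictor, using ordinary least squares written out by hand. The data and the helper name `fit_line` are hypothetical.

```python
# Hypothetical training data: one predictor x and a numeric label y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict the target for a new x
```

Training needs both `xs` (predictors) and `ys` (labels); prediction needs only a new predictor value.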

7
Q

What is a clustering algorithm?

A

An algorithm that groups data points into clusters of similar instances based on their features, without being told which group each point belongs to.
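
To make the idea concrete, here is a small sketch of k-means (Lloyd's algorithm) on one-dimensional points. The data, the starting centers, and the function name `kmeans_1d` are assumptions chosen for illustration.

```python
def kmeans_1d(points, centers, n_iter=10):
    """Lloyd's algorithm on 1-D points, starting from the given centers."""
    centers = list(centers)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]   # unlabeled data
centers, groups = kmeans_1d(points, centers=[0.0, 5.0])
```

No labels are supplied; the two groups emerge only from how close the points are to one another.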

8
Q

What is hierarchical clustering?

A

A clustering algorithm that can also subdivide each group into smaller groups.

9
Q

What is a visualization algorithm?

A

An algorithm that outputs a 2D or 3D representation of data that can be plotted easily.

10
Q

What is Dimensionality Reduction?

A

Simplifying the data without losing too much information, for example by merging several correlated features into one.

11
Q

What is feature extraction?

A

Merging multiple features into one
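
A rough sketch of merging two correlated features into one. It standardizes each feature and averages them, which is a simple stand-in for illustration and not a full dimensionality reduction algorithm such as PCA. The car data and the name `wear` are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical, strongly correlated car features.
mileage_km = [20_000, 60_000, 100_000, 140_000]
age_years = [1, 3, 5, 7]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# One merged feature: the average of the two standardized features.
wear = [(m + a) / 2 for m, a in zip(standardize(mileage_km), standardize(age_years))]
```

The single `wear` column now stands in for both original columns.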

12
Q

Should you reduce the dimensions of data before feeding it into a Supervised ML algorithm?

A

Often yes: the algorithm will run much faster and the data will take up less disk and memory space, and in some cases it may also perform better.

13
Q

What is Anomaly detection?

A

A task in which the system is shown mostly normal instances during training and then flags new instances that look very different from them. It is often used to remove outliers from a dataset.
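
One simple way to sketch this is a z-score rule: estimate the mean and standard deviation from readings assumed to be normal, then flag values far from that mean. The readings, the threshold of 3, and the name `is_anomaly` are assumptions for illustration.

```python
from statistics import mean, stdev

# Hypothetical sensor readings considered normal.
normal_readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
mu, sigma = mean(normal_readings), stdev(normal_readings)

def is_anomaly(value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    return abs(value - mu) / sigma > threshold

flags = [is_anomaly(v) for v in [10.1, 14.0, 9.9]]
```

Only the middle value is flagged, since it lies far outside the range seen during training.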

14
Q

What is novelty detection?

A

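Similar to anomaly detection, but the algorithm sees only normal data during training (a clean training set with no outliers) and then detects new instances that look different from it.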

15
Q

What is association rule learning?

A

Digging into large amounts of data to discover interesting relations between attributes, relations that only become visible with enough data.
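
Association rules are usually scored with support and confidence. The sketch below computes both for a handful of made-up supermarket transactions; the item names and helper names are hypothetical.

```python
# Hypothetical supermarket transactions.
transactions = [
    {"bbq sauce", "chips", "steak"},
    {"bbq sauce", "chips"},
    {"bbq sauce", "steak"},
    {"chips", "soda"},
    {"bbq sauce", "chips", "steak", "soda"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

# How often do buyers of bbq sauce and chips also buy steak?
rule_conf = confidence({"bbq sauce", "chips"}, {"steak"})
```

Algorithms such as Apriori search for itemsets and rules whose support and confidence exceed chosen thresholds; this sketch only scores one rule.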

16
Q

What is Machine Learning?

A

The science and art of programming computers so they can learn from data

18
Q

What is semisupervised learning?

A

Algorithms that work with a lot of unlabeled data and a little labeled data, for example by grouping the unlabeled instances first and then using the few labels to classify the whole collection.

19
Q

Describe a Deep Belief Network (DBN)

A

Unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of each other. The whole system is trained unsupervised and then fine-tuned using supervised techniques.

20
Q

What is an Agent in Reinforcement learning?

A

The learning system that can observe the environment, select and perform actions, and get rewards or penalties in return. It must then learn a policy.

21
Q

What is a policy in Reinforcement learning?

A

A policy defines what action the agent should choose when it is in a given situation.
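
In its simplest form a policy can be pictured as a lookup table from situations to actions. The states, actions, and function name below are hypothetical, and real policies are usually learned rather than written by hand.

```python
# Hypothetical states and actions for a robot walking in a corridor.
policy = {
    "far_from_goal": "move_forward",
    "obstacle_ahead": "turn_left",
    "at_goal": "stop",
}

def choose_action(state):
    """A deterministic policy: look up the action for the observed state."""
    return policy[state]

actions = [choose_action(s) for s in ["far_from_goal", "obstacle_ahead", "at_goal"]]
```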

22
Q

What is batch learning?

A

Batch learning is when a system cannot learn incrementally and must be trained using all the available data at once.

23
Q

Describe the process of offline learning.

A

The system is first trained offline on all the available data (batch learning) and is then launched into production, where it runs without learning anymore.

24
Q

For predicting stock prices, which would be better and why: offline learning or online learning?

A

Online learning. Because training is done incrementally, the model can be updated with small amounts of new stock data and react quickly to changes in the data.

25
Q

What is out of core learning?

A

Using online learning algorithms to train systems on huge datasets that cannot fit in one machine's main memory.
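
A minimal sketch of the idea: the data arrives chunk by chunk from a generator, and the model takes one gradient step per instance, so the full dataset never has to sit in memory. The generator `stream_chunks` is a hypothetical stand-in for reading a large file, and the one-parameter model y = w * x is chosen only to keep the example short.

```python
def stream_chunks(n_chunks=20, chunk_size=10):
    """Stand-in for reading a huge dataset piece by piece (hypothetical source).
    Yields lists of (x, y) pairs generated from y = 3x."""
    for _ in range(n_chunks):
        yield [(i / chunk_size, 3.0 * i / chunk_size) for i in range(chunk_size)]

w = 0.0              # single model parameter for y = w * x
learning_rate = 0.5

for chunk in stream_chunks():      # only one chunk is in memory at a time
    for x, y in chunk:
        error = w * x - y
        # Gradient step on the squared error 0.5 * error**2 with respect to w.
        w -= learning_rate * error * x
```

After the loop `w` is close to 3, the slope used to generate the data.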

26
Q

What is a learning rate?

A

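The learning rate sets how fast a system adapts to new data. With a high learning rate the system adapts quickly but also tends to forget old data quickly; with a low learning rate it learns more slowly but is less sensitive to noise and outliers in new data.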
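
The effect can be sketched with a running estimate that moves toward each new observation by a fraction equal to the learning rate. The stream of values and the function name `track` are made up for illustration.

```python
def track(stream, learning_rate):
    """Online estimate: move toward each new observation by a fraction learning_rate."""
    estimate = stream[0]
    for value in stream[1:]:
        estimate += learning_rate * (value - estimate)
    return estimate

# Hypothetical stream: stable around 10, then one outlier at the end.
stream = [10.0, 10.0, 10.0, 10.0, 50.0]
fast = track(stream, learning_rate=0.9)   # jumps most of the way to the outlier
slow = track(stream, learning_rate=0.1)   # barely moves
```

Whether the jump to 50 is a real change or noise decides which behavior is preferable.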

27
Q

What is a utility function?

A

A function (also called a fitness function) that measures how good a model is.

28
Q

What is a cost function?

A

A function that measures how bad a model is.
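
Mean squared error is one common cost function for regression. The sketch below compares two sets of hypothetical predictions against the same labels.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0]
good_model = [3.0, 5.0, 8.0]   # hypothetical predictions, close to the labels
bad_model = [0.0, 0.0, 0.0]    # hypothetical predictions, far from the labels

good_cost = mse(y_true, good_model)
bad_cost = mse(y_true, bad_model)
```

Training typically means searching for the parameters that make this cost as small as possible.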

29
Q

Train and predict using a linear regression model in scikit-learn

A

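Must have:
import sklearn.linear_model

# data is assumed to be a pandas DataFrame with a "target" column
X = data.drop(columns="target")
y = data["target"]

model = sklearn.linear_model.LinearRegression()

# Train the model on the features and labels
model.fit(X, y)

# X_new: rows of new data with the same features as X
X_new = [[new data matching X]]
model.predict(X_new)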

30
Q

What is inference?

A

Using a trained model to make predictions on new cases.

31
Q

What is the issue with nonrepresentative training data?

A

A model trained on data that does not reflect the cases it will meet in production is unlikely to generalize accurately to them.

32
Q

What is sampling bias?

A

When the sampling method is flawed, so that the resulting data is biased and nonrepresentative, even if the sample is large.

33
Q

What are some options when dealing with significant missing values?

A

Ignore the feature altogether, ignore the instances with missing values, impute the missing values (e.g., with the median), or train one model with the feature and one without it.
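
Two of these options can be sketched in a few lines of plain Python. The column of ages and the helper names are hypothetical, and missing entries are assumed to be marked as `None`.

```python
from statistics import median

# Hypothetical feature column with missing entries marked as None.
ages = [22.0, None, 35.0, 41.0, None, 28.0]

def impute_median(values):
    """Replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def drop_missing(values):
    """Alternative: ignore the instances whose value is missing."""
    return [v for v in values if v is not None]

imputed = impute_median(ages)
kept = drop_missing(ages)
```

Imputing keeps every instance at the price of invented values; dropping keeps only real values at the price of fewer instances.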

34
Q

What is feature extraction?

A

Feature extraction is combining existing features to produce a more useful one (as opposed to feature selection, which picks the most relevant features from the existing ones).

35
Q

What is overfitting?

A

Overfitting is when a model fits the training set too closely: it performs well on the training data but does not generalize well to new data.

36
Q

How to solve overfitting?

A

Simplify the model (fewer parameters, fewer features, or more regularization), gather more training data, or reduce the noise in the training data.

37
Q

What is regularization?

A

Constraining a model to make it simpler and reduce the risk of overfitting.
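
As one concrete instance, ridge-style (L2) regularization adds a penalty `alpha * w**2` to the squared error, which shrinks the fitted slope toward zero. The sketch uses a one-parameter model without intercept; the data, the values of `alpha`, and the function name are assumptions for illustration.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Slope w minimizing sum((w*x - y)**2) + alpha * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_plain = fit_ridge_slope(xs, ys, alpha=0.0)    # no constraint
w_ridge = fit_ridge_slope(xs, ys, alpha=14.0)   # stronger constraint shrinks the slope
```

Here `alpha` is a hyperparameter: it is set before training and controls how strongly the model is constrained.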

38
Q

What is underfitting?

A

The opposite of overfitting: the model is too simple to learn the underlying structure of the data.

39
Q

How to solve underfitting?

A

Select a more powerful model, feed better features to the learning algorithm, or reduce the regularization.

40
Q

What is holdout validation?

A

When comparing several models: hold out part of the training set as a validation set, train the candidate models (with different hyperparameters) on the rest, pick the one that performs best on the validation set, retrain it on the full training set (training plus validation), and finally evaluate it on the test set.
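
The steps can be sketched with a tiny one-parameter model and a single hyperparameter `alpha`. The dataset, the candidate values, and the helper names are hypothetical.

```python
def fit_slope(data, alpha):
    """Ridge-style slope for y = w * x; alpha is the hyperparameter being tuned."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + alpha)

def mse(w, data):
    """Mean squared error of the model y = w * x on the given (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Hypothetical dataset generated from y = 2x, assumed to be already shuffled.
full_train = [(1.0, 2.0), (4.0, 8.0), (2.0, 4.0), (5.0, 10.0), (3.0, 6.0), (6.0, 12.0)]
test_set = [(7.0, 14.0), (8.0, 16.0)]

# 1. Hold out part of the training set as a validation set.
train, valid = full_train[:4], full_train[4:]

# 2. Train one candidate per hyperparameter value and score it on the validation set.
candidates = [0.0, 10.0, 100.0]
best_alpha = min(candidates, key=lambda a: mse(fit_slope(train, a), valid))

# 3. Retrain the winner on the full training set, then evaluate once on the test set.
final_w = fit_slope(full_train, best_alpha)
test_error = mse(final_w, test_set)
```

The test set is touched only once, at the very end, so its error remains an honest estimate of performance on new data.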

41
Q

What is cross-validation?

A

Each model is evaluated once per small validation set, after being trained on the rest of the data; the average of these evaluations is taken as representative of its score.
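
A minimal k-fold sketch, assuming the data length is divisible by k (otherwise the leftover items here would only ever appear in the training part). The "model" simply predicts the mean training label, and all names and values are hypothetical.

```python
def k_fold_splits(data, k):
    """Yield (train, valid) pairs; each fold serves as the validation set once."""
    fold_size = len(data) // k
    for i in range(k):
        valid = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, valid

def fit_mean(train):
    """A deliberately trivial model: always predict the mean training label."""
    return sum(train) / len(train)

def mse(prediction, valid):
    """Mean squared error of a constant prediction on the validation labels."""
    return sum((prediction - y) ** 2 for y in valid) / len(valid)

labels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical labels
scores = [mse(fit_mean(tr), va) for tr, va in k_fold_splits(labels, k=3)]
cv_score = sum(scores) / len(scores)
```

The cost is training time: the model is trained k times instead of once.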

42
Q

What is the No Free Lunch theorem?

A

If you make absolutely no assumptions about the data, there is no reason to prefer one model over any other.