ML Fundamentals Flashcards

1
Q

What is machine learning?

A

Machine Learning is the science (and art) of programming computers so they can learn from data.

2
Q

Supervised/Unsupervised learning - what are they?

A

Supervised - the training set you feed to the algorithm includes the desired solutions, called labels. Typical tasks are classification and regression.

Unsupervised - the opposite of supervised: the training data is unlabeled, and the system tries to learn without a teacher.

3
Q

Supervised learning algorithms

A

k-nearest neighbors, linear regression, logistic regression, support vector machines (SVMs), decision trees and random forests, neural networks

4
Q

Unsupervised learning algorithms

A

Clustering (k-means, DBSCAN, Hierarchical cluster analysis)

Anomaly and novelty detection (one-class SVM, isolation forest)

Visualization and dimensionality reduction (principal component analysis (PCA), kernel PCA, locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE))

Association rule learning (Apriori, Eclat)

5
Q

Dimensionality reduction

A

The goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one, for example merging a car’s age and mileage into one ‘wear and tear’ feature. It is often a good idea to reduce the dimensionality of your training data before feeding it to another algorithm: training runs faster and the data takes up less space.

6
Q

Feature extraction

A

A form of dimensionality reduction where you merge correlated features into one. For example, merging a car’s age and mileage into one ‘wear and tear’ feature.
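
A minimal sketch of the car example, assuming made-up ages and mileages. Each feature is standardized and the two are then combined with equal weights; for two standardized, positively correlated features this is the direction PCA would pick as its first component. The function names and data are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def wear_and_tear(ages, mileages):
    """Merge two correlated features into one by summing their standardized values."""
    z_age = standardize(ages)
    z_mileage = standardize(mileages)
    # Dividing by sqrt(2) makes the combining direction (1, 1) a unit vector.
    return [(a + b) / 2 ** 0.5 for a, b in zip(z_age, z_mileage)]


# Hypothetical cars: age in years, mileage in miles.
ages = [1, 3, 5, 8, 12]
mileages = [12_000, 40_000, 61_000, 95_000, 150_000]

wear = wear_and_tear(ages, mileages)
print(wear)  # one value per car instead of two
```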

7
Q

Semi-supervised learning

A

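Only some of the training data is labeled: usually a lot of unlabeled data and a little labeled data. Example: a photos app groups the faces in your pictures on its own, then needs just one label per person to name that person in every photo.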

8
Q

Reinforcement learning

A

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

9
Q

Batch learning

A

The system is incapable of learning incrementally: it must be trained using all the available data. Training is done offline, and the trained system is then put to use without learning any further.

10
Q

Online learning

A

You train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches.
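
A minimal sketch of incremental training, assuming a simple linear model y = w*x + b updated by one stochastic gradient step per instance. The data stream, the function names, and the learning rate value are all hypothetical.

```python
def sgd_step(w, b, x, y, learning_rate):
    """Update the model from a single instance (squared-error gradient step)."""
    error = (w * x + b) - y
    w -= learning_rate * error * x
    b -= learning_rate * error
    return w, b


def instance_stream(n_instances):
    """Hypothetical stream of instances following y = 2x + 1."""
    for i in range(n_instances):
        x = (i % 10) / 10
        yield x, 2 * x + 1


w, b = 0.0, 0.0
for x, y in instance_stream(3000):
    # Each instance is used once and can then be discarded.
    w, b = sgd_step(w, b, x, y, learning_rate=0.3)

print(w, b)  # should end up close to the true slope 2 and intercept 1
```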

11
Q

learning rate

A

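How fast a learning system adapts to changing data. If you set a high learning rate, your system will rapidly adapt to new data, but it will also tend to quickly forget the old data. A low learning rate makes the system learn more slowly, but also makes it less sensitive to noise in the new data.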
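
A small illustration, assuming the "system" is just a running estimate that moves toward each new value by a fraction equal to the learning rate. The readings and the two rates are made up for the example.

```python
def track(readings, learning_rate):
    """Return the running estimate after each reading."""
    estimate = 0.0
    history = []
    for value in readings:
        # Move the estimate toward the new value by a fraction of the gap.
        estimate += learning_rate * (value - estimate)
        history.append(estimate)
    return history


# Hypothetical data that shifts from 10 to 20 near the end.
readings = [10.0] * 20 + [20.0] * 5

fast = track(readings, learning_rate=0.5)
slow = track(readings, learning_rate=0.05)

# The high learning rate is much closer to the new level after the shift.
print(fast[-1], slow[-1])
```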

12
Q

Generalizing

A

Given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before

13
Q

Instance-based learning

A

The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples.
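
A minimal sketch, assuming Euclidean distance as the similarity measure and a single nearest neighbor (1-NN). The stored examples and labels are hypothetical.

```python
import math


def nearest_neighbor_predict(examples, query):
    """Return the label of the stored example closest to the query point."""
    _, label = min(examples, key=lambda example: math.dist(example[0], query))
    return label


# "Learning by heart": the training examples are simply stored.
examples = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

prediction = nearest_neighbor_predict(examples, (7.5, 8.0))
print(prediction)
```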

14
Q

model-based learning

A

Build a model from the training examples, then use that model to make predictions.
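
A minimal sketch, assuming the model is a straight line fitted by ordinary least squares. The data points and function names are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_mean, y_mean = mean(xs), mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical training examples.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Use the fitted model (not the stored examples) on a new case."""
    return slope * x + intercept


print(predict(6))
```

Unlike the instance-based approach, the training examples can be thrown away once the two model parameters are known.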

15
Q

utility function (fitness function)

A

measures how good your model is

16
Q

cost function

A

measures how bad your model is

17
Q

Steps of ML

A
1. Study the data.
2. Select a model.
3. Train the model on the training data.
4. Apply the model to make predictions on new cases.
18
Q

Challenges for ML

A
1. Insufficient quantity of data
2. Non-representative training data
3. Poor data quality
4. Irrelevant features
5. Overfitting the training data
19
Q

regularization

A

constraining a model to make it simpler and reduce the risk of overfitting
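
A minimal sketch, assuming a one-feature linear model with no intercept and an L2 (ridge-style) penalty. The penalty strength `alpha` is a hypothetical name for the value that controls how strongly the model is constrained.

```python
def fit_slope(xs, ys, alpha=0.0):
    """Minimize sum((w*x - y)^2) + alpha * w^2; closed-form solution for w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


# Hypothetical training data.
xs = [1, 2, 3]
ys = [2.2, 3.9, 6.1]

unregularized = fit_slope(xs, ys)          # alpha = 0: plain least squares
regularized = fit_slope(xs, ys, alpha=10)  # larger alpha pulls the slope toward 0

print(unregularized, regularized)
```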

20
Q

hyperparameter

A

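A parameter of the learning algorithm, not of the model. It is set before training and stays constant while the model trains. For example, the amount of regularization to apply to avoid overfitting is controlled by a hyperparameter.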

21
Q

out of core learning

A

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory. The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
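
A minimal sketch of that loop, assuming the data lives in a CSV file of x,y rows and the model is y = w*x. An in-memory `io.StringIO` stands in for the large file; the function names, chunk size, and learning rate are hypothetical.

```python
import csv
import io
from itertools import islice


def read_chunks(file_obj, chunk_size):
    """Yield lists of (x, y) rows, holding only one chunk in memory at a time."""
    reader = csv.reader(file_obj)
    while True:
        chunk = [(float(x), float(y)) for x, y in islice(reader, chunk_size)]
        if not chunk:
            return
        yield chunk


def train_on_chunk(w, chunk, learning_rate):
    """One gradient step on the mean squared error of this chunk only."""
    gradient = sum(2 * (w * x - y) * x for x, y in chunk) / len(chunk)
    return w - learning_rate * gradient


# Stand-in for a file too big for memory: 100 rows following y = 3x.
big_file = io.StringIO("".join(f"{i / 10},{3 * i / 10}\n" for i in range(1, 101)))

w = 0.0
for chunk in read_chunks(big_file, chunk_size=10):
    w = train_on_chunk(w, chunk, learning_rate=0.01)

print(w)  # should be close to the true slope 3 after one pass
```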

22
Q

fitting or training

A

capturing patterns from data

23
Q

training data

A

data used to fit or train the model

24
Q

leaf

A

point of a decision tree where you make a prediction

25
Q

standard deviation

A

measures how spread out values are from the mean
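
A short worked example, assuming the population form of the formula (dividing by n; the sample form divides by n - 1). The values are made up.

```python
import math
from statistics import mean


def standard_deviation(values):
    """Square root of the average squared distance from the mean."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(values))  # mean is 5, squared distances average to 4
```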

26
Q

feature

A

An input variable to the model. The model uses the features to make predictions of the prediction target.

27
Q

mean absolute error

A

error = |actual - predicted|

mean absolute error is the mean of the abs value of each of the errors
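
The two lines above can be sketched as a small function. The actual and predicted values are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all examples."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


actual = [200, 150, 300]
predicted = [210, 140, 330]

mae = mean_absolute_error(actual, predicted)
print(mae)  # the individual errors are 10, 10 and 30
```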

28
Q

overfitting

A

The model matches the training data very closely, sometimes perfectly, but generalizes poorly: it does badly in validation and on new data.

29
Q

underfitting

A

model fails to capture important distinctions and patterns in the data

30
Q

random forest

A

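An ensemble made of many decision trees. It makes a prediction by averaging the predictions of each component tree (for classification, typically by majority vote or by averaging class probabilities).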
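
A rough sketch of the averaging idea, under strong simplifying assumptions: each "tree" is only a depth-1 regression tree (a stump) on a single feature, and each one is trained on a bootstrap sample of the data. Real random forests grow deeper trees and also randomize which features each split may use. All names and data here are hypothetical.

```python
import random
from statistics import mean


def fit_stump(points):
    """Depth-1 regression tree: one threshold on x, one predicted value per side."""
    best = None
    xs = sorted({x for x, _ in points})
    for left_x, right_x in zip(xs, xs[1:]):
        threshold = (left_x + right_x) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample was identical: fall back to a constant prediction.
        constant = mean(y for _, y in points)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Average the predictions of the component trees."""
    return mean(tree(x) for tree in forest)


# Hypothetical data: the target jumps from 1 to 9 at x = 5.
points = [(x, 1.0 if x < 5 else 9.0) for x in range(10)]
forest = fit_forest(points, n_trees=25)

print(forest_predict(forest, 1), forest_predict(forest, 8))
```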