L10 - Basics of Machine Learning Flashcards

1
Q

Machine Learning and Artificial Intelligence

A

Machine learning means applying learning algorithms to (big) data to make accurate predictions and detect previously unknown patterns.

Machine learning, deep learning, and AI all have different meanings. Not every system categorized as an intelligent system uses machine learning.

2
Q

Supervised Learning: Regression

A

The goal is to make quantitative (real-valued) predictions on the basis of a (vector of) features or attributes. Regression models the relationship between the independent variables (input) and the dependent variable (output); on its own it shows association, not causation.

Example (Bundesliga): estimate a team's position based on the goals it scored (findings: Slide 14).
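As a rough illustration of regression, the sketch below fits a straight line by ordinary least squares using only the standard library. The goal and position numbers are made up for illustration (they are not the Bundesliga data from the slides), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: goals scored (feature) and final table position (target).
goals = [30, 40, 50, 60, 70]
position = [16, 13, 10, 7, 4]

slope, intercept = fit_line(goals, position)
predicted = slope * 55 + intercept  # real-valued prediction for 55 goals
print(slope, intercept, predicted)
```

The negative slope says that, in this toy data, more goals go along with a better (lower) table position. It describes an association in the data, nothing more.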

3
Q

Supervised Learning: Classification

A

The goal is to use training data to build a classification model that predicts the correct category (label) for previously unknown data with high accuracy (Framework Slide 16).

4
Q

Supervised Learning: Decision Trees

A

The method is to create a model that predicts the value of an output target variable at the leaf nodes of the tree, based on several input variables at the root and interior nodes of that tree.

How to build a decision tree?
• Trees are built from the root to the leaves. In each iteration, one further attribute is selected: the decision tree learning algorithm picks the attribute that provides the most information gain (IG)
• IG is a measure of how well an attribute splits the remaining data into disjoint groups
• The algorithm prefers an attribute with a higher IG over attributes with lower IGs
• The IG must be computed for each attribute (not relevant for exam)
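The attribute selection described in the list above can be sketched as follows. This is a minimal illustration, assuming entropy-based information gain; the four rows are made up (they are not the Tennis Friends data), and `entropy` and `information_gain` are hypothetical helper names.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Made-up training rows.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute the tree would split on first
print(gains, best)
```

Here `outlook` separates the two classes perfectly while `windy` tells us nothing, so the algorithm would place `outlook` at the root.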

Example Tennis Friends

5
Q

Supervised Learning: k-Nearest-Neighbor

A

The goal is to use labeled data to predict the class of a previously unknown instance based on its similarity to other data points.

The idea is to identify the closest neighbor(s) and classify the new instance similarly. If the closest neighbors have different classes, perform a majority vote.

This approach does not require a training phase and no model is built.
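A minimal sketch of this idea, assuming Euclidean distance as the similarity measure and k = 3; the points and labels are made up, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter
from math import dist

def knn_predict(training, query, k=3):
    """training is a list of (point, label); return the majority label of the k nearest points."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up labeled data: two well-separated groups.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.0), "B"), ((6.5, 5.5), "B"),
]

label_near_a = knn_predict(training, (1.2, 1.4))
label_near_b = knn_predict(training, (6.4, 6.1))
print(label_near_a, label_near_b)
```

Note that nothing is fitted in advance: the labeled data itself is consulted at prediction time.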

Example Slide 25

6
Q

Training and Test Data

A

In supervised learning, we have labeled data that we can use to train our model. However, we also need some data to test the trained model, and we must never train on test data. The labeled data is therefore split into training and test data, e.g. 900 records to train and 100 to test. This is a tradeoff: every record held back for testing is one less record to learn from.
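The 900/100 split can be sketched like this. It is a minimal illustration, assuming a random shuffle with a fixed seed; `train_test_split` here is a hypothetical helper written for this card, not a library function.

```python
import random

def train_test_split(records, test_fraction=0.1, seed=0):
    """Shuffle a copy of the records and hold back a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(1000))  # stand-in for 1000 labeled records
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains, say, the most recent records.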

7
Q

The Results of Classification: The Confusion Matrix

A

Optical Character Recognition (OCR) is the conversion of handwritten or printed text into digital machine-readable text. An OCR algorithm needs to detect a character, analyze the glyph features and predict the character.

The relation between the predicted outcome and the actual outcome is visualized in a Confusion Matrix (Example Slide 28)
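As a small sketch of how such a matrix is built, the code below counts (actual, predicted) pairs for a handful of made-up OCR results; the characters and the helper name `confusion_matrix` are assumptions for illustration, not the example from the slides.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each actual class was predicted as each class."""
    return Counter(zip(actual, predicted))

# Made-up OCR results for six characters.
actual    = ["a", "a", "a", "o", "o", "c"]
predicted = ["a", "o", "a", "o", "o", "o"]

matrix = confusion_matrix(actual, predicted)
labels = sorted(set(actual) | set(predicted))

# One row per actual class, one column per predicted class.
for row in labels:
    print(row, [matrix[(row, col)] for col in labels])
```

Correct predictions sit on the diagonal; every off-diagonal count is a confusion between two characters.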

8
Q

True vs False and Positive vs. Negative

A

From the confusion matrix of two or more classes, we can derive the confusion matrix for each individual class.

Definitions:
• Positive: Instance is the predicted object
• Negative: Instance is not the predicted object
• True: Prediction is correct
• False: Prediction is incorrect

So, the values in a Confusion Matrix can be TP, FP, TN and FN.
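Deriving the four values for one class from a multi-class matrix can be sketched as follows. The counts are made up, the matrix is assumed to be indexed as `matrix[actual][predicted]`, and `per_class_counts` is a hypothetical helper name.

```python
def per_class_counts(matrix, labels, positive):
    """Return (TP, FP, FN, TN) for one class, treating all other classes as negative."""
    tp = matrix[positive][positive]
    fn = sum(matrix[positive][p] for p in labels if p != positive)  # missed positives
    fp = sum(matrix[a][positive] for a in labels if a != positive)  # false alarms
    total = sum(matrix[a][p] for a in labels for p in labels)
    tn = total - tp - fn - fp
    return tp, fp, fn, tn

labels = ["a", "c", "o"]
# Made-up counts: outer key is the actual class, inner key the predicted class.
matrix = {
    "a": {"a": 5, "c": 1, "o": 2},
    "c": {"a": 0, "c": 6, "o": 1},
    "o": {"a": 3, "c": 0, "o": 7},
}

tp, fp, fn, tn = per_class_counts(matrix, labels, "a")
print(tp, fp, fn, tn)
```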

9
Q

Evaluating Classification Models: Accuracy

A

Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally: (TP + TN) / (TP + TN + FP + FN)
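A worked example with made-up counts; `accuracy` is a hypothetical helper name.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up counts: 90 of 100 predictions are correct.
acc = accuracy(tp=40, tn=50, fp=5, fn=5)
print(acc)
```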

10
Q

Evaluating Classification Models: Precision and Recall

A

Precision: What proportion of positive identifications was actually correct?
TP / (TP + FP)

Recall: What proportion of actual positives was identified correctly?
TP / (TP + FN)
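A worked example with made-up counts; `precision` and `recall` are hypothetical helper names.

```python
def precision(tp, fp):
    """Of everything predicted positive, how much really was positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything that really was positive, how much did we find?"""
    return tp / (tp + fn)

# Made-up counts: 30 hits, 10 false alarms, 20 misses.
p = precision(tp=30, fp=10)
r = recall(tp=30, fn=20)
print(p, r)
```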

11
Q

Accuracy vs. Precision vs. Recall

A

High accuracy is a good first indicator but is not sufficient on its own. High precision is important when false alarms are costly. High recall is important when it's vital to detect every single positive instance.
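One way to see the difference is an imbalanced data set. The sketch below uses made-up screening data (990 negatives, 10 positives) and a deliberately useless model that always predicts "negative".

```python
# Made-up data: 0 = negative (healthy), 1 = positive (sick).
actual = [0] * 990 + [1] * 10
predicted = [0] * 1000  # a model that never raises an alarm

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)
tn = sum(a == 0 and p == 0 for a, p in pairs)
fn = sum(a == 1 and p == 0 for a, p in pairs)

accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn)
print(accuracy, recall)
```

The accuracy looks excellent even though the model finds none of the positive cases, which is exactly what recall exposes.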

12
Q

Unsupervised Learning: Association Analysis

A

The goal is to find frequent patterns, associations or causal structures that exist in collections of objects.

The idea is that algorithms find rules in an unlabeled data set. (Slide 41 Example)
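To make "finding rules" a bit more concrete, the sketch below scores one candidate rule with support and confidence. These are two commonly used measures that the card itself does not name, so treat them as an assumption; the shopping baskets are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """How often the consequent appears in transactions that contain the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up, unlabeled shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

s = support(transactions, {"bread", "butter"})
c = confidence(transactions, {"bread"}, {"butter"})  # rule: bread -> butter
print(s, c)
```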

13
Q

Unsupervised Learning: Principal Component Analysis

A

The goal is to identify patterns in high dimensional data and reduce the number of dimensions without much loss of information.

The idea is that multiple features in one dataset are correlated and have a similar impact on the variance of the data. (Slide 42 Example)
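A minimal sketch for the two-dimensional case, where the direction of greatest variance has a closed form, so no linear algebra library is needed. The points are made up and perfectly correlated, and `first_principal_direction` and `project` are hypothetical helper names; real PCA on many dimensions would use an eigen decomposition instead.

```python
from math import atan2, cos, sin

def first_principal_direction(points):
    """Return the mean and the unit vector along the direction of greatest variance (2-D only)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the major axis of the 2x2 covariance matrix.
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    return (mean_x, mean_y), (cos(angle), sin(angle))

def project(points, mean, direction):
    """Reduce each 2-D point to a single coordinate along the given direction."""
    return [(x - mean[0]) * direction[0] + (y - mean[1]) * direction[1]
            for x, y in points]

# Made-up data in which the second feature is always twice the first.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
mean, direction = first_principal_direction(points)
scores = project(points, mean, direction)
print(direction, scores)
```

Because the two features are perfectly correlated here, one coordinate per point keeps all of the variance.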

14
Q

Unsupervised Learning: Clustering

A

The goal is to divide data into meaningful or useful groups (clusters)

The method is to divide the data points into a number of groups such that data points in the same group are more similar to each other than to the data points in other groups. (Slide 43 Example)
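One common way to do this is k-means, which the card does not name, so it is used here only as an assumed example algorithm. The points and the starting centers are made up.

```python
from math import dist

def k_means(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and moving each center to its cluster mean."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Made-up unlabeled points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers, clusters)
```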

15
Q

Reinforcement Learning: Games

A

The goal is to learn a behavior without the need for labeled data.

The method is that the algorithm learns only from the feedback it receives as a result of its actions. There are no correct input/output pairs presented to the machine. Instead, good outcomes are rewarded, and bad ones are punished. (Example Slide 45)
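A very reduced sketch of learning from rewards alone: an agent with three possible actions keeps a running average of the reward each action has earned and always picks the action that currently looks best. The fixed rewards, the action names, and the optimistic starting value (used here to make the agent try every action once) are all assumptions for illustration; this is not the game example from the slides.

```python
def learn_action_values(rewards, steps=20, optimistic_start=5.0):
    """Greedy agent that estimates the value of each action purely from reward feedback."""
    values = {action: optimistic_start for action in rewards}
    counts = {action: 0 for action in rewards}
    for _ in range(steps):
        action = max(values, key=values.get)  # pick the action that currently looks best
        reward = rewards[action]              # feedback from the environment
        counts[action] += 1
        # Running mean of the rewards seen so far for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Made-up environment: one action is punished, one is neutral, one is rewarded.
rewards = {"left": -1.0, "stay": 0.0, "right": 1.0}
values, counts = learn_action_values(rewards)
best = max(values, key=values.get)
print(values, counts, best)
```

The agent is never told which action is correct; it settles on `right` only because that action kept paying off.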
