10. Machine Learning Flashcards

1
Q

Machine Learning and Artificial Intelligence (Definition Machine Learning; ML, DL and AI)

A

Machine Learning means using learning algorithms on (big) data to make accurate predictions and detect previously unknown patterns.
Machine Learning, Deep Learning and AI have different meanings: ML is a subfield of AI, and DL is a subfield of ML. Not every system categorized as an intelligent system uses machine learning.

2
Q

Types of Machine Learning Algorithms (3)

A
  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning
3
Q

ML-Algorithm: Supervised Learning (Definition, Examples)

A

Supervised Learning = learning from a large amount of data in which each data point (instance) carries a known label (= target class or target value)
Examples: Email Spam Detection, Handwriting Recognition, Medical Diagnosis, ...

4
Q

ML-Algorithm: Unsupervised Learning (Definition, Examples)

A

Unsupervised Learning = having a data set without labels -> the algorithm looks for structure (groups, patterns, associations) in the raw data itself.

Examples: Clustering, Recommender Systems, Risk Factor Analysis

5
Q

ML-Algorithm: Reinforcement Learning

A

Reinforcement Learning = the algorithm tries an action (at first more or less at random) and waits for feedback -> if the feedback is positive, it learns to prefer that action; if it is negative, it learns to avoid it
=> incremental learning by iteratively trying different actions and processing the feedback.

Examples: Games, Traffic Light Control, Robotics
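The try-an-action, get-feedback, update loop can be sketched on a toy problem. This is only an illustration under stated assumptions: the "multi-armed bandit" setting, the epsilon-greedy strategy, the function name run_bandit and all parameter values are chosen for this example and do not come from the card.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit (hypothetical setup)."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best average feedback so far.
            action = max(range(len(values)), key=values.__getitem__)
        # Feedback from the environment: reward 1 with the action's probability, else 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running average for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.2, 0.5, 0.8])
print(values, counts)
```

With enough steps, the estimated values should approach the true reward probabilities, and the most rewarding action should be chosen most often.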

6
Q

Supervised Learning: Regression

A
Make quantitative (real-valued) predictions on the basis of a (vector of) features or attributes
=> models the statistical relationship between the independent variables (input) and the dependent variable (output); a good fit alone does not prove a causal relationship

Example: Bundesliga. Estimating a team's league position based on the goals it has scored.
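A minimal sketch of such a prediction, assuming simple linear regression with one feature fitted by ordinary least squares. The function name fit_line is hypothetical, and the numbers are made up for illustration; they are not real Bundesliga data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up illustrative numbers, not real league data.
goals = [30, 40, 50, 60, 70]
position = [16, 13, 10, 7, 4]

slope, intercept = fit_line(goals, position)
predicted = slope * 55 + intercept  # real-valued estimate for a team with 55 goals
print(slope, intercept, predicted)
```

The output is a real number, so it would still have to be rounded or ranked to obtain an actual table position.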

7
Q

Supervised Learning: Classification (Goal)

A

Goal: Use training data to build a classification model that predicts the correct category (label) for previously unknown data with high accuracy.

8
Q

Supervised Learning: Decision Trees (Method, How to build one)

A

Create a model that predicts the value of an output target variable at the leaf nodes of the tree, based on several input variables at the root and interior nodes of that tree.

How to build a decision tree:

  • Trees are built from the root to the leaves. In each iteration, one further attribute is selected as the next split: the decision tree learning algorithm picks the attribute that provides the most information gain (IG)
  • IG is a measure of how well an attribute splits the remaining data into disjoint groups that are as pure as possible with respect to the target
  • The algorithm prefers an attribute with a high IG over attributes with lower IGs
  • Therefore, the IG of every remaining attribute must be computed at each split
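The attribute selection step can be sketched as follows. The card does not say how IG is calculated; this sketch assumes the common entropy-based definition (entropy before the split minus the weighted entropy after it). The function names and the tiny data set are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

gains = {a: information_gain(rows, a, "play") for a in ("outlook", "windy")}
best = max(gains, key=gains.get)  # attribute chosen for the next split
print(gains, best)
```

A full tree builder would repeat this selection recursively on each resulting group until the groups are pure or no attributes remain.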
9
Q

Supervised Learning: k-Nearest Neighbor

A
Goal: Use labeled data to predict the class of a previously unknown instance based on its similarity to other data points.
The idea is to identify the k closest neighbor(s) and classify the new instance similarly. If the closest neighbors have different classes, perform a majority vote.
This approach does not require a training phase, and no explicit model is built: the stored data points themselves are used at prediction time.
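A minimal sketch of the neighbor search and majority vote, assuming Euclidean distance as the similarity measure (the card does not fix one). The function name knn_predict and the sample points are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # Sort the stored points by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: (feature vector, class label).
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 4.8)))
```

Note that all the work happens at prediction time: every call scans the stored data, which is why no separate training step appears.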
10
Q

Training and Test Data (Rule, Trade-off)

A

In supervised learning, we have labeled data that we can use to train our models -> however, we also need some data to test the trained model!
-> Rule: Never train on test data
Instead, we split our data set into training data and test data and keep them separate. Trade-off:
- the larger we choose the training set, the better our model tends to become
- the larger we choose the test set, the more confidence we can have in our results
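A minimal sketch of such a split, assuming a random shuffle and an 80/20 ratio. The ratio, the seed and the function name train_test_split are choices made for this example, not values given on the card.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and cut it into disjoint training and test parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 labeled instances
train, test = train_test_split(data)
print(len(train), len(test))
```

Changing test_fraction moves along the trade-off: a smaller value leaves more data for training, a larger value gives a more reliable evaluation.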

11
Q

The Results of Classification: The Confusion Matrix

A
Confusion Matrix = represents the relation between the predicted outcome and the actual outcome
= a table layout that allows visualization of the performance of a (supervised learning) algorithm. Here, each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (some tools use the transposed layout) -> shows us how many times our algorithm was correct/wrong
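A minimal sketch of how such a table could be filled, using the layout described above (row = predicted class, column = actual class). The function name, the class names and the label lists are made up for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count instances per (predicted, actual) pair; row = predicted, column = actual."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix

# Hypothetical labels for seven instances.
classes = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird", "cat"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat", "cat"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
```

Correct predictions end up on the diagonal; every off-diagonal entry is a mistake.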
12
Q

True vs. False and Positive vs. Negative (Definitions (4))

A

From the confusion matrix for two or more classes, we can derive the confusion matrix for each individual class.

Definitions:
- Positive: The model predicts that the instance belongs to the class in question
- Negative: The model predicts that the instance does not belong to the class in question
- True: Prediction is correct
- False: Prediction is wrong
Thus, the values in a Confusion Matrix can be TP, TN, FP or FN.
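Deriving the four counts for one class from a multi-class matrix can be sketched as follows. The sketch assumes rows are predicted classes and columns are actual classes; the function name per_class_counts and the example matrix are hypothetical.

```python
def per_class_counts(matrix, k):
    """TP, FP, FN, TN for class index k; matrix[row][col] with row = predicted, col = actual."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                 # predicted k, actually another class
    fn = sum(row[k] for row in matrix) - tp  # actually k, predicted as another class
    tn = total - tp - fp - fn                # everything that does not involve k
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Hypothetical 3-class confusion matrix (classes 0, 1, 2).
matrix = [
    [2, 0, 1],
    [1, 2, 0],
    [0, 0, 1],
]

counts_class_0 = per_class_counts(matrix, 0)
print(counts_class_0)
```

Running the function once per class index gives the "one class versus the rest" view for every class.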

13
Q

Evaluating Classification Models: Accuracy

A

Accuracy = metric for evaluating classification models. Loosely speaking, it’s the fraction of predictions our model got right: (TP+TN) / (TP+TN+FP+FN)
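A short worked example of the formula; the four counts are made-up numbers.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: 85 of 100 predictions are correct.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
print(acc)
```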

14
Q

Evaluating Classification Models: Precision and Recall

A

Precision = What proportion of positive identifications was actually correct? TP / (TP+FP)

Recall = What proportion of actual positives was identified correctly? TP / (TP+FN)
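Both formulas as a small sketch with made-up counts. Returning 0.0 when the denominator is zero is a convention assumed here; the card does not cover that case.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts.
p = precision(tp=40, fp=5)  # 40 of 45 positive predictions are correct
r = recall(tp=40, fn=10)    # 40 of 50 actual positives are found
print(p, r)
```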

15
Q

Accuracy vs. Precision vs. Recall (When do you need what metric?)

A

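A high accuracy can be a first piece of evidence but should be treated with caution, especially when the classes are imbalanced.
A high precision is important when false alarms (false positives) are costly.
A high recall is important when it’s vital to detect every single positive instance, i.e. when misses (false negatives) are costly.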
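Why accuracy alone can mislead is easiest to see on imbalanced data. The data below is made up: 5 positives among 100 instances, and a "model" that always predicts negative.

```python
# Hypothetical imbalanced data set: 1 = positive, 0 = negative.
actual = [1] * 5 + [0] * 95
predicted = [0] * 100  # always predicts "negative"

pairs = list(zip(actual, predicted))
tp = sum(a == 1 and p == 1 for a, p in pairs)
tn = sum(a == 0 and p == 0 for a, p in pairs)
fp = sum(a == 0 and p == 1 for a, p in pairs)
fn = sum(a == 1 and p == 0 for a, p in pairs)

acc = (tp + tn) / len(actual)  # looks good
rec = tp / (tp + fn)           # reveals that no positive instance was found
print(acc, rec)
```

Accuracy is high although the model never detects a positive instance; precision is not even defined here because the model makes no positive predictions (TP + FP = 0).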

16
Q

Unsupervised Learning: Association Analysis (Goal and Idea)

A

Goal: Find frequent patterns, associations or causal structures that exist in a collection of objects.

Idea: Algorithms find rules of the form X -> Y in unlabeled data sets (if X occurs, Y tends to occur as well)
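How strongly the data backs a rule X -> Y can be sketched with two measures that the card does not name but that are commonly used for this purpose: support (how often X and Y occur together) and confidence (how often Y occurs when X does). The shopping baskets are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain all of the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Estimated P(Y | X): support of X and Y together divided by support of X."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

sup = support(transactions, {"bread", "butter"})
conf = confidence(transactions, {"bread"}, {"butter"})
print(sup, conf)
```

A rule-mining algorithm would compute these values for many candidate rules and keep only those above chosen thresholds.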

17
Q

Unsupervised Learning: Principal Component Analysis (Goal and Idea)

A

Goal: Identify patterns in high dimensional data and reduce the number of dimensions without much loss of information.

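Idea: Multiple features in one data set are often correlated and have a similar impact on the variance of the data, so they can be combined into a smaller number of new features (principal components) that retain most of that variance.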
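A minimal sketch for the two-feature case, where the eigenvalues of the 2x2 covariance matrix can be written in closed form. The function name and the data points are hypothetical; real data with more features would normally be handled by a linear algebra library.

```python
from math import sqrt

def pca_2d_explained_variance(points):
    """Share of the total variance captured by each of the two principal components."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (a + c) / 2
    radius = sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    return lam1 / (lam1 + lam2), lam2 / (lam1 + lam2)

# Hypothetical data: the second feature is roughly twice the first.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

first, second = pca_2d_explained_variance(points)
print(first, second)
```

Because the two features are strongly correlated, almost all of the variance falls on the first component, so the second dimension could be dropped with little loss of information.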

18
Q

Unsupervised Learning: Clustering (Goal and Method)

A

Goal: Dividing data into meaningful or useful groups (=clusters).

Method: Dividing the population or data points into a number of groups such that data points in the same group are similar to each other and dissimilar to data points in other groups
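One common way to form such groups is k-means, sketched below. The card does not name a specific algorithm, so k-means, Euclidean distance, the fixed starting centroids and the sample points are all assumptions made for this example.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and moving the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]

centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

In practice the starting centroids are usually chosen at random, and the number of groups has to be decided in advance.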

19
Q

Reinforcement Learning: Games (Goal and Method)

A

Goal: Learn a behavior without the need for labeled data

Method: The algorithm learns only from the feedback it receives as a result of its actions -> no correct input/output pairs are presented to the machine. Instead, good outcomes are rewarded, and bad ones are punished.