10. Machine Learning Flashcards
Machine Learning and Artificial Intelligence (Definition Machine Learning; ML, DL and AI)
Machine Learning means applying learning algorithms to (big) data to make accurate predictions and to detect previously unknown patterns.
Machine Learning, Deep Learning and AI have different meanings. Not every system categorized as an intelligent system uses machine learning.
Types of Machine Learning Algorithms (3)
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
ML-Algorithm: Supervised Learning (Definition, Examples)
Supervised Learning = having a lot of data, where each instance has a specific label (= target class / target value)
Examples: Email Spam Detection, Handwriting Recognition, Medical Diagnosis,..
ML-Algorithm: Unsupervised Learning (Definition, Examples)
Unsupervised Learning = having a dataset but no labeled classes -> with ML we want to learn about the raw data.
Examples: Clustering, Recommender Systems, Risk Factor Analysis
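As an illustration of clustering (a sketch, not from the original card): a minimal 1-D k-means that alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster. The data values are made up.

```python
import random

def kmeans_1d(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
data = [0.9, 1.0, 1.1, 9.8, 10.0, 10.2]
print(kmeans_1d(data, 2))   # centroids near 1.0 and 10.0
```

No labels are used anywhere: the algorithm learns the group structure from the raw values alone.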
ML-Algorithm: Reinforcement Learning
Reinforcement Learning = the algorithm tries something random and waits for feedback -> if the feedback is positive, the algorithm learns that it was doing the right thing
=> incremental learning by iteratively trying different actions and processing the feedback.
Examples: Games, Traffic Light Control, Robotics
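The try-something-and-process-the-feedback loop can be sketched as an epsilon-greedy bandit. Everything here (the two actions, their payout probabilities) is a made-up example, not from the card:

```python
import random

def epsilon_greedy_bandit(payouts, episodes=5000, eps=0.1, seed=1):
    """Try actions, observe reward feedback, shift toward what worked.

    `payouts` are the hidden win probabilities of each action; the agent
    only sees the sampled 0/1 rewards, never these probabilities.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(payouts)   # learned value of each action
    counts = [0] * len(payouts)
    for _ in range(episodes):
        if rng.random() < eps:                        # explore: try something random
            a = rng.randrange(len(payouts))
        else:                                         # exploit: use what was learned
            a = max(range(len(payouts)), key=lambda i: estimates[i])
        reward = 1 if rng.random() < payouts[a] else 0    # feedback from the environment
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean update
    return estimates

est = epsilon_greedy_bandit([0.2, 0.8])   # action 1 pays off more often
print(est)
```

The incremental mean update is exactly the "iteratively trying different actions and processing the feedback" from the card: each reward nudges the estimate for the action that produced it.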
Supervised Learning: Regression
Make quantitative (real-valued) predictions on the basis of a (vector of) features or attributes => models the relationship between the independent variables (input) and the dependent variable (output); note that this relationship is statistical, not necessarily causal.
Example: Bundesliga. Estimating a team's final table position from the number of goals it scored.
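A sketch of the Bundesliga example with made-up numbers: ordinary least squares fits a line from goals scored (input) to table position (output); the negative slope reflects that more goals go with a better, i.e. lower, position.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx   # slope, intercept

# Made-up data: goals scored vs. final table position.
goals    = [80, 70, 60, 50, 40, 30]
position = [1, 3, 6, 9, 13, 17]
a, b = fit_line(goals, position)
print(round(a, 3), round(b, 3))   # slope is negative
```

With the fitted line, a prediction for a new team is simply `a * goals + b`.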
Supervised Learning: Classification (Goal)
Goal: Use training data to build a classification model that predicts the correct category (label) for previously unknown data with high accuracy.
Supervised Learning: Decision Trees (Method, How to build one)
Create a model that predicts the value of an output target variable at the leaf nodes of the tree, based on several input variables at the root and interior nodes of that tree.
How to build a decision tree:
- Trees are built from the root to the leaves. In each iteration, one further attribute is added. The decision tree learning algorithm selects the attribute that provides the highest information gain (IG)
- IG is a measure of how well an attribute splits the remaining data into disjoint groups
- The algorithm prefers an attribute with a high IG over attributes with lower IGs
- the IG for each attribute must be computed
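Information gain can be computed as the entropy of the labels before the split minus the weighted entropy of the groups after it. A sketch with a made-up two-class example:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """IG = entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)

# Made-up data: this attribute separates the classes perfectly,
# so its IG equals the full entropy of the parent node (1 bit here).
labels = ["spam", "spam", "ham", "ham"]
perfect_split = [["spam", "spam"], ["ham", "ham"]]
print(information_gain(labels, perfect_split))   # 1.0
```

A useless attribute that splits the data into groups with the same class mix as the parent would get an IG of 0, so the tree builder would pick the perfect split first.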
Supervised Learning: k-Nearest Neighbor
Goal: Use labeled data to predict the class of a previously unknown instance based on its similarity to existing data points. The idea is to identify the closest neighbor(s) and assign the new instance their class. If the closest neighbors have different classes, perform a majority vote. This approach requires no training phase and builds no model.
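A minimal sketch of k-Nearest Neighbor with Euclidean distance and a majority vote, on made-up 2-D points; note there is no training step, the labeled data is used directly at prediction time:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.
    Classify query by majority vote among its k nearest neighbors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # majority vote

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))    # A
print(knn_predict(train, (8.5, 8)))  # B
```

Choosing k odd (here 3) avoids exact ties in the two-class vote.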
Training and Test Data (Rule, Trade-off)
In supervised learning algorithms, we have labeled data that we can use to train our models -> however, we need some data to test our trained model!
-> Rule: Never train on test data
Instead, we split up our data set into training data and test data and keep them separate:
- the larger we choose the training set, the better our model becomes
- the larger we choose our testing set, the more confidence we can have in our results
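A sketch of such a split; the 75/25 ratio is an arbitrary choice illustrating the trade-off between the two bullets above:

```python
import random

def train_test_split(data, test_ratio=0.25, seed=42):
    """Shuffle, then hold out the last chunk as test data."""
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the caller's list stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))
train, test = train_test_split(data)
print(len(train), len(test))        # 75 25
assert not set(train) & set(test)   # never train on test data
```

Shuffling first matters: if the data is ordered (e.g. by class), a plain tail split would give a test set that does not represent the whole distribution.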
The Results of Classification: The Confusion Matrix
Confusion Matrix = represents the relation between the predicted outcome and the actual outcome; a table layout that allows visualization of the performance of a (supervised learning) algorithm. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class -> shows us how many times our algorithm was correct/wrong.
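A sketch that builds such a matrix (rows = predicted class, columns = actual class) from two made-up label lists:

```python
from collections import Counter

def confusion_matrix(predicted, actual, classes):
    """Rows = predicted class, columns = actual class."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in classes] for p in classes]

# Made-up spam classifier output.
actual    = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]
for row in confusion_matrix(predicted, actual, ["spam", "ham"]):
    print(row)
# [2, 1]  <- predicted spam: 2 actually spam, 1 actually ham
# [1, 2]  <- predicted ham:  1 actually spam, 2 actually ham
```

The diagonal holds the correct predictions; everything off the diagonal is a mistake.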
True vs. False and Positive vs. Negative (Definitions (4))
From the confusion matrix of two or more classes, we can derive the confusion matrix for each individual class.
Definitions:
- Positive: Instance is the predicted object
- Negative: Instance is not the predicted object
- True: Prediction is correct
- False: Prediction is wrong
Thus, the values in a Confusion Matrix can be TP, TN, FP or FN.
Evaluating Classification Models: Accuracy
Accuracy = metric for evaluating models. Loosely speaking, it's the fraction of predictions our model got right: (TP+TN) / (TP+TN+FP+FN)
Evaluating Classification Models: Precision and Recall
Precision = What proportion of positive identifications was actually correct? TP / (TP+FP)
Recall = What proportion of actual positives was identified correctly? TP / (TP+FN)
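The accuracy, precision and recall formulas applied to made-up counts (2 TP, 2 TN, 1 FP, 1 FN from a hypothetical spam classifier):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Of the positive predictions, how many were actually positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of the actual positives, how many did we find?"""
    return tp / (tp + fn)

tp, tn, fp, fn = 2, 2, 1, 1
print(round(accuracy(tp, tn, fp, fn), 3))  # 0.667
print(round(precision(tp, fp), 3))         # 0.667
print(round(recall(tp, fn), 3))            # 0.667
```

Note that precision and recall pull in opposite directions: predicting "positive" for everything drives recall to 1 while precision drops, and predicting it only when certain does the reverse.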
Accuracy vs. Precision vs. Recall (When do you need what metric?)
A high accuracy can be first evidence but should be treated with caution.
A high precision is important when false alarms are costly.
A high recall is important when it’s vital to detect every single positive instance.