ML Flashcards

1
Q

What are features and labels?

A

Features: Values that usefully characterize the things we wish to classify.
Labels: The outcome (target) value attached to each example in the training data; it is what the model learns to predict.

2
Q

Name the three types of Machine Learning (plus a very short explanation).

A

-> Supervised machine learning algorithms
° the possible outcomes are already known and the training data is labeled with the correct answers
-> Unsupervised machine learning algorithms
° the training data will have no correct answer or no specific outcome or label. Algorithms help to discover interesting patterns in data.
-> Reinforcement machine learning algorithms
° the machine is exposed to an environment where it trains itself continually using the trial and error method to make accurate decisions

3
Q

What are the seven steps of Machine Learning (plus a very short explanation)?

A
  1. Data collection
    collecting training data
  2. Data preparation
    the data needs to be de-duplicated and errors need to be removed
  3. Choosing a model
    the model needs to meet the business goal
  4. Training
    use your training data and incrementally improve the predictions of the model
  5. Evaluation
    testing the trained model against an unused control dataset
  6. Parameter tuning
    adjust the originally set parameters to improve the model
  7. Prediction
    answer questions using predictions
4
Q

What are the two kinds of problems that can be solved using supervised learning (plus a very short explanation)?

A

-> Classification
the output is a category, such as “black”, “teaching”, “non-teaching”
-> Regression
the output is a real value, such as a distance or a weight in kilograms

5
Q

Give all ML techniques and the kinds of problems they solve.

A

-> supervised learning
- classification
- regression

-> unsupervised learning
- Clustering (similar data)
- Anomaly detection (unusual data)
- Association (interesting relations)

6
Q

Using linear regression, the system estimates a regression function with the equation 𝑓(𝑥) = 𝑏 + m𝑥. Can you explain the values of b and m?

A

b
the point where the estimated regression line crosses the 𝑦 axis

m
determines the slope of the estimated regression line
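
To make b and m concrete, here is a minimal sketch that estimates both with ordinary least squares. The function name `fit_line` and the data points are made up for illustration; the card itself does not say how the system estimates the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for f(x) = b + m*x (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope m: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept b: where the fitted line crosses the y axis (x = 0).
    b = mean_y - m * mean_x
    return b, m


# Made-up points that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b, m = fit_line(xs, ys)
print(f"intercept b = {b}, slope m = {m}")
```

With these points the fitted line crosses the y axis at 1 and rises by 2 for every unit of x.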

7
Q

Can you explain the terms bias and variance? Are bias and variance low or high for the straight line / squiggly line?

A

bias: it describes how well the model matches the training data set (high bias = poor match)
variance: how much the model changes when it is trained on different portions of the training data set

straight line: bias high and variance low
squiggly line: bias low and variance high

8
Q

Name the three types of recommender systems (plus a short explanation).

A
  1. Popularity based
    it only takes overall popularity into account to make a recommendation
  2. Content based
    it analyses the content and finds similar content
  3. Collaborative filtering
    find similar users and recommend something the other user liked/watched
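
The collaborative filtering idea above can be sketched as follows. This is only one possible reading: the ratings are made up, and the use of cosine similarity and of a single nearest user are assumptions, since the card does not say how "similar users" are found.

```python
import math

# Made-up ratings: user -> {item: rating}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "D": 4},
    "cat": {"A": 1, "C": 5, "E": 4},
}


def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user):
    """Recommend the best-rated unseen item of the most similar user."""
    others = [(cosine(ratings[user], ratings[o]), o) for o in ratings if o != user]
    _, nearest = max(others)
    unseen = {item: r for item, r in ratings[nearest].items()
              if item not in ratings[user]}
    return max(unseen, key=unseen.get) if unseen else None


print(recommend("ann"))
```

Here "ann" and "bob" rate items A and B alike, so the sketch recommends to "ann" the item that "bob" liked and "ann" has not seen yet.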
9
Q

Name three companies that use recommender systems intensively. What are the items they recommend?

A

Netflix -> movies
Amazon -> products to buy, e.g. books
Facebook -> people you might know: friends

10
Q

Name the three challenges of collaborative filtering. Can you explain them?

A

Data sparsity
-> Users in general rate only a limited number of items

Cold start
-> Difficulty making recommendations for new users or new items

Scalability
-> Computational cost grows as the number of users or items increases

11
Q

Explain the following term:
entropy

A

a measure of the probability of a particular distribution; in decision trees it quantifies how mixed (impure) the classes in a node are
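
As an illustration, the sketch below computes Shannon entropy in bits for a list of class labels. It assumes the decision-tree sense of the term, which the card does not state explicitly; the function name and the labels are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())


mixed = ["yes", "yes", "no", "no"]    # evenly mixed classes
pure = ["yes", "yes", "yes", "yes"]   # a single class
print(entropy(mixed), entropy(pure))
```

An evenly mixed node has the highest entropy for two classes, while a node containing a single class has none.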

12
Q

Explain the following term:
information gain

A

the amount of information gained about a random variable by observing another variable; in a decision tree, the reduction in entropy achieved by a split
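
A minimal sketch of information gain for a decision-tree split, assuming it is computed as the parent's entropy minus the size-weighted entropy of the child nodes. The function names and the example labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children as mixed as the parent
print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

A split that separates the classes completely gains all of the parent's entropy; a split that leaves the children as mixed as the parent gains nothing.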

13
Q

Explain the following term:
leaf node

A

end node

14
Q

Explain the following term:
decision node

A

node in which a decision is made about how to split the data

15
Q

Explain the following term:
root node

A

First node

16
Q

Name one advantage and one disadvantage of decision trees.

A

+ easy to understand
- overfitting is quite common

17
Q

What are the four steps in the Random Forest algorithm?

A
  1. Select random samples from a given dataset (= bootstrapped datasets).
  2. Construct a decision tree for each sample (using only a random subset of variables) and get a prediction result from each decision tree.
  3. Perform a vote for each predicted result.
  4. Select the prediction result with the most votes as the final prediction.
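
The four steps above can be sketched in plain Python. To keep it short, each "tree" here is a one-split decision stump rather than a full decision tree, and the random feature subset is drawn once per tree; both are simplifications, and the data and all names are made up.

```python
import random
from collections import Counter


def train_stump(rows, feature_idxs):
    """Fit a one-split tree (stump) using only the allowed features."""
    majority = Counter(label for _, label in rows).most_common(1)[0][0]
    # Baseline: always predict the majority class of the sample.
    best = (sum(label != majority for _, label in rows), None, None, majority, majority)
    for f in feature_idxs:
        values = sorted({feats[f] for feats, _ in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for feats, label in rows if feats[f] <= threshold]
            right = [label for feats, label in rows if feats[f] > threshold]
            left_vote = Counter(left).most_common(1)[0][0]
            right_vote = Counter(right).most_common(1)[0][0]
            errors = (sum(l != left_vote for l in left)
                      + sum(l != right_vote for l in right))
            if errors < best[0]:
                best = (errors, f, threshold, left_vote, right_vote)
    _, f, threshold, left_vote, right_vote = best
    if f is None:
        return lambda feats: majority
    return lambda feats: left_vote if feats[f] <= threshold else right_vote


def train_forest(rows, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    all_features = range(len(rows[0][0]))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]       # step 1: bootstrapped dataset
        subset = rng.sample(all_features, n_features)   # random subset of variables
        forest.append(train_stump(sample, subset))      # step 2: one tree per sample
    return forest


def predict(forest, feats):
    votes = Counter(tree(feats) for tree in forest)     # step 3: every tree votes
    return votes.most_common(1)[0][0]                   # step 4: most votes wins


# Made-up data: (feature tuple, label).
rows = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
        ((6.0, 7.0), "large"), ((7.0, 6.5), "large"), ((6.5, 8.0), "large")]
forest = train_forest(rows)
print(predict(forest, (1.2, 1.1)), predict(forest, (7.0, 7.0)))
```

Because each tree sees a different bootstrapped sample and feature subset, individual trees can be wrong while the majority vote stays stable.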
18
Q

Explain, using the dog analogy, how reinforcement learning works.

A
  • A dog is an agent in an environment. The environment can be your house.
  • The situation the dog encounters is a state, e.g. the dog is standing and a certain command is given in a certain tone in the living room.
  • The agent responds by performing an action that takes it from one state to another; for example, the dog goes from standing to sitting.
  • After the transition, the agent may receive a reward or a penalty: the dog gets a treat or a "no".
  • The policy is the strategy for choosing an action given a state, in the expectation of better outcomes.
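
A toy version of the dog analogy, written as a sketch under stated assumptions: there is a single starting state, rewards are +1 (treat) and -1 ("no"), and the agent keeps a running value estimate per state and action with epsilon-greedy exploration. None of these choices, names, or numbers come from the card.

```python
import random

ACTIONS = ["bark", "lie_down", "sit"]
START = "standing_hears_sit"   # state: the dog stands and hears the command "sit"


def step(state, action):
    """Environment: return (next_state, reward) for the dog's action."""
    if state == START and action == "sit":
        return "sitting", 1.0   # treat
    return START, -1.0          # "no"


def train(episodes=200, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(START, a): 0.0 for a in ACTIONS}   # value estimate per (state, action)
    for _ in range(episodes):
        state = START
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)     # explore: try something at random
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
        _, reward = step(state, action)
        # Move the estimate a fraction alpha towards the observed reward.
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q


def policy(q, state):
    """Policy: choose the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = train()
print(policy(q, START))
```

After training, the learned policy prefers the action that earned treats, which is the trial-and-error behaviour the analogy describes.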
19
Q

Why are there 500 different states in the taxi problem?

A

5×5×5×4 = 500
- 5×5 grid, so 25 possible taxi positions
- 5 possible locations for our passenger (four pickup spots, or inside the taxi)
- 4 possible locations where we can drop off our passenger
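
The count can be checked by enumerating every combination. In this sketch, the reading of the fifth passenger location as "inside the taxi" and the `encode` helper, which packs a state into a single index, are assumptions for illustration.

```python
from itertools import product

ROWS, COLS = 5, 5
PASSENGER_LOCATIONS = 5   # assumption: 4 pickup spots + "inside the taxi"
DESTINATIONS = 4

# Every combination of taxi row, taxi column, passenger location and destination.
states = list(product(range(ROWS), range(COLS),
                      range(PASSENGER_LOCATIONS), range(DESTINATIONS)))
print(len(states))


def encode(row, col, passenger, destination):
    """Pack one state into a single index in the range 0..499."""
    return ((row * COLS + col) * PASSENGER_LOCATIONS + passenger) * DESTINATIONS + destination
```

Each state maps to a distinct index, which is convenient for a table with one row per state.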

20
Q

What rewards and/or penalties are involved in the taxi problem?

A
  • high positive reward for a successful dropoff
  • penalty for trying to drop off a passenger at the wrong location
  • slight negative reward at every time-step in which the destination has not yet been reached
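
The reward scheme above could look like the sketch below. The card gives no numbers, so the values are assumptions (they mirror the ones commonly cited for the Gym Taxi environment), and the `reward` function with its arguments is hypothetical.

```python
# Assumed values: the flashcard only says "high positive", "penalized"
# and "slight negative".
DROPOFF_REWARD = 20
WRONG_DROPOFF_PENALTY = -10
STEP_PENALTY = -1


def reward(action, taxi_pos, passenger_in_taxi, destination):
    """Return the reward for one time-step (hypothetical helper)."""
    if action == "dropoff":
        if passenger_in_taxi and taxi_pos == destination:
            return DROPOFF_REWARD          # successful dropoff
        return WRONG_DROPOFF_PENALTY       # dropoff at a wrong location
    return STEP_PENALTY                    # every other step costs a little


print(reward("dropoff", (0, 0), True, (0, 0)))
print(reward("dropoff", (1, 1), True, (0, 0)))
print(reward("north", (1, 1), True, (0, 0)))
```

The small per-step cost pushes the agent towards short routes, while the large dropoff reward marks the goal.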
21
Q

What are the six different actions that can be taken in the taxi problem?

A
  1. south
  2. north
  3. east
  4. west
  5. pickup
  6. dropoff