Lec 4 | Learning Flashcards

1
Q

It provides a computer with data, rather than explicit instructions. Using these data, the computer learns to recognize patterns and becomes able to execute tasks on its own.

A

Machine Learning

2
Q

It is a task where a computer learns a function that maps inputs to outputs based on a dataset of input-output pairs.

A

Supervised Learning

3
Q

This is a supervised learning task where the function maps an input to a discrete output. In other words, it is the task of learning a function that maps an input point to a discrete category.

A

Classification

4
Q
  • An algorithm that, given an input, chooses the class of the nearest data point to that input.
  • One way of solving a classification task: assign the variable in question the value of the closest observation.
A

Nearest-Neighbor Classification

5
Q

How do you get around the limitations of nearest-neighbor classification?

A

One way to get around the limitations of nearest-neighbor classification is by using k-nearest-neighbors classification.

6
Q

An algorithm that, given an input, chooses the most common class out of the k nearest data points to that input

A

k-nearest-neighbor classification
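
A minimal Python sketch of this algorithm (my own illustration; the function name and the toy data are made up):

from collections import Counter
import math

def knn_classify(data, point, k):
    # data: list of (features, label) pairs; features and point are
    # equal-length tuples of numbers
    # Sort observations by Euclidean distance to the query point
    neighbors = sorted(data, key=lambda pair: math.dist(pair[0], point))
    # Take the labels of the k closest observations
    labels = [label for _, label in neighbors[:k]]
    # Return the most common class among them
    return Counter(labels).most_common(1)[0][0]

data = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
        ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B")]
print(knn_classify(data, (1, 1), k=3))  # prints A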

7
Q

What is a drawback of using k-nearest-neighbor classification?

A

A drawback is that, using a naive approach, the algorithm will have to measure the distance from every single point to the point in question, which is computationally expensive. This can be sped up by using data structures that enable finding neighbors more quickly or by pruning irrelevant observations.

8
Q

Another way of going about a classification problem is by looking at the data as a whole and trying to create a decision boundary. In two-dimensional data, we can draw a line between the two types of observations. Every additional data point will be classified based on the side of the line on which it is plotted.

A

Perceptron Learning

9
Q

What is the drawback of Perceptron Learning, and how do we compromise?

A

The drawback to this approach is that data are messy, and it is rare that one can draw a line and neatly divide the observations into two classes without any mistakes. Often, we will compromise, drawing a boundary that separates the observations correctly more often than not, but still occasionally misclassifies them.

10
Q

What is the perceptron learning rule?

A

Given a data point (x, y), update each weight according to:

w_i = w_i + α(y - h_w(x)) × x_i

or, equivalently:

w_i = w_i + α(actual value - estimate) × x_i
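
A rough Python sketch of a single update step (my own illustration, assuming a hard-threshold hypothesis h_w; alpha is the learning rate α):

def perceptron_update(weights, x, y, alpha=0.1):
    # weights: [w_0, w_1, ..., w_n], where w_0 is the bias weight
    # x: input [x_1, ..., x_n]; y: true class (0 or 1)
    inputs = [1] + list(x)  # prepend 1 so w_0 acts as the bias
    # Hypothesis h_w(x): hard threshold on the weighted sum
    estimate = 1 if sum(w * xi for w, xi in zip(weights, inputs)) >= 0 else 0
    # w_i = w_i + alpha * (actual - estimate) * x_i
    return [w + alpha * (y - estimate) * xi for w, xi in zip(weights, inputs)]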

11
Q

What is an important takeaway from the perceptron learning rule?

A

The important takeaway from this rule is that for each data point, we adjust the weights to make our function more accurate.

The details, which are not as critical to our point, are that each weight is set to be equal to itself plus some value in parentheses.

(Is the second paragraph included in this takeaway?)

12
Q

It switches from 0 to 1 once the estimated value crosses some threshold.

A

Threshold function

13
Q

What is a downside of using a threshold function?

A

The problem with this type of function is that it is unable to express uncertainty.

14
Q

A threshold function that switches from 0 straight to 1, so its output can only ever be equal to 0 or to 1.

A

hard threshold

15
Q

A logistic function can yield a real number between 0 and 1, which will express confidence in the estimate.

A

soft threshold
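
A quick Python sketch contrasting the two kinds of threshold (my own illustration):

import math

def hard_threshold(t):
    # Outputs only 0 or 1: no way to express uncertainty
    return 1 if t >= 0 else 0

def soft_threshold(t):
    # Logistic function: a real number in (0, 1) expressing confidence
    return 1 / (1 + math.exp(-t))

print(hard_threshold(0.1), soft_threshold(0.1))  # 1 vs. ~0.52
print(hard_threshold(5.0), soft_threshold(5.0))  # 1 vs. ~0.99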

16
Q

Another approach to classification is ____________________. This approach uses an additional vector (support vector) near the decision boundary to make the best decision when separating the data.

A

Support Vector Machine

17
Q

A boundary that maximizes the distance to the nearest data points; in other words, a type of boundary that is as far as possible from the two groups it separates.

A

Maximum Margin Separator

18
Q

Give a benefit of a support vector machine.

A

They can represent decision boundaries with more than two dimensions, as well as non-linear decision boundaries.

19
Q

It is the supervised learning task of learning a function that maps an input point to a continuous value, some real number. This differs from classification in that classification problems map an input to discrete values (Rain or No Rain).

A

Regression

20
Q

Functions that express how poorly our hypothesis performs.

A way to quantify the utility lost by any of the decision rules above. The less accurate the prediction, the larger the loss.

A

Loss functions

21
Q

Loss functions

This function gains value when the prediction isn’t correct and doesn’t gain value when it is correct

A

0-1 Loss Function

22
Q

Give function/code:

0-1 Loss Function

A
L(actual, predicted):
    0 if actual = predicted
    1 otherwise

23
Q

Give Function/code

L1 Loss Function

A

L(actual, predicted) = | actual - predicted |

24
Q

Give Function/Code:

L2 Loss Function

A

L(actual, predicted) = (actual - predicted)^2
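
The three loss functions from the last few cards, as a small Python sketch (my own illustration):

def loss_01(actual, predicted):
    # 0-1 loss: 0 if the prediction is correct, 1 otherwise
    return 0 if actual == predicted else 1

def loss_l1(actual, predicted):
    # L1 loss: absolute difference between observed and predicted values
    return abs(actual - predicted)

def loss_l2(actual, predicted):
    # L2 loss: squared difference; penalizes large errors more harshly
    return (actual - predicted) ** 2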

25
Q

What do you do if you are interested in quantifying, for each prediction, how much it differed from the observed value?

A

We do this by taking either the absolute value or the squared value of the observed value minus the predicted value (i.e. how far the prediction was from the observed value).

26
Q

A model that fits too closely to a particular data set and therefore may fail to generalize to future data

A

Overfitting

27
Q

The process of penalizing hypotheses that are more complex, in order to favor simpler, more general hypotheses

A

Regularization

28
Q

Where do we use regularization?

A

We use regularization to avoid overfitting.

29
Q

Formula

In regularization, we estimate the cost of the hypothesis function h by adding up its loss and a measure of its complexity.

A

cost(h) = loss(h) + λcomplexity(h)
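
A minimal sketch of this cost in Python (my own illustration; measuring complexity as the sum of absolute weights is just one possible choice):

def cost(hypothesis_loss, weights, lam):
    # Regularized cost: loss plus lambda times a complexity penalty
    complexity = sum(abs(w) for w in weights)
    return hypothesis_loss + lam * complexity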

30
Q

It is a constant that we can use to modulate how strongly to penalize for complexity in our cost function. The higher ________ is, the more costly complexity is.

A

Lambda (λ)

31
Q

It splits data into a training set and a test set, such that learning happens on the training set and is evaluated on the test set

A

holdout cross-validation
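
A minimal scikit-learn sketch (my own illustration; the 50/50 split and the iris dataset are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Hold out half the data: learn on the training set, evaluate on the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out test set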

32
Q

Give a way to test if the model is overfitted.

A

Holdout Cross Validation

33
Q

What is the downside of holdout cross-validation? And how do you deal with its downside?

A

The downside of holdout cross-validation is that we don’t get to train the model on half the data, since it is used for evaluation purposes.

A way to deal with this is to use k-fold cross-validation.

34
Q

It splits data into k sets and runs the experiment k times, using each set as the test set once and the remaining data as the training set

A

k-fold cross-validation
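
A scikit-learn sketch of the same idea (my own illustration; k = 5 here):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# 5-fold cross-validation: each fold serves as the test set exactly once
scores = cross_val_score(KNeighborsClassifier(n_neighbors=3), X, y, cv=5)
print(scores.mean())  # average accuracy across the 5 experiments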

35
Q

As is often the case with Python, there are multiple libraries that allow us to conveniently use machine learning algorithms. One such library is ________________ .

A

scikit-learn

36
Q

It is another approach to machine learning, where after each action, the agent gets feedback in the form of reward or punishment (a positive or a negative numerical value).

A

Reinforcement Learning

37
Q

What is the learning process of reinforcement learning?

A

The learning process starts by the environment providing a state to the agent. Then, the agent performs an action on the state. Based on this action, the environment will return a state and a reward to the agent, where the reward can be positive, making the behavior more likely in the future, or negative (i.e. punishment), making the behavior less likely in the future.

38
Q

Where can we use Reinforcement Learning?

A

This type of algorithm can be used to train walking robots, for example, where each step returns a positive number (reward) and each fall a negative number (punishment).

39
Q

Model for decision-making, representing states, actions, and their rewards

A

Markov Decision Process

40
Q

Reinforcement learning can be viewed as a Markov decision process, having the following properties:

A
  • Set of states S
  • Set of actions Actions(s)
  • Transition model P(s’ | s, a)
  • Reward function R(s, a, s’)
41
Q

A method for learning a function Q(s, a), an estimate of the value of performing action a in state s.

A

Q-learning

42
Q

Give Pseudocode

Q-Learning Overview

A
  • Start with Q(s, a) = 0 for all s, a
  • When we take an action and receive a reward:
      • Estimate the value of Q(s, a) based on the current reward and expected future rewards
      • Update Q(s, a) to take into account the old estimate as well as our new estimate

The update can be written, step by step, as:

Q(s, a) ← Q(s, a) + α(new value estimate - Q(s, a))
Q(s, a) ← Q(s, a) + α((r + future reward estimate) - Q(s, a))
Q(s, a) ← Q(s, a) + α((r + max_a’ Q(s’, a’)) - Q(s, a))
Q(s, a) ← Q(s, a) + α((r + γ max_a’ Q(s’, a’)) - Q(s, a))
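
A minimal Python sketch of the final update rule (my own illustration; Q is stored as a dictionary keyed by (state, action) pairs):

def update_q(Q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    old_estimate = Q.get((state, action), 0)
    # Best estimate of future rewards: max over actions in the next state
    best_future = max((Q.get((next_state, a), 0) for a in next_actions),
                      default=0)
    # Q(s, a) <- Q(s, a) + alpha * ((r + gamma * max_a' Q(s', a')) - Q(s, a))
    Q[(state, action)] = old_estimate + alpha * (
        (reward + gamma * best_future) - old_estimate)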

43
Q

An algorithm that completely discounts future estimated rewards, instead always choosing the action a in the current state s that has the highest Q(s, a).

A

Greedy Decision-Making

44
Q

Explore vs. Exploit

A greedy algorithm always ________________, taking the actions that are already known to lead to good outcomes. However, it will always follow the same path to the solution, never finding a better path.

A

Exploits

45
Q

Explore vs. Exploit

________________________, on the other hand, means that the algorithm may use a previously unexplored route on its way to the target, allowing it to discover more efficient solutions along the way.

A

Explore

46
Q

What can we use to implement the concepts of exploration and exploitation?

A

ε-greedy

ε means epsilon

47
Q

In this type of algorithm, we set ε equal to how often we want to move randomly. With probability 1-ε, the algorithm chooses the best move (exploitation). With probability ε, the algorithm chooses a random move (exploration).

A

ε (epsilon) greedy

48
Q

It allows us to approximate Q(s, a) using various other features, rather than storing one value for each state-action pair. Thus, the algorithm becomes able to recognize which moves are similar enough so that their estimated value should be similar as well, and use this heuristic in its decision making.

A

function approximation

49
Q

Give Pseudocode

ε-greedy

A
  • Set ε equal to how often we want to move randomly.
  • With probability 1 - ε, choose estimated best move.
  • With probability ε, choose a random move.
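
A minimal Python version of these steps (my own sketch; Q is the same (state, action) dictionary as in the Q-learning card):

import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)  # explore: random move
    # Exploit: choose the estimated best move in this state
    return max(actions, key=lambda a: Q.get((state, a), 0))
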
50
Q

Downside of using ε-greedy

A

This approach becomes more computationally demanding when a game has multiple states and possible actions, such as chess. It is infeasible to generate an estimated value for every possible move in every possible state.

51
Q

given input data without any additional feedback, learn patterns

A

unsupervised learning

52
Q

An unsupervised learning task that takes the input data and organizes the set of objects into groups in such a way that similar objects tend to be in the same group

A

Clustering

53
Q

What are some Clustering Applications?

A
  • Genetic research
  • Image segmentation
  • Market research
  • Medical imaging
  • Social network analysis
54
Q

An algorithm for clustering data based on repeatedly assigning points to clusters and updating those clusters’ centers.

A

k-means Clustering

55
Q

How does k-means Clustering work?

A

It maps all data points in a space and then randomly places k cluster centers in the space (it is up to the programmer to decide how many). Each cluster center is simply a point in the space. Each cluster then gets assigned all the points that are closer to its center than to any other center. Then, in an iterative process, each cluster center moves to the middle of all its assigned points, and points are reassigned to the clusters whose centers are now closest to them. When, after repeating the process, each point remains in the same cluster it was in before, we have reached an equilibrium and the algorithm is over, leaving us with points divided between clusters.
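
A compact Python sketch of this procedure (my own illustration, not lecture code; points are tuples of numbers):

import math
import random

def k_means(points, k, max_iterations=100):
    centers = random.sample(points, k)  # place k cluster centers randomly
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assign each point to the cluster whose center is closest
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the middle (mean) of its assigned points
        new_centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # equilibrium: nothing changed, so stop
            break
        centers = new_centers
    return centers, clusters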

56
Q

CS50 Quiz

Categorize the following: A social network’s AI uses existing tagged photos of people to identify when those people appear in new photos.

  • This is an example of supervised learning
  • This is an example of reinforcement learning
  • This is an example of unsupervised learning
  • This is not an example of machine learning
A

This is an example of supervised learning

57
Q

CS50 Quiz

Imagine a regression AI that makes the following predictions for the following 5 data points. What is the total L2 loss across all of these data points (i.e., the sum of all the individual L2 losses for each data point)?

  1. The true output is 2 and the AI predicted 4.
  2. The true output is 4 and the AI predicted 5.
  3. The true output is 4 and the AI predicted 3.
  4. The true output is 5 and the AI predicted 2.
  5. The true output is 6 and the AI predicted 5.
A

16, since (2 - 4)^2 + (4 - 5)^2 + (4 - 3)^2 + (5 - 2)^2 + (6 - 5)^2 = 4 + 1 + 1 + 9 + 1 = 16.

58
Q

CS50 Quiz

If Hypothesis 1 has a lower L1 loss and a lower L2 loss than Hypothesis 2 on a set of training data, why might Hypothesis 2 still be a preferable hypothesis?

  • Hypothesis 1 might be the result of regularization.
  • Hypothesis 1 might be the result of overfitting.
  • Hypothesis 1 might be the result of loss.
  • Hypothesis 1 might be the result of cross-validation.
  • Hypothesis 1 might be the result of regression.
A

Hypothesis 1 might be the result of overfitting

59
Q

CS50 Quiz

In the ε-greedy approach to action selection in reinforcement learning, which of the following values of ε makes the approach identical to a purely greedy approach?

  • ε = 0
  • ε = 0.25
  • ε = 0.5
  • ε = 0.75
  • ε = 1
A

ε = 0