Lec 4 | Learning Flashcards
It provides a computer with data, rather than explicit instructions. Using these data, the computer learns to recognize patterns and becomes able to execute tasks on its own.
Machine Learning
It is a task where a computer learns a function that maps inputs to outputs based on a dataset of input-output pairs.
Supervised Learning
This is a supervised learning task where the function maps an input to a discrete output. In other terms, it is the task of learning a function that maps an input point to a discrete category.
Classification
- An algorithm that, given an input, chooses the class of the nearest data point to that input.
- One way of solving a classification task: assign the variable in question the value of the closest observation.
Nearest-Neighbor Classification
How do you get around the limitations of nearest-neighbor classification?
One way to get around the limitations of nearest-neighbor classification is by using k-nearest-neighbors classification.
An algorithm that, given an input, chooses the most common class out of the k nearest data points to that input
k-nearest-neighbor classification
What is a drawback of using k-nearest-neighbor classification?
A drawback is that, using a naive approach, the algorithm has to measure the distance from every single point to the point in question, which is computationally expensive. This can be sped up by using data structures that enable finding neighbors more quickly or by pruning irrelevant observations.
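The cards above can be sketched in code. Below is a minimal k-nearest-neighbors classifier using the naive linear scan the drawback card describes; the function name and sample data are illustrative, not from the source.

```python
from collections import Counter

def knn_classify(point, data, k=3):
    """Classify `point` as the most common class among its k nearest neighbors.

    `data` is a list of (features, label) pairs; distance is Euclidean.
    This is the naive approach: it measures the distance to every point,
    which is what data structures like k-d trees are used to avoid.
    """
    def distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    # Sort all observations by distance and keep the k closest.
    neighbors = sorted(data, key=lambda pair: distance(pair[0], point))[:k]
    labels = [label for _, label in neighbors]
    # Choose the most common class among those k neighbors.
    return Counter(labels).most_common(1)[0][0]
```

With k=1 this reduces to plain nearest-neighbor classification.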
Another way of going about a classification problem is by looking at the data as a whole and trying to create a decision boundary. In two-dimensional data, we can draw a line between the two types of observations. Every additional data point will be classified based on which side of the line it falls on.
Perceptron Learning
What is the drawback of Perceptron Learning? And how will we compromise?
The drawback to this approach is that data are messy, and it is rare that one can draw a line and neatly divide the observations into two classes without any mistakes. Often, we will compromise, drawing a boundary that separates the observations correctly more often than not, but still occasionally misclassifies them.
What is the perceptron learning rule?
Given data point (x, y), update each weight according to:
w_i = w_i + α(y − h_w(x)) × x_i
or
w_i = w_i + α(actual value − estimate) × x_i
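The update rule on this card can be written as a short function. This is a minimal sketch, assuming a hard-threshold hypothesis h_w and a bias weight at index 0; the function name and the fixed learning rate are illustrative.

```python
def perceptron_update(weights, x, y, alpha=0.1):
    """One application of the perceptron learning rule:
        w_i = w_i + alpha * (y - h_w(x)) * x_i

    `weights[0]` is a bias weight, so `x` is prepended with a constant 1.
    h_w(x) is a hard threshold on the dot product of weights and inputs.
    """
    x = [1.0] + list(x)  # add the constant input for the bias weight
    # Hypothesis: 1 if the weighted sum crosses 0, else 0.
    estimate = 1 if sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0
    # Adjust each weight in proportion to the error (actual - estimate).
    return [w + alpha * (y - estimate) * xi for w, xi in zip(weights, x)]
```

When the estimate already matches y, the term in parentheses is 0 and the weights are left unchanged, which is the takeaway on the next card.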
What is an important takeaway from the perceptron learning rule?
The important takeaway from this rule is that for each data point, we adjust the weights to make our function more accurate.
The details, which are not as critical to our point, are that each weight is set to be equal to itself plus some value in parentheses.
It switches from 0 to 1 once the estimated value crosses some threshold.
Threshold function
What is a downside of using a threshold function?
The problem with this type of function is that it is unable to express uncertainty.
A threshold function that switches abruptly from 0 to 1; its output can only be exactly 0 or exactly 1
hard threshold
A logistic function can yield a real number between 0 and 1, which will express confidence in the estimate.
soft threshold
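The hard/soft distinction on these cards can be shown side by side. A minimal sketch, using the logistic function for the soft threshold; the function names are illustrative.

```python
import math

def hard_threshold(dot_product):
    """Hard threshold: jumps from 0 to 1 at the boundary.
    The output is only ever 0 or 1, so it cannot express uncertainty."""
    return 1 if dot_product >= 0 else 0

def soft_threshold(dot_product):
    """Soft threshold (logistic function): a real number between 0 and 1
    that can be read as confidence in the estimate."""
    return 1 / (1 + math.exp(-dot_product))
```

Near the boundary the logistic function returns values close to 0.5, signaling low confidence, while the hard threshold still commits fully to 0 or 1.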
Another approach to classification is ____________________. This approach uses an additional vector (support vector) near the decision boundary to make the best decision when separating the data.
Support Vector Machine
A boundary that maximizes the distance to the nearest data points on either side. This is a type of boundary that is as far as possible from the two groups it separates.
Maximum Margin Separator
Give a benefit of a support vector machine.
They can represent decision boundaries with more than two dimensions, as well as non-linear decision boundaries.
It is a supervised learning task of learning a function that maps an input point to a continuous value, some real number. This differs from classification in that classification problems map an input to discrete values (Rain or No Rain).
Regression
Functions that express how poorly our hypothesis performs; a way to quantify the utility lost by any of the decision rules above. The less accurate the prediction, the larger the loss.
Loss functions
This function incurs a loss of 1 when the prediction is incorrect and no loss when it is correct
0-1 Loss Function
Give function/code:
0-1 Loss Function
L(actual, predicted) = 0 if actual = predicted, 1 otherwise
Give function/code:
L1 Loss Function
L(actual, predicted) = | actual - predicted |
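The two loss functions on these cards translate directly into code. A minimal sketch; the function names are illustrative.

```python
def zero_one_loss(actual, predicted):
    """0-1 loss: 0 when the prediction matches, 1 otherwise.
    Suited to classification, where outputs are discrete categories."""
    return 0 if actual == predicted else 1

def l1_loss(actual, predicted):
    """L1 loss: absolute difference between actual and predicted values.
    Suited to regression, where outputs are continuous."""
    return abs(actual - predicted)
```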