Supervised Learning Flashcards
Why is KNN considered the simplest algorithm?
Because building the model consists only of storing the training dataset. To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset: its “nearest neighbors.”
How does k-neighbors classification work when the model uses more than one neighbor?
When considering more than one neighbor, we use voting to assign a label. This means that for each test point, we count how many neighbors belong to class 0 and how many neighbors belong to class 1. We then assign the class that is more frequent.
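A minimal sketch of the voting step in plain Python, assuming we already know the labels of the k nearest neighbors (the names here are illustrative):
from collections import Counter

neighbor_labels = [0, 1, 1]  # hypothetical labels of the k = 3 nearest neighbors
# majority vote: the most common label among the neighbors wins
prediction = Counter(neighbor_labels).most_common(1)[0][0]
print(prediction)  # -> 1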
Which method evaluates the score of the KNN supervised algorithm? (Name the import from scikit-learn and the method used.)
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)   # the model must be fit before scoring
clf.score(X_test, y_test)   # mean accuracy on the test set
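For the snippet above to run, X_train, X_test, y_train, and y_test must already exist; a minimal setup sketch, where make_classification is an illustrative dataset choice rather than part of the original card:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# a small synthetic two-feature binary classification dataset
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)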
In classification, what is called a decision boundary (from a geometric point of view)?
It is the line that divides the feature space between where the algorithm assigns class 0 and where it assigns class 1.
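One way to make the boundary visible, sketched under the assumption that clf is the fitted two-feature classifier from the cards above (the grid resolution is arbitrary):
import numpy as np
import matplotlib.pyplot as plt

# evaluate the classifier on a dense grid covering the feature space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)  # the color change is the decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y)  # the training points on top
plt.show()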
What is the effect of increasing the number of neighbors on the decision boundary?
Increasing the number of neighbors leads to a smoother decision boundary and a less complex model.
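A quick way to see this trade-off is to compare scores for several values of n_neighbors, reusing the split from the cards above:
for n_neighbors in [1, 3, 9, 15]:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X_train, y_train)
    print(n_neighbors, clf.score(X_train, y_train), clf.score(X_test, y_test))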
How is the KNN method used in regression with one neighbor?
In regression, the KNN method works as follows: if you use only a single neighbor, the predicted target value for a test point is simply the target value of the nearest training data point.
How is KNN regression implemented in scikit-learn?
from sklearn.neighbors import KNeighborsRegressor
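A minimal usage sketch, where make_regression is an illustrative dataset choice:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# a small synthetic single-feature regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)
print(reg.predict(X_test[:5]))    # mean target of the 3 nearest training points
print(reg.score(X_test, y_test))  # R^2 on the test set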
How is the performance of a regression model evaluated? Explain what R² is.
The performance of a regression model is evaluated with the R² score, also known as the coefficient of determination: a measure of the goodness of a prediction for a regression model. It usually yields a score between 0 and 1, where 1 means a perfect prediction; the score can also be negative for models that do worse than simply predicting the mean.
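The same score can also be computed directly with sklearn.metrics.r2_score, which implements R² = 1 - (sum of squared residuals) / (total sum of squares); the numbers below are hypothetical:
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # hypothetical true targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])  # hypothetical predictions
print(r2_score(y_true, y_pred))           # ~0.949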
What are the basic parameters of the KNN estimators?
The number of neighbors and how you measure distance between data points. In practice, using a small number of neighbors like three or five often works well, but you should certainly adjust this parameter. By default, Euclidean distance is used, which works well in many settings.
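Both parameters are set in the constructor; a short sketch (the metric names are standard scikit-learn options):
clf = KNeighborsClassifier(n_neighbors=5, metric='euclidean')  # the default behavior
clf = KNeighborsClassifier(n_neighbors=5, metric='manhattan')  # L1 distance instead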
Which datasets are not ideal for the KNN estimator?
a) Very large datasets (in number of features or number of samples), where prediction becomes slow
b) Sparse datasets (datasets where most features are 0 most of the time)
What geometric shape does a linear model's prediction take for a single feature, multiple features, and so on?
Linear models for regression can be characterized as regression models for which the prediction is a line for a single feature, a plane when using two features, or a hyperplane in higher dimensions (that is, when using more features).
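In formula form the prediction is ŷ = w[0] * x[0] + w[1] * x[1] + ... + w[p] * x[p] + b, where the weights w and offset b are learned from the data. A minimal sketch for inspecting them, reusing the regression split from the KNN regression card:
from sklearn.linear_model import LinearRegression

lr = LinearRegression().fit(X_train, y_train)
print(lr.coef_)       # the slope(s) w, one entry per feature
print(lr.intercept_)  # the offset b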
What is a drawback of a linear predictor compared to a more flexible regression estimator?
When the prediction is a line, the linear model strips away many of the fine details that the data seems to contain; for a single feature this can seem somewhat unrealistic.
What are underfitting and overfitting in terms of training set score and test set score?
A high training set score together with a mediocre test set score is an indication of overfitting; low scores on both the training set and the test set indicate underfitting.
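In code the diagnosis is just a comparison of the two scores; a sketch assuming lr is the fitted model from the card above:
train_score = lr.score(X_train, y_train)
test_score = lr.score(X_test, y_test)
# train_score much higher than test_score suggests overfitting;
# both scores low suggests underfitting
print(train_score, test_score)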
What is ridge regression?
Ridge regression uses the same formula as linear regression, but an additional constraint is imposed: all the coefficients w should be close to zero. This means each feature should have as little effect on the outcome as possible, while still predicting well. This constraint is an example of what is called regularization. Regularization means explicitly restricting a model to avoid overfitting.
from sklearn.linear_model import Ridge
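A minimal usage sketch, reusing the regression split from the earlier cards (alpha=1.0 is scikit-learn's default):
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
print(ridge.score(X_train, y_train))
print(ridge.score(X_test, y_test))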
What is the difference between the Ridge and the linear regressor?
The Ridge model makes a trade-off between the simplicity of the model (near-zero coefficients) and its performance on the training set.
How much importance the model places on simplicity is controlled by the alpha parameter.
Increasing alpha forces the coefficients toward zero, which decreases training set performance but may help generalization (test set performance).
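A sketch of this trade-off, comparing several illustrative alpha values on the same split:
for alpha in [10, 1, 0.1]:
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print(alpha, ridge.score(X_train, y_train), ridge.score(X_test, y_test))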