Supervised Learning Flashcards
Define Machine Learning.
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959).
What is the basic premise of learning from data?
The basic premise is to use a set of observations to uncover an underlying process when no analytic solution is available.
What are the key elements of the learning problem?
Observe training data, then use a machine learning method to estimate an unknown function f such that f(X) = Y, where X is the input and Y is the target.
Differentiate between regression and classification in supervised learning.
Regression estimates numerical values of a target variable, while classification assigns labels to instances based on their attributes.
What is the formula for linear regression?
$Y_i = f(X_i) + \epsilon_i$, where $f(X_i) = \beta_0 + \beta_1 X_i$ is linear in the input and $\epsilon_i$ is the random error term.
How is the “best fit” determined in linear regression?
By minimizing the sum of squared errors: $\sum_{i=1}^{n} (Y_i - f(X_i))^2$.
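As a rough illustration of the least-squares idea, here is a minimal sketch for a single input feature, assuming f(x) = intercept + slope * x. The function names `fit_line` and `sum_squared_errors` and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """The quantity that least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up observations that roughly follow y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

b0, b1 = fit_line(xs, ys)
print(b0, b1, sum_squared_errors(xs, ys, b0, b1))
```

Any other choice of intercept and slope gives a sum of squared errors at least as large as the fitted one, which is what "best fit" means here.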
Name three algorithms used in supervised learning besides linear regression.
Decision Trees, Random Forests, and K-Nearest Neighbors (k-NN).
What is the main advantage of Random Forests over individual decision trees?
A Random Forest reduces overfitting by combining the predictions of many decision trees, so that no single tree's quirks dominate the result.
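The combining step can be sketched as a majority vote. This shows only the voting part, not a full Random Forest: the three "trees" below are hypothetical hand-written threshold rules standing in for trained decision trees, and `forest_predict` is an illustrative name.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the class predicted by most of the trees for input x."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained trees (assumption: each maps x to a label)
trees = [
    lambda x: "spam" if x[0] > 3 else "ham",
    lambda x: "spam" if x[1] > 1 else "ham",
    lambda x: "spam" if x[0] + x[1] > 5 else "ham",
]

# The first tree votes "spam", but the other two outvote it
print(forest_predict(trees, (4, 0)))
```

In an actual Random Forest each tree is typically trained on a bootstrap sample of the data with a random subset of features, which is what makes the individual trees differ.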
Explain the k-NN algorithm’s prediction process.
Calculate the distance from the query point to every training point, find the k nearest neighbors, and return their majority class (classification) or their average value (regression).
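The three steps can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a brute-force search; the function name `knn_predict`, its `task` parameter, and the toy data are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    # Step 1: distance from the query to every training point (Euclidean)
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 2: keep the targets of the k closest points
    dists.sort(key=lambda pair: pair[0])
    nearest = [y for _, y in dists[:k]]
    # Step 3: majority class for classification, mean for regression
    if task == "classification":
        return Counter(nearest).most_common(1)[0][0]
    return sum(nearest) / len(nearest)


# Toy data: two well-separated clusters
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_predict(train_X, labels, (0.5, 0.5), k=3))
print(knn_predict(train_X, values, (0, 0), k=3, task="regression"))
```

Sorting all distances is fine for small data sets; larger ones usually call for a spatial index or an approximate search instead.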