Math & Statistics - Machine Learning Flashcards
What are the components of machine learning?
Task T: example - playing checkers
Performance measure P: example - percentage of games won against opponents
Training experience E: playing practice games against itself.
What is the inductive learning hypothesis?
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
How is a concept acquired in concept learning?
By inferring the definition of a general category from a sample of positive and negative training examples of that category.
What is Concept Learning?
Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples.
Which algorithms find a hypothesis that fits the training examples of a concept?
1. FIND-S
2. Candidate Elimination
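A minimal sketch of FIND-S, assuming hypotheses are lists of attribute values in which "?" means "any value is acceptable". The attribute names and the toy data are hypothetical and chosen only for illustration.

```python
def find_s(examples):
    """Return the most specific hypothesis consistent with the positive examples.

    examples: list of (attributes, label) pairs, where attributes is a tuple
    of values and label is True for a positive example.
    """
    hypothesis = None  # stands for the most specific hypothesis (matches nothing)
    for attributes, label in examples:
        if not label:
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # copy the first positive example
            continue
        for k, value in enumerate(attributes):
            if hypothesis[k] != value:
                hypothesis[k] = "?"  # generalize the attribute that disagrees
    return hypothesis


# Hypothetical toy data: (sky, temperature, humidity, wind)
training = [
    (("sunny", "warm", "normal", "strong"), True),
    (("sunny", "warm", "high", "strong"), True),
    (("rainy", "cold", "high", "strong"), False),
    (("sunny", "warm", "high", "weak"), True),
]
result = find_s(training)
print(result)  # ['sunny', 'warm', '?', '?']
```

FIND-S only ever generalizes, and only on positive examples. Candidate Elimination instead tracks both the most specific and the most general boundaries of the set of consistent hypotheses.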
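What is the difference between parametric and non-parametric approaches to finding f?
Parametric approaches assume a functional form for f, which reduces the problem to estimating a set of parameters. The danger is that the assumed form may be very different from the true f.
Non-parametric approaches completely avoid this danger, since essentially no assumption about the form of f is made. In exchange, they need a very large number of observations to estimate f accurately.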
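The contrast can be sketched on a small example. The code below assumes a hypothetical, noise-free non-linear target f(x) = x**2 and compares a parametric fit (a straight line estimated by least squares) with a non-parametric one (k-nearest-neighbors averaging). Both estimators are illustrative choices, not the only options.

```python
def fit_line(xs, ys):
    """Parametric: assume f(x) = b0 + b1 * x and estimate b0, b1 by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return lambda x: b0 + b1 * x


def fit_knn(xs, ys, k=3):
    """Non-parametric: assume no form; average the k nearest training responses."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


# Hypothetical training data generated from f(x) = x**2 without noise
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 2 for x in xs]

line = fit_line(xs, ys)
knn = fit_knn(xs, ys, k=3)

x0 = 0.0
line_error = abs(line(x0) - x0 ** 2)  # error of the misspecified linear form
knn_error = abs(knn(x0) - x0 ** 2)    # error of the nearest-neighbors estimate
print(line_error, knn_error)
```

On this data the straight line is far from the true f at x = 0 because the assumed form is wrong, while the nearest-neighbors estimate stays close because the training points are dense.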
What is the difference between supervised and unsupervised statistical learning problems?
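In supervised learning, each observation of the predictors has an associated response measurement. In unsupervised learning, there are observations of the predictors but no associated response.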
What is another name used for qualitative variables?
Categorical
How are problems with a qualitative (categorical) response classified?
Problems with a qualitative response are often referred to as classification problems.
How are problems with a quantitative response classified?
We tend to refer to problems with a quantitative response as regression problems.
How to measure the performance of a statistical learning method?
In order to evaluate the performance of a statistical learning method on a given data set, we need some way to measure how well its predictions actually match the observed data. That is, we need to quantify the extent to which the predicted response value for a given observation is close to the true response value for that observation. In the regression setting, the most commonly used measure is the mean squared error (MSE).
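The MSE is the average of the squared differences between observed and predicted responses: MSE = (1/n) * sum over i of (y_i - f̂(x_i))^2. A minimal sketch, with hypothetical observed and predicted values:

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted responses."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Hypothetical values, for illustration only
observed = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(observed, predicted)
print(mse)  # (1 + 0 + 4) / 3
```

A small MSE means the predictions are close to the true responses. A large MSE means some predictions differ substantially from them.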
What is supervised statistical learning?
Building a statistical model for predicting, or estimating, an output based on one or more inputs.
What is unsupervised statistical learning?
Learning relationships and structure from data that has inputs but no supervising output. There is no response to predict.
What is the difference between classification problems and regression problems?
Regression problems output a number (a quantitative response), while classification problems output a class label (a qualitative response).
When were the main statistical learning methods discovered or invented?
Early 19th century - Linear Regression - Legendre and Gauss - Method of Least Squares
1940s - Logistic Regression - various authors
Early 1970s - Generalized Linear Models - Nelder and Wedderburn
Mid-1980s - Classification and Regression Trees - Breiman, Friedman, Olshen and Stone
1986 - Generalized Additive Models (non-linear models) - Hastie and Tibshirani
Notation used by Introduction to Statistical Learning
n - represents the number of distinct data points, or observations, in our sample.
p - denotes the number of variables that are available for use in making predictions.
Red font color - variable names
i - indexes the samples or observations (from 1 to n)
j - indexes the variables (from 1 to p)
X - denotes an n x p matrix whose (i, j)th element is x_ij. Like a spreadsheet.
X^T - the transpose of the matrix X.
y_i - denotes the ith observation of the variable on which we wish to make predictions.
ε
Random error term
The accuracy of Ŷ (Y hat) as a prediction for Y depends on which two quantities?
1. Reducible error
2. Irreducible error
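The two quantities can be written out explicitly. Assuming the model Y = f(X) + ε with prediction Ŷ = f̂(X), and treating both f̂ and X as fixed, the expected squared prediction error splits into a reducible and an irreducible part:

```latex
\begin{aligned}
E\big(Y - \hat{Y}\big)^2
  &= E\big[f(X) + \varepsilon - \hat{f}(X)\big]^2 \\
  &= \underbrace{\big[f(X) - \hat{f}(X)\big]^2}_{\text{reducible}}
   + \underbrace{\operatorname{Var}(\varepsilon)}_{\text{irreducible}}
\end{aligned}
```

The cross term vanishes because ε has mean zero. A better estimate f̂ can shrink the first term, but no choice of f̂ changes Var(ε).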
Why is the irreducible error larger than zero?
The quantity ε may contain unmeasured variables that are useful in predicting Y: since we don't measure them, f cannot use them for its prediction. It may also contain unmeasurable variation.
Reasons to estimate f
1. Prediction
2. Inference
Modeling for inference
Discovering the relationship between the different variables and the outcome (Y).
Modeling for prediction
Generating a prediction (Ŷ) based on the input X.
What are common approaches to determine f?
Linear Regression
Logistic Regression
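As an illustration of the second approach, here is a minimal sketch of logistic regression with one predictor, assuming the model P(Y = 1 | x) = sigmoid(b0 + b1 * x). Fitting by plain gradient ascent on the log-likelihood is an assumption made for brevity (statistical software typically uses other optimizers), and the data, learning rate, and step count are hypothetical.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, labels, learning_rate=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, labels):
            error = y - sigmoid(b0 + b1 * x)  # observed label minus predicted probability
            g0 += error
            g1 += error * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Hypothetical data: larger x makes class 1 more likely, with some overlap
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, labels)
p_low = sigmoid(b0 + b1 * 1.0)   # estimated P(Y = 1) at x = 1.0
p_high = sigmoid(b0 + b1 * 4.0)  # estimated P(Y = 1) at x = 4.0
print(p_low, p_high)
```

Linear regression estimates f for a quantitative response. Logistic regression instead models the probability of a class, which makes it a classification method despite its name.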