Introductory Concepts Flashcards
Machine Learning
constructs algorithms that can learn from
data.
Statistical Learning
is a branch of applied statistics that
emerged in response to machine learning,
emphasizing statistical models and assessment of
uncertainty.
Data Science
is the extraction of knowledge from data, using
ideas from mathematics, statistics, machine
learning, computer science, engineering,
Wide Data
We have too many variables; prone to overfitting.
Need to remove variables, or regularize, or both.
Tall Data
Sometimes simple models (linear) don’t suffice.
We have enough samples to fit nonlinear models with many interactions, and not too many variables.
Good automatic methods for doing this.
Linear regression
Linear regression is a simple approach to supervised
learning. It assumes that the dependence of Y on
$X_1, X_2, \ldots, X_p$ is linear.
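To make the linearity assumption concrete, here is a minimal sketch of a least-squares fit for the single-predictor case (p = 1). The function name `fit_simple_linear` and the data are made up for illustration; they are not part of the original cards.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sample covariance of (x, y) divided by sample variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = fit_simple_linear(xs, ys)
print(b0, b1)
```

Because the example data are exactly linear, the fit recovers the intercept 1 and slope 2. With p > 1 predictors the same idea applies, but the coefficients come from solving a system of equations rather than a single ratio.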
Mean Squared Error, Training
$\mathrm{MSE}_{Tr} = \mathrm{Ave}_{i \in Tr}\,[y_i - \hat{f}(x_i)]^2$
Mean Squared Error, Test
$\mathrm{MSE}_{Te} = \mathrm{Ave}_{i \in Te}\,[y_i - \hat{f}(x_i)]^2$
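The two averages can be sketched in a few lines. This is an illustration only: the model `f_hat`, the helper `mse`, and all data values are hypothetical and stand in for a fitted model and a real train/test split.

```python
def mse(f_hat, xs, ys):
    """Average squared difference between observed y and predicted f_hat(x)."""
    return sum((y - f_hat(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def f_hat(x):
    # Hypothetical fitted model, assumed to have been estimated on the training set.
    return 1.0 + 2.0 * x


# Made-up training and test observations.
train_x, train_y = [1.0, 2.0, 3.0], [3.0, 5.5, 6.5]
test_x, test_y = [4.0, 5.0], [10.0, 10.0]

mse_tr = mse(f_hat, train_x, train_y)  # average over the training set Tr
mse_te = mse(f_hat, test_x, test_y)    # average over the test set Te
print(mse_tr, mse_te)
```

The same function serves both cards; only the set of observations being averaged over changes. The training MSE is typically the more optimistic of the two, because the model was fit to those very points.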
Bias-Variance Trade-off
$E\big(y_0 - \hat{f}(x_0)\big)^2 = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon)$
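The decomposition can be checked numerically with a small simulation. The sketch below assumes a made-up setup that is not in the original cards: the true function is x squared, the noise is Gaussian with standard deviation 0.5, and the deliberately too-simple model predicts the mean of the training responses everywhere.

```python
import random

random.seed(0)


def f_true(x):
    # Assumed true regression function (illustrative only).
    return x ** 2


SIGMA = 0.5                          # noise standard deviation, so Var(e) = 0.25
TRAIN_X = [0.0, 0.5, 1.0, 1.5, 2.0]  # fixed training inputs
X0 = 2.0                             # point at which the error is evaluated
REPS = 20000                         # number of simulated training sets

preds, sq_errors = [], []
for _ in range(REPS):
    # Draw a fresh training set and fit the constant model (mean of y).
    train_y = [f_true(x) + random.gauss(0.0, SIGMA) for x in TRAIN_X]
    f_hat_x0 = sum(train_y) / len(train_y)
    # Draw an independent new response at x0.
    y0 = f_true(X0) + random.gauss(0.0, SIGMA)
    preds.append(f_hat_x0)
    sq_errors.append((y0 - f_hat_x0) ** 2)

mean_pred = sum(preds) / REPS
variance = sum((p - mean_pred) ** 2 for p in preds) / REPS  # Var of f_hat(x0)
bias_sq = (mean_pred - f_true(X0)) ** 2                     # squared bias
expected_error = sum(sq_errors) / REPS                      # left-hand side
decomposition = variance + bias_sq + SIGMA ** 2             # right-hand side

print(expected_error, decomposition)
```

Under these assumptions the theoretical values are a variance of 0.05, a squared bias of 6.25, and a noise variance of 0.25, so both printed numbers should land near 6.55. The constant model has low variance but high bias at x0, which is the trade-off the card describes.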
RSS
Residual sum of squares: $\mathrm{RSS} = \sum_i \big(y_i - \hat{f}(x_i)\big)^2$
$E\big(y_0 - \hat{f}(x_0)\big)^2$
Expected squared prediction error (mean squared error) at $x_0$
$\mathrm{Var}(\hat{f}(x_0))$
Variability of the estimated function $\hat{f}$ at $x_0$ across different training sets
$[\mathrm{Bias}(\hat{f}(x_0))]^2$
How far the model is, on average, from the truth