Week 8 DSE Flashcards
What is machine learning?
A field that develops algorithms designed to be applied to datasets, with the main focus on prediction, classification, and clustering (grouping) tasks.
What is yi in (yi,xi)?
Dependent variable (or response variable)
What is xi in (yi,xi)?
P-dimensional vector of independent variables or covariates (in ML speak: features).
For a high dimensional dataset, how does p relate to N?
P ≫ N
The number of potential variables that we can use in the model is far larger than the number of observations.
What do P and N stand for?
P: number of potential variables (features)
N: number of observations
What is a supervised learning algorithm?
Uses a training dataset (i.e., the estimation sample) (yi, xi), i = 1,2,…,N to determine the conditional prediction (or forecast) rule Ŷ(X).
When yi is continuous, it is called a ________problem; when it is categorical, it is called a _________ problem.
regression
classification
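A minimal sketch of a supervised regression rule, assuming a single feature and a straight-line rule fitted by least squares. The toy data and the names `fit_line` and `predict` are illustrative assumptions, not from the course notes.

```python
def fit_line(xs, ys):
    """Estimate (intercept, slope) of a linear prediction rule by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(rule, x):
    """Apply the fitted rule Y_hat(x) to a new feature value."""
    intercept, slope = rule
    return intercept + slope * x


# Hypothetical training sample (yi, xi), i = 1..N, generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

rule = fit_line(xs, ys)      # learned from the training data
y_hat = predict(rule, 5.0)   # point forecast for an unseen x
```

Because yi is continuous here, this is a regression problem; a classification rule would return a category instead of a number.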
What is an unsupervised learning algorithm?
Uses observations xi, i = 1,2,…,N of a random P-dimensional vector X with joint density p(X) to infer some properties of p(X).
It tries to infer some structure within the dataset.
What is reinforcement learning?
Algorithm gets told when the answer is wrong, but has no feedback on how to correct it
has to explore different possibilities until it works out how to get the answer right.
What is the forecast distribution?
The cumulative distribution function of Y (S-shaped)
F(y) = P(Y ≤ y).
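To see the S shape, here is a small sketch that evaluates F(y) on a grid, assuming purely for illustration that Y is normally distributed. The function name `normal_cdf` and the grid are assumptions; the flashcards do not specify a distribution.

```python
import math


def normal_cdf(y, mu=0.0, sigma=1.0):
    """F(y) = P(Y <= y) for Y ~ Normal(mu, sigma^2), an assumed example distribution."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


grid = [-3.0, -1.0, 0.0, 1.0, 3.0]
values = [normal_cdf(y) for y in grid]

# F rises from near 0 on the left to near 1 on the right, passing 0.5 at the mean.
for y, f in zip(grid, values):
    print(f"F({y:+.1f}) = {f:.4f}")
```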
What is Ŷ?
point forecast
best guess for the unknown value
What is the formula for forecast error?
The difference between the actual value and the forecast: e = Y − Ŷ
See notes
Why will there always be forecast error if Y is a continuous random variable?
For a continuous variable, the probability of taking any single exact value is 0:
P(Y = y) = 0
The integral of the density over a single point is 0, so the forecast matches the actual value exactly with probability 0.
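The argument can be written out in one line, assuming Y has a density, denoted here by f (a symbol introduced for this sketch), and writing the forecast error as e = Y − Ŷ:

```latex
P(Y = y) = \int_{y}^{y} f(u)\,du = 0
\quad\Longrightarrow\quad
P(e = 0) = P(Y = \hat{Y}) = 0 .
```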
What is used to calculate the cost of forecast error?
Loss function L(e)
Can also be written as L(Y, Ŷ)
What is a forecast?
Predictive distribution.
Action that must be constructed given loss function and forecast distribution
What are the conditions for an appropriate loss function?
i) L(0) = 0: the minimum loss is 0, attained when the error is 0 (the forecast is exactly correct).
ii) L(e) ≥ 0 for all e.
iii) Nonincreasing in e for e < 0 and nondecreasing in e for e > 0: L(e1) ≤ L(e2) if e2 < e1 < 0; L(e1) ≤ L(e2) if e2 > e1 > 0. (You are never rewarded for making an error.)
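The three conditions can be checked numerically on a grid of errors. This is only a rough sketch: the helper `satisfies_loss_conditions` and the grid are hypothetical, and passing on a finite grid is a sanity check, not a proof.

```python
def satisfies_loss_conditions(loss, grid):
    """Check conditions (i)-(iii) for `loss` on an ascending grid of errors."""
    # (i) zero loss when the error is zero
    if loss(0.0) != 0.0:
        return False
    # (ii) loss is never negative
    if any(loss(e) < 0.0 for e in grid):
        return False
    negatives = [e for e in grid if e < 0]
    positives = [e for e in grid if e > 0]
    # (iii) nonincreasing for e < 0: moving toward 0 must not raise the loss
    for a, b in zip(negatives, negatives[1:]):   # a < b < 0
        if loss(a) < loss(b):
            return False
    # (iii) nondecreasing for e > 0: moving away from 0 must not lower the loss
    for a, b in zip(positives, positives[1:]):   # 0 < a < b
        if loss(a) > loss(b):
            return False
    return True


grid = [k / 10 for k in range(-30, 31)]   # errors from -3.0 to 3.0

quadratic_ok = satisfies_loss_conditions(lambda e: e ** 2, grid)
absolute_ok = satisfies_loss_conditions(abs, grid)
# A loss centred at e = 1 violates condition (i), since L(0) = 1.
shifted_ok = satisfies_loss_conditions(lambda e: (e - 1.0) ** 2, grid)
```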
What are common choices for the loss function, and what are their similarities? Are they symmetric, and what does that mean?
quadratic ( L(e)=e^2 ) and absolute (L(e) =|e| )
Both are symmetric: penalize positive and negative errors of the same magnitude in the same way.
What is the difference between the two common loss functions?
Quadratic loss penalizes large errors much more severely than small errors.
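A quick side-by-side comparison of the two losses on a few error sizes. The error values are arbitrary illustrations.

```python
def quadratic_loss(e):
    """L(e) = e^2"""
    return e ** 2


def absolute_loss(e):
    """L(e) = |e|"""
    return abs(e)


errors = [0.5, 1.0, 2.0, 10.0]
table = [(e, quadratic_loss(e), absolute_loss(e)) for e in errors]

for e, q, a in table:
    print(f"e = {e:5.1f}   quadratic = {q:7.2f}   absolute = {a:5.1f}")
```

An error 10 times larger costs 100 times more under quadratic loss but only 10 times more under absolute loss. For |e| < 1 the quadratic loss is actually the smaller of the two.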
What is the purpose of having asymmetric loss functions?
Used when it is safer to overestimate than to underestimate (or vice versa), so the two kinds of error should be penalized differently.
What is positive error and negative error?
If error is positive, you undershot
If error is negative, then you overshot
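One possible asymmetric loss is a piecewise-linear ("lin-lin") loss with different slopes on each side of zero. This specific form and the slopes `a` and `b` are assumptions chosen for illustration; the flashcards do not name a particular asymmetric loss. The error uses the sign convention e = actual − forecast, so a positive error means the forecast undershot.

```python
def forecast_error(y, y_hat):
    """e = actual - forecast; positive means the forecast undershot."""
    return y - y_hat


def linlin_loss(e, a=3.0, b=1.0):
    """Slope a for undershooting (e > 0), slope b for overshooting (e < 0)."""
    return a * e if e > 0 else b * abs(e)


# Same size of miss in each direction, with actual value 10.
under = linlin_loss(forecast_error(10.0, 8.0))    # e = +2, forecast too low
over = linlin_loss(forecast_error(10.0, 12.0))    # e = -2, forecast too high
```

With a > b, undershooting is penalized more heavily than overshooting, which fits a case where overestimating is the safer mistake.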
What is risk?
expected loss
What is the optimal forecast under quadratic loss?
The mean of the forecast distribution F(y), i.e. E[Y]
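A short derivation sketch of why the mean is optimal, writing μ = E[Y] and letting c be any candidate point forecast (both symbols introduced here for illustration, and assuming Y has finite variance):

```latex
\mathbb{E}\big[(Y - c)^2\big]
  = \mathbb{E}\big[(Y - \mu)^2\big] + 2(\mu - c)\,\mathbb{E}[Y - \mu] + (\mu - c)^2
  = \operatorname{Var}(Y) + (\mu - c)^2 .
```

The cross term vanishes because E[Y − μ] = 0. The first term does not depend on c, and the second is minimized at c = μ, so the risk under quadratic loss is smallest when the forecast equals the mean.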