03 Random variables, probability and likelihood - Flashcards
Generalisation and overfitting
there is a trade-off between
- generalisation (predictive ability)
- overfitting (minimising the training loss)
fitting the model perfectly to the training data is likely to lead to poor predictions, because noise is always present
what is noise
errors or random events that cannot be predicted
t = w0 + w1·x + ε, where ε is the noise term
since noise is a continuous random variable, we choose a probability density for it; the usual choice is the Gaussian (normal) distribution
- using the optimised weights, generate noisy targets by adding Gaussian noise (see the sketch below)
- the mean affects the intercept of the line
- sigma affects the spread of the noise
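A minimal sketch of this (hypothetical weights and inputs, NumPy assumed) showing how Gaussian noise gets added on top of a fitted line:

```python
import numpy as np

# hypothetical optimised weights: intercept w0 and slope w1
w0, w1 = 2.0, 0.5
x = np.linspace(0, 10, 50)

# Gaussian noise: the mean shifts the line up/down, sigma controls the spread
rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0
noise = rng.normal(loc=mu, scale=sigma, size=x.shape)

# noisy targets: t = w0 + w1*x + noise
t = w0 + w1 * x + noise
```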
discrete and continuous random variables
discrete events: dice roll, coin flip
continuous events: winning time in a sprint
discrete random events
can be described directly with probabilities
eg. chance of a particular dice roll = 1/6, chance of heads on a coin flip = 1/2
joint probability => P(X=x, Y=y)
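A tiny sketch (fair die and fair coin, both assumed) that gets the marginal and joint probabilities by enumerating the equally likely outcomes:

```python
from fractions import Fraction

# enumerate the joint sample space of a fair die (X) and a fair coin (Y)
outcomes = [(face, side) for face in range(1, 7) for side in ("H", "T")]
p_each = Fraction(1, len(outcomes))   # 12 equally likely outcomes

# marginal probabilities
p_die_4 = sum(p_each for face, _ in outcomes if face == 4)        # 1/6
p_coin_H = sum(p_each for _, side in outcomes if side == "H")     # 1/2

# joint probability P(X=4, Y=H)
p_joint = sum(p_each for face, side in outcomes if face == 4 and side == "H")
print(p_die_4, p_coin_H, p_joint)     # 1/6 1/2 1/12
```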
continuous random events
cannot be assigned probabilities value-by-value; use a density function instead, and probabilities are areas under the density curve
joint density => p(x0, x1)
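A short sketch of the density idea, assuming a made-up normally distributed sprint time and using scipy.stats (not something from the notes):

```python
from scipy.stats import norm

# hypothetical winning time in seconds, modelled as N(mu=9.8, sigma=0.1)
mu, sigma = 9.8, 0.1
time_dist = norm(loc=mu, scale=sigma)

# density value at a point -- NOT a probability (it can exceed 1)
print(time_dist.pdf(9.8))

# probability = area under the density between two values
p_between = time_dist.cdf(9.9) - time_dist.cdf(9.7)
print(p_between)   # P(9.7 <= T <= 9.9)
```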
independent random variables
P(X=x, Y=y) = P(X=x) · P(Y=y)
eg. dice rolls: no matter how many times I roll a die, the outcome of each roll is not affected by the previous ones
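A quick simulation sketch (two fair dice, NumPy assumed) checking that the joint probability factorises into the product of the marginals when the variables are independent:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.integers(1, 7, size=n)   # first die
y = rng.integers(1, 7, size=n)   # second die, rolled independently of x

# empirical estimates
p_x3 = np.mean(x == 3)
p_y5 = np.mean(y == 5)
p_joint = np.mean((x == 3) & (y == 5))

print(p_joint, p_x3 * p_y5)      # both close to 1/36 ≈ 0.0278
```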
dependent random variables
eg. X = I’m playing tennis (1=yes, 0 =no)
Y = It is raining (1=yes, 0=no)
outcome of X depends on Y (if it's raining, I'm definitely not playing tennis)
P(X=1|Y=1) = 0
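A minimal sketch of the tennis/rain example using a hand-built joint table (the numbers are made up) to read off the conditional P(X=1 | Y=1):

```python
# joint distribution P(X=x, Y=y); X = playing tennis, Y = raining (made-up values)
p_joint = {
    (1, 1): 0.0,   # playing while it rains never happens
    (0, 1): 0.3,
    (1, 0): 0.5,
    (0, 0): 0.2,
}

p_y1 = sum(p for (x, y), p in p_joint.items() if y == 1)   # P(Y=1)
p_x1_given_y1 = p_joint[(1, 1)] / p_y1                     # P(X=1|Y=1) = 0
print(p_x1_given_y1)

# dependence: the joint does NOT factorise into the product of marginals
p_x1 = sum(p for (x, y), p in p_joint.items() if x == 1)
print(p_joint[(1, 1)], p_x1 * p_y1)   # 0.0 vs 0.15
```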
likelihood
- the likelihood measures how plausible the observed values of the random variable are under the model
- different from probability: the data are held fixed while the model/parameters vary
- the higher the likelihood, the better the model fits the data
- computed by evaluating the density function at the observed values
maximum likelihood estimate of the noise variance: σ² = (1/N)(t − Xw)ᵀ(t − Xw)
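A sketch (hypothetical data and weights, NumPy assumed) of computing this σ² estimate and the Gaussian log-likelihood of a linear model:

```python
import numpy as np

# hypothetical data and fitted weights
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
X = np.column_stack([np.ones_like(x), x])     # design matrix [1, x]
w = np.array([2.0, 0.5])                      # [w0, w1]
t = X @ w + rng.normal(0.0, 1.0, size=x.shape)

N = len(t)
resid = t - X @ w

# maximum likelihood noise variance: sigma^2 = (1/N)(t - Xw)^T (t - Xw)
sigma2 = (resid @ resid) / N

# Gaussian log-likelihood of the data under this model
log_lik = -0.5 * N * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)
print(sigma2, log_lik)
```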