Models + Linear Regression Flashcards
what are independent variables?
the inputs, what we manipulate to see how they affect the dependent variables
what are dependent variables?
the outputs, what we measure when manipulating the independent variables
difference between continuous and discrete data?
discrete data can only take on certain values, e.g. a number of people (there is no half a person) or the result of rolling dice
continuous data can take on any value within a range, e.g. height
discrete data is counted
continuous data is measured
how is a predictive model used and why is it useful?
a predictive model can be used to predict how a change in the independent variables will affect the dependent ones
useful as they are prescriptive: they enable prediction of behavioural outcomes so the interface can be designed accordingly
what is the overfitting problem?
- trying to make the model match all observations (super curvy line)
- risks modelling noise
- goes against Occam’s razor (prefer the simpler model)
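a rough sketch of the points above, not a definitive demo: the data are made up and assumed to follow the trend y = 2x + 1 plus a little noise. the "super curvy line" here is a polynomial forced through every observation (Lagrange interpolation, chosen only for illustration), and the function names are hypothetical.

```python
# Made-up observations, assumed to follow y = 2x + 1 plus small noise.
xs = [0, 1, 2, 3, 4]
ys = [1.3, 2.6, 5.4, 6.7, 9.2]


def simple_model(x):
    """The assumed underlying trend: a straight line."""
    return 2 * x + 1


def curvy_model(x):
    """Polynomial that passes through every observation exactly
    (Lagrange interpolation), i.e. it also fits the noise."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# The curvy model matches all observations (zero error on the data it saw)...
for x, y in zip(xs, ys):
    print(x, y, round(curvy_model(x), 6))

# ...but one step outside the data it is far from the trend.
print("simple model at x=5:", simple_model(5))          # 11
print("curvy model at x=5:", round(curvy_model(5), 6))  # about 21.3
```

the curvy model has no error on the observations but has modelled the noise, so its prediction at x = 5 is roughly double what the simple line gives.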
how are a and b calculated in a linear model?
Y = a + bX
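b = (XYbar - Xbar * Ybar) / (X^2bar - Xbar^2)
a = Ybar - b * Xbar
where Xbar and Ybar are the means of X and Y, XYbar is the mean of X*Y, and X^2bar is the mean of X^2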
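the formulas for a and b can be sketched in plain Python as below. the data and the names mean and fit_line are made up for illustration only.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Return (a, b) for the linear model Y = a + bX."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    xy_bar = mean([x * y for x, y in zip(xs, ys)])  # mean of X*Y
    x2_bar = mean([x * x for x in xs])              # mean of X^2
    b = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b


# Made-up example data.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

a, b = fit_line(xs, ys)
print("a =", round(a, 6), "b =", round(b, 6))

# Using the model to predict Y for an X that was not observed.
print("prediction at X=5:", round(a + b * 5, 6))
```

for this data Xbar = 2.5, Ybar = 4.75, XYbar = 14.25 and X^2bar = 7.5, so b = 2.375 / 1.25 = 1.9 and a is (up to rounding) 0.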
describe a method of model validation
coefficient of determination (R^2)
the lower R^2, the less variance is explained by the model
if R^2 is 1 then all of the variance is explained (good); if it is 0 then none is explained
R^2 is the proportion of the variance that is explained
R^2 = 1 - SSerror/SStotal
SSerror is the sum of (y - y’)^2 where y’ is the model’s estimate and y is the actual value
SStotal is the sum of (y - ybar)^2
or
SSerror is how far the actual values are from the model’s estimates and
SStotal is how far the actual values are from the average
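a small sketch of the R^2 calculation, assuming made-up data and a hypothetical helper named r_squared. the line Y = 1.9X used for the estimates is just an example model, not a prescribed one.

```python
def r_squared(ys, estimates):
    """R^2 = 1 - SSerror/SStotal for actual values ys and model estimates."""
    y_bar = sum(ys) / len(ys)
    ss_error = sum((y - est) ** 2 for y, est in zip(ys, estimates))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_error / ss_total


# Made-up example data.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

# Estimates from an example linear model Y = 0 + 1.9X.
line_estimates = [1.9 * x for x in xs]
# A "model" that always guesses the average of ys.
mean_estimates = [sum(ys) / len(ys)] * len(ys)

print(round(r_squared(ys, line_estimates), 4))  # about 0.9627
print(r_squared(ys, mean_estimates))            # 0.0, no variance explained
print(r_squared(ys, ys))                        # 1.0, all variance explained
```

here SSerror for the line is about 0.70 and SStotal is 18.75, so the line explains roughly 96% of the variance.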
how to calculate SStotal?
SStotal is the sum of (y - ybar)^2
how to calculate SSerror?
SSerror is the sum of (y - y’)^2 where y’ is the model’s estimate and y is the actual value