confidence interval
quantifies the uncertainty around an estimated mean, i.e. an average over many observations
prediction interval
quantifies the uncertainty around the prediction of a single new observation (wider than the confidence interval, because individual variation is included)
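A minimal sketch of the difference between the two intervals, assuming roughly normal data. The function name `intervals` and the sample values are made up for illustration, and a normal quantile is used in place of the Student t quantile to keep the example short.

```python
import math
import statistics

def intervals(sample, level=0.95):
    """Return (confidence interval for the mean, prediction interval for one new value)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    # Assumption: normal quantile instead of the t quantile (fine for a sketch).
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * sd / math.sqrt(n)          # uncertainty about the mean only
    pi_half = z * sd * math.sqrt(1 + 1 / n)  # adds the spread of a single observation
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)

sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]  # illustrative data
ci, pi = intervals(sample)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

The prediction interval is always the wider one: the confidence interval shrinks toward zero width as n grows, while the prediction interval never gets narrower than the spread of a single observation.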
machine learning definition
study of algorithms applied to data, focussing on prediction and classification
machine learning characteristics
regression process
problems of machine learning
1) good for prediction, not for explanation
2) black box (we don’t know why the machine chooses certain variables)
3) ethical issues (sub-optimal outcomes such as discrimination)
black swan
extreme outliers that have a disproportionate effect on the model (risk of overfitting)
overfitting
avoiding overfitting
R squared
never decreases when variables are added, so it rises with the number of variables even if they are irrelevant
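A small sketch of this behaviour, using only the standard library. The helper names `solve` and `r_squared`, the simulated data, and the seed are assumptions for illustration. A predictor that is pure noise is added to a regression, and R squared still does not go down.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(columns, y):
    """Fit y on an intercept plus the given columns by least squares; return R squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

random.seed(0)
n = 40
x = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2 * xi + random.gauss(0, 1) for xi in x]

r2_one = r_squared([x], y)
r2_two = r_squared([x, noise], y)
print("R squared with x only:     ", r2_one)
print("R squared with x and noise:", r2_two)
```

Because the larger model can always set the extra coefficient to zero, its residual sum of squares cannot exceed that of the smaller model. This is why R squared alone is a poor guide for choosing variables.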
adjusted R squared
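A hedged sketch of the usual textbook formula, which penalises R squared for the number of predictors p given n observations. The function name and the example numbers are illustrative assumptions, not taken from the notes.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R squared for n observations and p predictors (excluding the intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Illustrative numbers: a second variable lifts R squared only from 0.800 to 0.801.
adj_one = adjusted_r_squared(0.800, n=40, p=1)
adj_two = adjusted_r_squared(0.801, n=40, p=2)
print(adj_one, adj_two)
```

In this example the adjusted value falls when the second variable is added, even though plain R squared rises slightly.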
mean squared error
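A minimal sketch of the standard definition: the average of the squared differences between actual and predicted values. The function name and the sample numbers are illustrative assumptions.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 7.0]      # illustrative values
predicted = [2.0, 5.0, 9.0]
mse = mean_squared_error(actual, predicted)
print(mse)
```

Squaring makes large errors count disproportionately, so a single extreme outlier can dominate the result.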
regression tree