Machine Learning - Reading 7 Flashcards
What are target variables
this is the dependent variable and can be continuous, categorical or ordinal
What are features
these are the independent variables
what is a training data set
this is the sample used to fit the model
what is a hyperparameter
this is a model input specified by the researcher
What is unsupervised learning
The ML program is not given labeled training data; instead, inputs are provided without any conclusions about those inputs
what is deep learning
algorithms, typically neural networks with many hidden layers, that are used for complex tasks such as image recognition and natural language processing
what is supervised learning
uses labeled training data to guide the ML algorithms towards superior forecasting accuracy
What is overfitting
is an issue with supervised ML that results when a large number of features are included in the data sample. Overfitting has occurred when the noise in the target variable seems to improve the model fit. An overfitted model fits the training sample well but forecasts less accurately on other data
what is bias error
This is the in-sample error resulting from a model with a poor fit
what is variance error
This is the out-of-sample error resulting from overfitted models that do not generalize well
what is base error
These are residual errors due to random noise
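Example (a minimal sketch, not from the reading): the difference between bias error and variance error can be seen on a tiny made-up data set. The data, the function names (fit_line, fit_interpolant, mse) and the choice of a degree-5 interpolating polynomial as the "overfitted" model are all assumptions made for illustration.

```python
# Toy data (assumed): the true relation is y = 2x, and the training targets
# carry alternating +1 / -1 "noise".
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x for x in test_x]  # noise-free targets between the training points


def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Overfitted model: Lagrange polynomial through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on a sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

for name, model in (("line", line), ("interpolant", wiggly)):
    print(name,
          "in-sample MSE:", round(mse(model, train_x, train_y), 3),
          "out-of-sample MSE:", round(mse(model, test_x, test_y), 3))
```

The interpolant reproduces the training sample, noise included, so its in-sample error is zero, but it does worse than the straight line on the held-out points. The straight line leaves some in-sample error yet generalizes better here.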
What will a graph of a robust, well generalized model show?
a robust, well-generalizing model will show an improving accuracy rate as the sample size is increased, and the in-sample and out-of-sample error rates will converge toward a desired accuracy level
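Example (a rough sketch under assumed settings): the convergence of in-sample and out-of-sample error can be simulated with a straight-line model fitted to samples of growing size. The data-generating process (y = 1 + 2x plus unit-variance noise), the sample sizes, the seed and all function names are assumptions chosen only for illustration.

```python
import random


def make_sample(rng, n):
    """Draw n points from the assumed process y = 1 + 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def mse(params, xs, ys):
    intercept, slope = params
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def average_errors(n, trials=100, seed=0):
    """Average in-sample and out-of-sample MSE over repeated training samples of size n."""
    rng = random.Random(seed)
    test_x, test_y = make_sample(rng, 1000)  # fixed hold-out sample
    in_err = out_err = 0.0
    for _ in range(trials):
        xs, ys = make_sample(rng, n)
        params = fit_line(xs, ys)
        in_err += mse(params, xs, ys) / trials
        out_err += mse(params, test_x, test_y) / trials
    return in_err, out_err


for n in (5, 20, 100, 1000):
    in_err, out_err = average_errors(n)
    print(n, "in-sample:", round(in_err, 3), "out-of-sample:", round(out_err, 3))
```

With small samples the in-sample error understates the out-of-sample error; as the sample grows, both should settle near the noise variance, which plays the role of base error in this setup.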
What is penalized regression
penalized regression models reduce the problem of overfitting by imposing a penalty based on the number of features used in the model
what is LASSO
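LASSO (least absolute shrinkage and selection operator) is a penalized regression that minimizes the sum of squared errors plus a penalty based on the sum of the absolute values of the slope coefficients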
*automatically eliminates the least predictive features
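Example (a minimal sketch, assuming the common form of the objective: mean squared error divided by two, plus a penalty weight times the sum of absolute slope coefficients, with no intercept): a small coordinate-descent solver on made-up data shows how a weak feature's coefficient is set to exactly zero. The data, the penalty value 0.5 and the names soft_threshold and lasso_fit are hypothetical; the penalty weight is a hyperparameter chosen by the researcher.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(columns, y, penalty, sweeps=50):
    """Coordinate descent for (1/2n) * SSE + penalty * sum(|coef|), no intercept."""
    n = len(y)
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual with feature j left out of the current fit.
            partial = [
                y[i] - sum(coefs[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            coefs[j] = soft_threshold(rho, penalty) / scale
    return coefs


# Assumed toy data: y depends strongly on the first feature, barely on the second.
strong = [1.0, 1.0, -1.0, -1.0]
weak = [1.0, -1.0, 1.0, -1.0]
y = [3.1, 2.9, -2.9, -3.1]

ols = lasso_fit([strong, weak], y, penalty=0.0)    # no penalty: plain least squares
lasso = lasso_fit([strong, weak], y, penalty=0.5)  # penalty drops the weak feature

print("no penalty:", [round(c, 3) for c in ols])
print("penalty 0.5:", [round(c, 3) for c in lasso])
```

Without the penalty both features keep a nonzero coefficient; with it, the strong coefficient shrinks and the weak one is eliminated.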
what is a support vector machine
is a linear classification algorithm that separates the data into one of two possible classes
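Example (a simplified sketch, not a production solver): a linear support vector machine can be approximated by per-observation subgradient descent on the regularized hinge loss. The toy points, the learning rate, the regularization weight and the names train_linear_svm and classify are all assumptions for illustration.

```python
def train_linear_svm(points, labels, lr=0.1, reg=0.01, epochs=100):
    """Subgradient descent on reg/2 * ||w||^2 + hinge loss; labels are +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the separating line."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Assumed toy data: two well-separated groups, e.g. "buy" (+1) and "sell" (-1).
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [classify(w, b, p) for p in points]
print("weights:", [round(wi, 3) for wi in w], "bias:", round(b, 3))
print("predictions:", predictions)
```

The fitted line w·x + b = 0 is the linear boundary; each observation is assigned to one of the two classes by the sign of its score.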