Exam2 Flashcards
What is machine learning
Training a model on known data so that it can predict outcomes on unseen data (i.e., generalize)
What are the 3 parts of the Machine learning roadmap
Three yes/no questions, asked in order: doing dimension reduction? if no –> have responses (labels)? if yes –> predicting a numeric value?
supervised vs unsupervised
supervised: the data is labeled
unsupervised: the data is NOT labeled
what is classification
given an input, assigning it to a category
(e.g., duck or fruit, cat or dog)
Regression
given an x value, predicting the corresponding continuous y value
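A minimal sketch of regression, assuming made-up (x, y) points and a straight-line model fitted by ordinary least squares. The names `fit_line` and `predict` are illustrative, not from the course material.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Given an x value, return the predicted y value."""
    slope, intercept = model
    return slope * x + intercept


# Invented training data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)
print(predict(model, 5.0))  # prediction for an x the model has not seen
```

The output is a continuous number rather than a category, which is what separates regression from classification.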
unsupervised learning example
clustering: grouping objects based on similarity or differences
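One possible illustration of clustering is k-means, sketched here for 1-D data. k-means is an assumed choice of algorithm (the card does not name one), and the values and starting centers are invented.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for 1-D data; `centers` are the initial guesses."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# No labels are given: the grouping comes only from how close values are.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```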
Supervised learning example:
regression or classification
classification vs regression
regression predicts continuous values and classification predicts discrete labels
what is a decision tree
a tree structure in which each internal node splits the data on a feature and predictions (labels) are made at the leaf nodes
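A minimal sketch of how a decision tree makes a prediction, assuming a tiny hand-built tree. The feature names `has_feathers` and `barks` are hypothetical, and a real tree would learn its splits from training data.

```python
# Internal nodes name a feature to split on; leaf nodes hold a label.
tree = {
    "feature": "has_feathers",
    "yes": {"label": "duck"},
    "no": {
        "feature": "barks",
        "yes": {"label": "dog"},
        "no": {"label": "cat"},
    },
}


def classify(node, sample):
    """Follow feature splits from the root until a leaf holds a label."""
    while "label" not in node:
        branch = "yes" if sample[node["feature"]] else "no"
        node = node[branch]
    return node["label"]


print(classify(tree, {"has_feathers": False, "barks": True}))
```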
Learning Road Map
- pre processing
- model selection
- model training
- model evaluation
- model deployment
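The five steps above can be sketched end to end on a toy problem. Everything here is an assumption made for illustration: the data (hours studied vs. passed), the threshold classifier, and the names `train_threshold` and `deployed_model`.

```python
# Toy data: (hours studied, passed exam?). Values are invented.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# 1. Pre-processing: scale the feature to the range [0, 1].
lo = min(x for x, _ in data)
hi = max(x for x, _ in data)
scaled = [((x - lo) / (hi - lo), y) for x, y in data]

# Hold out every fourth example for evaluation; train on the rest.
test = scaled[::4]
train = [row for i, row in enumerate(scaled) if i % 4 != 0]


# 2. Model selection: a one-feature threshold classifier (a simple, assumed choice).
# 3. Model training: put the threshold midway between the two class means.
def train_threshold(rows):
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


threshold = train_threshold(train)


def predict(x_scaled):
    return 1 if x_scaled > threshold else 0


# 4. Model evaluation: accuracy on the held-out examples.
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)


# 5. Model deployment: wrap scaling and prediction for new raw inputs.
def deployed_model(hours):
    return predict((hours - lo) / (hi - lo))


print(deployed_model(8.5))
```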
SVM
Support Vector Machine: finds the line (hyperplane) that separates the classes with the largest possible margin
characteristics of SVM
- robust to noise
- overfitting is handled by maximizing margin
- handles irrelevant attributes better than many techniques
- difficult to handle missing values
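To make "maximizing margin" concrete, this sketch compares two hand-picked separating lines and keeps the one with the larger margin. It is not an SVM solver: a real SVM searches over all lines for the largest margin. The points and candidate lines are invented.

```python
import math


def margin(w, b, points, labels):
    """Smallest signed distance from any point to the line w.x + b = 0.

    Positive only if every point lies on the correct side of the line.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * p[0] + w[1] * p[1] + b) / norm
               for p, y in zip(points, labels))


points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Two candidate separating lines, each stored as (w, b).
candidates = {
    "x = 2.5": ((1.0, 0.0), -2.5),
    "x + y = 5.5": ((1.0, 1.0), -5.5),
}

for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, points, labels), 3))

# Both lines separate the classes; the wider margin is the preferred one.
best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
print(best)
```

Both lines classify every training point correctly. The wider margin leaves more room for noisy or new points, which is the intuition behind margin maximization limiting overfitting.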
what is generalization error
test error - training error (the generalization gap): a measure of how well the model generalizes to new data
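A small sketch of the test error minus training error calculation, assuming invented data and a model that memorizes its training set (1-nearest neighbour). The training point (3, 1) is meant to act as a noisy label.

```python
def error_rate(predict, rows):
    """Fraction of rows the model gets wrong."""
    return sum(predict(x) != y for x, y in rows) / len(rows)


train = [(1, 0), (2, 0), (3, 1), (6, 1), (7, 1)]
test = [(1.4, 0), (2.8, 0), (3.5, 0), (6.4, 1)]


def memorizer(x):
    """Return the label of the closest training point."""
    nearest_x, nearest_y = min(train, key=lambda row: abs(row[0] - x))
    return nearest_y


train_error = error_rate(memorizer, train)
test_error = error_rate(memorizer, test)
gap = test_error - train_error

print(train_error, test_error, gap)
```

A model that memorizes scores perfectly on the data it was trained on, so a large gap here signals poor generalization rather than a good model.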
How to improve a model that has poor generalization error
Use more training data or simplify the model
NN hyperparameters
Neurons per layer
number of hidden layers
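As a rough sketch of why these two choices matter, the function below counts the trainable weights and biases of a fully connected network. The name `count_parameters` and the example sizes are assumptions for illustration.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Weights plus biases of a fully connected network.

    `hidden_layers` lists the neurons per hidden layer, e.g. [8, 4].
    """
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


print(count_parameters(4, [8], 3))     # one hidden layer of 8 neurons
print(count_parameters(4, [8, 8], 3))  # two hidden layers of 8 neurons
```

Adding layers or neurons increases the number of values the network has to learn, which makes the model more flexible and also more prone to overfitting.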