Exam Flashcards
What is a classification problem?
A problem that requires machine learning algorithms that learn how to assign a class label to examples from the problem domain
What is a regression problem?
A problem that requires learning to predict a continuous output variable
What algorithms are used for regression problems?
- Linear Regression
- Support Vector Regression
- Regression Tree
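The three regressors above can be sketched side by side. A minimal sketch, assuming scikit-learn and NumPy are available; the toy data is made up for illustration:

```python
# Fit each of the three regression algorithms to the same toy 1-D dataset.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR                      # Support Vector Regression
from sklearn.tree import DecisionTreeRegressor   # regression tree

X = np.arange(10, dtype=float).reshape(-1, 1)    # single feature
y = 2.0 * X.ravel() + 1.0                        # continuous target

for model in (LinearRegression(), SVR(), DecisionTreeRegressor()):
    model.fit(X, y)
    print(type(model).__name__, model.predict([[5.0]]))
```

Each model learns a continuous mapping from the feature to the target, which is what makes this regression rather than classification.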
Give an example of a classification problem
Getting a machine to classify different images, e.g. distinguishing apple [1,0,0], banana [0,1,0] and cherry [0,0,1] (one-hot class labels)
What is Underfitting?
When a model cannot capture the underlying trend of the data
Why does Underfitting occur?
The algorithm is too simple to fit the data, or there is not enough data
What happens to the bias and variance in underfitting?
High bias and low variance
What is Bias?
The assumptions made by a model to make the target function easier to learn
What is Variance?
How much the model's error changes when the training data changes: a high-variance model obtains a low error on its training data but a high error when the training data is changed
How to prevent underfitting?
- Increase model complexity
- Increase the number of features (feature engineering)
- Remove noise
- Increase epochs
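"Increase model complexity" can be demonstrated with a polynomial fit. A NumPy-only sketch on made-up quadratic data: a degree-1 model underfits, a degree-2 model captures the trend:

```python
# A straight line (degree 1) underfits quadratic data; raising the model
# complexity to degree 2 captures the underlying trend.
import numpy as np

x = np.linspace(-3, 3, 50)
y = x ** 2                       # quadratic trend, no noise

def fit_error(degree):
    """Mean squared error of a polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

print(fit_error(1), fit_error(2))  # the degree-2 error is near zero
```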
What is overfitting?
When a model starts to learn from the noise and inaccurate data entries in the training set; the model has too much freedom and builds an unrealistic model that does not generalize
What is overfitting in terms of variance and bias?
High variance and low bias
How to reduce overfitting?
- Increase training data
- Reduce model complexity
- Early stopping
- L1 & L2 regularisation
- Dropout (if a neural network)
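Early stopping from the list above can be sketched as a small loop. A pure-Python sketch; the validation-loss values and the `patience` parameter are made up for illustration:

```python
# Stop training once the validation loss has not improved for
# `patience` consecutive epochs, before the model starts to overfit.
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.61]
patience = 2
best, wait, stopped_at = float("inf"), 0, None

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best, wait = loss, 0       # improvement: reset the counter
    else:
        wait += 1                  # no improvement this epoch
        if wait >= patience:
            stopped_at = epoch     # halt before overfitting sets in
            break

print(best, stopped_at)  # -> 0.55 5
```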
What is regularisation?
The technique of calibrating machine learning models to minimise the loss and prevent overfitting or underfitting
What does noise mean?
The data points in a dataset that don’t really represent the true properties of your data
What does bias mean in terms of regularisation?
The difference between the actual and predicted values. High bias means less consideration of the data pattern, giving oversimplified, underfit models
What does variance mean in terms of regularisation?
A measure of the flexibility of the model: it decides how sensitive the model is to changes in the patterns of the input data
What happens to the training and testing error when the bias is high?
They will also be high
What happens to the training and testing error when the variance is high?
The training error will be low, but the testing error will be high
Name the two main types of regularisation techniques
Ridge and Lasso regularisation
What is Ridge regularisation?
Modifies over- or underfitted models by adding a penalty equivalent to the sum of the squares of the magnitudes of the coefficients (L2 penalty)
What is Lasso regularisation?
Modifies over- or underfitted models by adding a penalty equal to the sum of the absolute values of the coefficients (L1 penalty)
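The two penalties can be compared side by side. A sketch assuming scikit-learn is available, with made-up data where only the first feature matters; the `alpha` values are illustrative:

```python
# Both penalties shrink the coefficients relative to plain least
# squares; Lasso (L1) can drive some coefficients exactly to zero.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # L2: sum of squared coefficients
lasso = Lasso(alpha=0.5).fit(X, y)    # L1: sum of absolute coefficients

print(ols.coef_)
print(ridge.coef_)   # all shrunk, none exactly zero
print(lasso.coef_)   # irrelevant features driven to exactly zero
```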
What is dropout in regularisation?
Randomly selected neurons are ignored (dropped out) during training; their contribution is temporarily removed
What happens as a neural network learns?
Weights settle into their context within the network. Weights are tuned for specific features, providing some specialisation. Neighbouring neurons come to rely on these specialisations, which can result in a fragile model too specialised to the training data.
How does dropout help with overfitting?
- Neurons cannot rely on one input, as it may drop out at random; this reduces the bias caused by over-relying on one input
- Neurons will not learn redundant details of the inputs
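The mechanism above can be sketched as an inverted-dropout forward pass. A NumPy-only sketch; the function name and drop rate are illustrative:

```python
# Inverted dropout: each activation is zeroed with probability `rate`,
# and the survivors are scaled up so the expected activation is the
# same at training and inference time.
import numpy as np

def dropout(activations, rate, rng):
    if rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # random keep mask
    return activations * keep / (1.0 - rate)       # scale the survivors

rng = np.random.default_rng(0)
a = np.ones(1000)
out = dropout(a, 0.5, rng)
print((out == 0).mean())   # roughly half the neurons were dropped
```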
Concept attainment involves the following five categories:
- identify task
- nature of examples used
- validation procedure
- consequences of categorizations
- nature of imposed restriction
What is a decision tree?
A supervised learning algorithm (for regression and classification) with a tree structure of a root, nodes and branches, like a flowchart
Advantages of decision trees
- Easy to interpret
- No data preparation required
- More flexible
Disadvantages of decision trees
- Prone to overfitting
- High variance
- More costly
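A minimal decision-tree classifier, assuming scikit-learn; the features `[weight_g, length_cm]` and the data points are made up, reusing the fruit classes from the earlier example:

```python
# Train a flowchart-like tree that splits on the made-up fruit features.
from sklearn.tree import DecisionTreeClassifier

X = [[150, 7], [160, 8], [120, 20], [130, 19], [8, 2], [9, 2]]
y = ["apple", "apple", "banana", "banana", "cherry", "cherry"]

tree = DecisionTreeClassifier().fit(X, y)
print(tree.predict([[125, 18]]))   # a banana-sized, banana-shaped fruit
```

The unconstrained tree fits this tiny training set perfectly, which also illustrates the overfitting risk listed above.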