Week 9: Evaluation & Data Analysis Flashcards
What is training data used for?
to train the algorithm (fit the model's parameters)
3 steps of train, evaluate, test
- train the model using training set
- evaluate/ tune model using validation set
- test model performance on unseen test set
During training, which 2 data sets are available?
- training
- validation
How is a data set split into test and training sets?
split roughly at random, first making sure the important classes are represented in each set
What percentages are the training and testing sets typically split into?
- 80% for training
- 20% for testing
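A minimal sketch of an 80/20 random split, using only the Python standard library. The function name `train_test_split` and the `seed` parameter are illustrative assumptions, and this version does not balance classes between the two sets.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows, then split it into (train, test)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible (seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```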
N-fold cross-validation
- Randomise the dataset
- Create N equal size partitions
- Choose 1 partition for the test set
- N-1 partitions for training
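The steps above can be sketched as follows. This is an illustrative version, not a library implementation: `n_fold_splits` is a hypothetical name, and it rotates the test partition so that each partition is held out once. The partitions are exactly equal in size only when the data set size is divisible by N.

```python
import random

def n_fold_splits(rows, n_folds, seed=0):
    """Yield (train, test) pairs, one per fold."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # randomise the dataset
    # Deal the shuffled rows into n_folds partitions, round-robin.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]  # 1 partition held out for testing
        # The remaining N-1 partitions are merged into the training set.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield train, test

for train, test in n_fold_splits(range(10), n_folds=5):
    print(len(train), len(test))  # prints "8 2" five times
```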
What bias does cross-validation hold?
Cross-validation is almost unbiased
What is a confusion matrix used for?
- used to describe the performance of a classification model
- on a set of test data
- for which the true values are known
True positives (TP) means
predicted yes = actual yes
True negatives (TN) means
predicted no = actual no
False positives (FP) means…
- predicted yes but actual no
- type 1 error
False negatives (FN) means…
- predicted no but actual yes
- type 2 error
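As a small illustration of the four outcomes, the sketch below counts TP, TN, FP and FN for a yes/no classifier. The function name `confusion_counts` and the sample labels are made up for the example; `True` stands for "yes".

```python
def confusion_counts(actual, predicted):
    """Count the four confusion-matrix cells for boolean labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["TP"] += 1  # predicted yes, actual yes
        elif not p and not a:
            counts["TN"] += 1  # predicted no, actual no
        elif p and not a:
            counts["FP"] += 1  # predicted yes, actual no (type 1 error)
        else:
            counts["FN"] += 1  # predicted no, actual yes (type 2 error)
    return counts

# Hypothetical labels, for illustration only.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```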
How is accuracy measured in a confusion matrix?
(TP + TN) / total number of predictions
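A worked example of the accuracy formula, assuming the total is the sum of all four confusion-matrix cells. The counts are invented for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical counts: 85 of 100 predictions are correct.
print(accuracy(tp=50, tn=35, fp=10, fn=5))  # 0.85
```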
Name 3 regression evaluation metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
Mean absolute error describes…
the mean of the absolute value of the errors
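A minimal sketch of the three regression metrics using their standard definitions: MSE is the mean of the squared errors, and RMSE is its square root. The function names and sample values are illustrative assumptions.

```python
import math

def mean_absolute_error(actual, predicted):
    """Mean of the absolute values of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Mean of the squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical values; the errors are 1, 0, -2, -1.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
print(mean_absolute_error(actual, predicted))  # 1.0
print(mean_squared_error(actual, predicted))   # 1.5
print(root_mean_squared_error(actual, predicted))
```

Because the errors are squared before averaging, MSE and RMSE penalise large errors more heavily than MAE does.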