How to validate model quality Flashcards

1
Q

Train-Test Split

A

Train-Test Split Evaluation The train-test split is a technique for evaluating the performance of a machine learning algorithm.

It can be used for classification or regression problems and can be used for any supervised learning algorithm.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What happens in Train test split

A

The procedure involves taking a dataset and dividing it into two subsets.
The first subset is used to fit the model and is referred to as the training dataset.
The second subset is not used to train the model; instead, the input element of the dataset is provided to the model, then predictions are made and compared to the expected values. This second dataset is referred to as the test dataset.
Train Dataset: Used to fit the machine learning model.
Test Dataset: Used to evaluate the fit machine learning model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Objective of train-test split

A

The objective is to estimate the performance of the machine learning model on new data: data not used to train the model

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

When do we use Train test split

A

This is how we expect to use the model in practice. Namely, to fit it on available data with known inputs and outputs, then make predictions on new examples in the future where we do not have the expected output or target values.

The train-test procedure is appropriate when there is a sufficiently large dataset available.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How to Configure the Train-Test Split

A
  • The procedure has one main configuration parameter, which is the size of the train and test sets. This is most commonly expressed as a percentage between 0 and 1 for either the train or test datasets.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

You must choose a split percentage that meets your project’s objectives with considerations that include:

A
  • Computational cost in training the model.
  • Computational cost in evaluating the model.
  • Training set representativeness.
  • Test set representativeness.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

common split percentages include:

A

Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Training and Test Data in Python Machine Learning

A

As we work with datasets, a machine learning model works in two stages.
We usually split the data around 20%-80% between testing and training stages.
Under supervised learning, we split a dataset into a training data and test data in Python ML.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly