Module 2 Flashcards
In __________, labeled training data refers to a dataset that includes both the input data and the corresponding correct output.
Supervised Learning
This refers to data with both input data and a corresponding correct output.
Labeled Training Data
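A minimal Python sketch of how labeled training data might be stored. Each example pairs an input (here, a hypothetical house size in square meters) with its correct output (a hypothetical price). The numbers are made up for illustration.

```python
# Hypothetical labeled training data: each item is (input, correct output).
labeled_data = [
    (50.0, 150_000.0),   # (house size in m^2, price)
    (80.0, 240_000.0),
    (120.0, 360_000.0),
]

# Separate the inputs (features) from the correct outputs (labels).
inputs = [x for x, _ in labeled_data]
labels = [y for _, y in labeled_data]

print(inputs)  # [50.0, 80.0, 120.0]
print(labels)  # [150000.0, 240000.0, 360000.0]
```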
______ is used to train a machine learning model to make predictions or decisions without being explicitly programmed.
Labeled Data
The primary objective of ________ is to learn a function that maps the input variable to the output variable.
Supervised Learning
What are the two categories under Supervised Learning?
Regression (Prediction) and Classification (Description)
This category of Supervised Learning refers to algorithms that address classification problems where the output variable is categorical.
Classification
This category of Supervised Learning predicts one of the possible class labels.
Classification
What are some types of Classification?
- Binary Classification - classification into two classes.
- Multi-class Classification - classification into three or more classes.
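The distinction above depends only on how many distinct classes the labels contain. A minimal sketch with a hypothetical helper `classification_type` and made-up labels:

```python
def classification_type(labels):
    """Return 'binary' for two distinct classes, 'multi-class' for three or more."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("classification needs at least two classes")
    return "binary" if n_classes == 2 else "multi-class"

print(classification_type(["spam", "not spam", "spam"]))   # binary
print(classification_type(["cat", "dog", "bird", "cat"]))  # multi-class
```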
What are some examples of Classification algorithms?
- Random Forest Algorithm
- Decision Tree Algorithm
- Logistic Regression Algorithm
- Support Vector Machine Algorithm
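The Decision Tree Algorithm is easiest to see in its smallest form: a tree with a single split, often called a decision stump. A minimal sketch, assuming one numeric input and binary labels 0 and 1; the data and function names are hypothetical. A full decision tree would keep splitting each side recursively and usually chooses splits with a measure such as Gini impurity or entropy rather than a raw error count.

```python
def fit_stump(xs, ys):
    """Fit a one-split decision tree on 1-D inputs with labels 0/1.

    Tries every observed value as a threshold, in both label orientations,
    and keeps the split with the fewest training errors.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

# Hypothetical data: hours studied -> failed (0) or passed (1).
hours = [1, 2, 3, 6, 7, 8]
passed = [0, 0, 0, 1, 1, 1]
stump = fit_stump(hours, passed)
print(stump)                    # (3, 0, 1)
print(predict_stump(stump, 5))  # 1
```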
This category of Supervised Learning handles regression problems, for example where the input and output variables have a linear relationship.
Regression
This category of Supervised Learning predicts continuous numbers (real numbers).
Regression
What are some examples of Regression algorithms?
- Simple Linear Regression Algorithm
- Multivariate Regression Algorithm
- Decision Tree Algorithm
- Lasso Regression
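Simple Linear Regression fits a straight line y = slope * x + intercept by least squares, and its output is a continuous real number. A minimal sketch using the closed-form least-squares formulas, with a hypothetical dataset generated from y = 2x + 1:

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # 11.0 (a continuous, real-valued prediction)
```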
True or False.
Supervised ML has three phases: the usual training and validation, data prediction, and deployment.
False
(TWO phases only. The usual training and validation, followed by prediction.)
True or False.
Model complexity is loosely tied to the variation of inputs contained within the training dataset.
False.
(it is INTIMATELY tied to the variation of inputs)
True or False.
Regarding model complexity, the larger the variety of data points the data set contains, the more complex a model can be used without overfitting.
True.
True or False.
Collecting more data points will yield more variety, so that larger datasets allow for building more complex models.
True.
True or False.
Duplicating similar data points or collecting very similar data is usually helpful.
False.
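Exact duplicates add no new variety. A minimal sketch with hypothetical noisy data: duplicating every point leaves a least-squares line fit unchanged, because doubling every point only doubles the sum of squared errors and so does not move its minimum.

```python
import math

def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical noisy measurements

original = fit_simple_linear_regression(xs, ys)
duplicated = fit_simple_linear_regression(xs * 2, ys * 2)  # every point twice

# Same slope and intercept, up to floating-point rounding.
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, duplicated)))  # True
```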
True or False.
In supervised learning, it is important to build a model on the training data and then be able to make accurate predictions on previously observed data.
False.
(make accurate predictions on NEW, UNSEEN data that has the SAME CHARACTERISTICS as the training set that we used.)
If a model is able to make accurate predictions on unseen data, we say it is able to _________ from the training set to the test set.
Generalize
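One common way to check generalization is to hold out part of the labeled data as a test set that the model never sees during training, then compare the error on both sets. A minimal sketch, assuming a hypothetical dataset generated from y = 3x + 2 plus small noise; the helpers `train_test_split`, `fit_line`, and `mean_squared_error` are defined here for illustration rather than taken from a library.

```python
import random

def train_test_split(pairs, test_fraction=0.25, seed=0):
    """Shuffle labeled (x, y) pairs and hold out a fraction as unseen test data."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(pairs):
    """Least-squares line y = slope * x + intercept, returned as a predict function."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_squared_error(model, pairs):
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)

# Hypothetical data: y = 3x + 2 plus uniform noise in [-0.5, 0.5].
rng = random.Random(42)
data = [(float(x), 3.0 * x + 2.0 + rng.uniform(-0.5, 0.5)) for x in range(20)]

train, test = train_test_split(data)
model = fit_line(train)
print("train MSE:", mean_squared_error(model, train))
print("test MSE: ", mean_squared_error(model, test))  # error on unseen data
```

A small test error that is close to the training error suggests the model generalizes; a much larger test error is a warning sign of overfitting.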
This occurs when a model learns the training data too well, including its noise and outliers.
Overfitting
_______ occurs when you fit a model too closely to the particularities of the training set and obtain a model that works well on the training set but is not able to generalize to new data.
Overfitting
True or False.
An overfitted model performs exceptionally well on training data but poorly on new, unseen data.
True
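An extreme caricature of overfitting is a model that memorizes the training set verbatim. A minimal sketch with hypothetical data and function names: the memorizer is perfect on the training examples but falls back to an arbitrary guess on unseen inputs. Real overfitting is usually subtler (for example, a very deep decision tree that fits noise), but the train/test gap is the same symptom.

```python
def fit_memorizer(pairs):
    """Extreme overfit: store every training example in a lookup table."""
    table = dict(pairs)
    fallback = 0  # arbitrary guess for inputs never seen in training
    return lambda x: table.get(x, fallback)

def accuracy(model, pairs):
    return sum(model(x) == y for x, y in pairs) / len(pairs)

# Hypothetical task: the label is 1 if x >= 10, else 0.
train = [(x, int(x >= 10)) for x in range(0, 20, 2)]  # even inputs
test = [(x, int(x >= 10)) for x in range(1, 20, 2)]   # unseen odd inputs

model = fit_memorizer(train)
print(accuracy(model, train))  # 1.0
print(accuracy(model, test))   # 0.5
```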
Choosing a model that is too simple is called “______”.
Underfitting
This occurs when a model is too simple to capture all the aspects of, and variability in, the data, so it performs poorly even on the training set.
Underfitting
True or False.
An underfitted model performs poorly on the training data but excels in new, unseen data.
False.
(underfitted models perform poorly on both training and new data)
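A minimal sketch of underfitting, using hypothetical data generated exactly from y = 2x + 1: a model that ignores its input and always predicts the training mean has a large error on both the training and the test set, whereas the straight line y = 2x + 1 would have zero error on both.

```python
def fit_constant(pairs):
    """Too simple a model: ignore the input and always predict the training mean."""
    mean_y = sum(y for _, y in pairs) / len(pairs)
    return lambda x: mean_y

def mean_squared_error(model, pairs):
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)

# Hypothetical data generated exactly from y = 2x + 1.
train = [(x, 2 * x + 1) for x in (0, 2, 4, 6, 8)]
test = [(x, 2 * x + 1) for x in (1, 3, 5, 7, 9)]

model = fit_constant(train)
print(mean_squared_error(model, train))  # 32.0
print(mean_squared_error(model, test))   # 36.0
```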