Basic Terminology Flashcards
Algorithm
A specific procedure that implements a data mining technique, such as linear regression, decision trees, or association rules.
Attribute
A descriptor or measurement that characterizes an object, person, transaction, or record, e.g., length, annual income, or color.
Data Dimension
The number of predictors that serve as inputs to a model.
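A minimal Python sketch of counting the data dimension, assuming each observation is stored as a dict and that one hypothetical key ("approved") holds the response variable. Every other key counts as a predictor.

```python
# Hypothetical records: each dict is one observation (profile) of a loan applicant.
# "approved" is assumed to be the response variable; all other keys are predictors.
observations = [
    {"age": 34, "annual_income": 52000, "years_employed": 6, "approved": "yes"},
    {"age": 51, "annual_income": 31000, "years_employed": 2, "approved": "no"},
]

RESPONSE = "approved"

def data_dimension(records, response):
    """Return the number of predictors, that is, all attributes except the response."""
    predictors = [name for name in records[0] if name != response]
    return len(predictors)

print(data_dimension(observations, RESPONSE))  # 3
```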
Holdout Data
A portion of the dataset that is set aside for model validation and testing. It is used to tune hyperparameters and assess model performance.
Input space
The collection of all observations in the dataset and their positions in the space defined by the predictors. Similar terms: Feature Space, Domain of Dataset.
Labeled Observation
An observation that includes both predictor values and the corresponding target response.
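A small Python sketch separating labeled from unlabeled observations, assuming a hypothetical record layout in which a missing response is stored as None.

```python
# Hypothetical observations; "response" is None when the target value is unknown.
observations = [
    {"predictors": {"height_cm": 170, "weight_kg": 68}, "response": "adult"},
    {"predictors": {"height_cm": 120, "weight_kg": 25}, "response": "child"},
    {"predictors": {"height_cm": 160, "weight_kg": 55}, "response": None},
]

def is_labeled(obs):
    """An observation is labeled if it has predictor values and a known response."""
    return bool(obs["predictors"]) and obs["response"] is not None

labeled = [obs for obs in observations if is_labeled(obs)]
unlabeled = [obs for obs in observations if not is_labeled(obs)]
print(len(labeled), len(unlabeled))  # 2 1
```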
Observation
The unit on which measurements are taken; a set or vector of predictors or attributes that collectively describe an object, person, entity, or unit.
Overfitting
A condition in which a model fits the training data so closely that it captures both the predictive signal and the noise in the data. Such a model does not generalize well to new data.
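A toy Python illustration of overfitting, using made-up deterministic data rather than a real dataset. The underlying signal is assumed to be y = 2x, and the training targets carry artificial noise of +1 or -1. A 1-nearest-neighbour "memorizer" reproduces every noisy training target, so its training error is zero, while a straight-line least-squares fit ignores most of the noise. On new inputs, compared against the noise-free signal, the memorizer does worse.

```python
# Underlying signal: y = 2x. Training targets add "noise" of +1 (even x) or -1 (odd x).
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]

# Test inputs fall between training points; targets are the noise-free signal.
test_x = [k + 0.3 for k in range(9)]
test_y = [2 * x for x in test_x]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbour 'model': reproduces every training target exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model over paired inputs and targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)

print("train MSE (line, memorizer):", mse(line, train_x, train_y), mse(memo, train_x, train_y))
print("test MSE  (line, memorizer):", mse(line, test_x, test_y), mse(memo, test_x, test_y))
```

The memorizer's perfect training score combined with its higher test error is the pattern the definition describes.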
Predictor
A variable used as an input to a predictive model. Similar terms: Feature, Input Variable, Independent Variable.
Profile
A select set of measurements on an object, person, entity, unit, or transaction (e.g., the height, weight, and age of a person).
Response Variable
An attribute of an object, person, entity, unit, or transaction that the model predicts or estimates. In supervised learning, it is the label of an observation. Similar terms: Target, Output Variable, Outcome Variable, Dependent Variable.
Score
To predict or classify the response for observations that were not part of the model's training and validation datasets.
Success Class
The class of interest for a categorical response variable. It is usually the class that demands higher predictive accuracy than the other classes. For example, if accept, reject, and waitlist are the possible classes of the response variable, “accept” could be the _____ _____.
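A hedged Python sketch of one common way a success class is handled: recoding the categorical response as 1 for the class of interest and 0 otherwise, then measuring how many actual success-class cases the model catches. The applicant data and predictions are invented for illustration.

```python
# Hypothetical actual and predicted classes for six applicants.
actual    = ["accept", "reject", "waitlist", "accept", "reject", "accept"]
predicted = ["accept", "reject", "accept",   "reject", "reject", "accept"]

SUCCESS_CLASS = "accept"  # the class of interest

def to_binary(labels, success):
    """Recode a categorical response: 1 for the success class, 0 for all others."""
    return [1 if label == success else 0 for label in labels]

y_true = to_binary(actual, SUCCESS_CLASS)
y_pred = to_binary(predicted, SUCCESS_CLASS)

# Share of actual success-class cases that the model also labeled as success
# (often called sensitivity or recall).
hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
sensitivity = hits / sum(y_true)
print(y_true)       # [1, 0, 0, 1, 0, 1]
print(sensitivity)
```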
Supervised Learning
A training process in which a machine learning algorithm learns from a set of labeled observations.
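A minimal Python sketch of supervised learning followed by scoring, using simple linear regression (ordinary least squares) as the example technique. The labeled (experience, income) pairs are hypothetical; the score function predicts the response for an observation that was not in the training data.

```python
# Hypothetical labeled observations: (years_of_experience, annual_income_in_thousands).
labeled = [(1, 40.0), (3, 50.0), (5, 60.0), (7, 70.0)]

def train(observations):
    """Learn y = a + b*x by ordinary least squares from labeled (x, y) pairs."""
    n = len(observations)
    mx = sum(x for x, _ in observations) / n
    my = sum(y for _, y in observations) / n
    b = sum((x - mx) * (y - my) for x, y in observations) / sum((x - mx) ** 2 for x, _ in observations)
    return my - b * mx, b

a, b = train(labeled)

def score(x):
    """Predict the response for an observation that was not used in training."""
    return a + b * x

print(a, b)        # 35.0 5.0
print(score(10))   # 85.0
```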
Test Data
A portion of the dataset that is set aside to evaluate the performance of the final model.
Training Data
A portion of the dataset used to build a machine learning model that captures the relationship between the predictors and the target variable.
Validation Data
A portion of the dataset set aside for evaluating model performance during training, tuning hyperparameters, selecting a model, and defining a stopping criterion.
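The training, validation, and test (holdout) portions described above can be produced with a simple shuffled split. This Python sketch assumes a 60/20/20 split and a fixed random seed; both are illustrative choices, not prescribed values.

```python
import random

def split_dataset(records, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle records and split them into training, validation, and test sets.

    The fractions (60/20/20 by default) are illustrative assumptions; whatever
    remains after the training and validation shares becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))  # 60 20 20
```

Shuffling before slicing keeps any ordering in the original records (for example, by date or by class) from leaking into one portion only.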
Unsupervised Learning
A training process in which a machine learning algorithm operates on unlabeled observations to discover underlying patterns, associations, and clusters.
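As one example of unsupervised learning, here is a minimal one-dimensional k-means clustering sketch in Python. The income values and the starting centers are hypothetical; note that no response labels are used, and the groups emerge only from the values themselves.

```python
def kmeans_1d(values, centers, iterations=10):
    """Minimal 1-D k-means: assign each value to its nearest center, then move
    each center to the mean of its assigned values. No labels are used."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep a center in place if no value was assigned to it.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled annual incomes (in thousands); initial centers are guesses.
incomes = [21, 23, 25, 24, 78, 82, 80, 79]
centers, clusters = kmeans_1d(incomes, centers=[0, 100])
print(centers)   # [23.25, 79.75]
print(clusters)  # [[21, 23, 25, 24], [78, 82, 80, 79]]
```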