Basic Terminology Flashcards

Question 1

Q

Algorithm

Answer

A

A specific procedure that implements a data mining technique, such as linear regression, decision trees, or association rules.

Question 2

Q

Attribute

Answer

A

A descriptor or measurement that characterizes an object, person, transaction, or record (e.g., length, annual income, or color).

Question 3

Q

Data Dimension

Answer

A

The number of predictors that serve as inputs to a model.

Question 4

Q

Holdout Data

Answer

A

A portion of the dataset that is set aside for model validation and testing. It is used to tune hyperparameters and assess model performance.

Question 5

Q

Input space

Answer

A

The collection and position of all observations in the dataset. Similar terms: Feature Space and Domain of Dataset.

Question 6

Q

Labeled Observation

Answer

A

An observation that has both predictors and the corresponding target response.

Question 7

Q

Observation

Answer

A

The unit on which measurements are taken. It also refers to a set or vector of predictors or attributes that collectively describes an object, person, entity, or unit.

Question 8

Q

Overfitting

Answer

A

A condition in which a model fits the training data so closely that it picks up the noise in the data along with the predictive signal. An overfit model will not generalize well to new data.
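
To make the idea concrete, here is a minimal sketch using made-up data. It assumes a single predictor `x`, a true rule of "class 1 when x >= 5", and one deliberately mislabeled training point at x = 4. The function names are hypothetical and chosen only for illustration; a 1-nearest-neighbor rule stands in for "a model that memorizes the training data".

```python
# Made-up training data: the true rule is "1 if x >= 5 else 0".
# The point (4.0, 1) is deliberately mislabeled to play the role of noise.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
# Made-up new observations, labeled with the true rule.
test = [(2.4, 0), (3.6, 0), (4.4, 0), (5.5, 1)]


def nearest_neighbor_predict(x, data):
    """Return the label of the closest training point (memorizes the data)."""
    closest = min(data, key=lambda pair: abs(pair[0] - x))
    return closest[1]


def threshold_predict(x, cutoff=5.0):
    """A simpler model that ignores the noisy point."""
    return 1 if x >= cutoff else 0


def accuracy(predict, data):
    """Fraction of observations whose prediction matches the label."""
    correct = sum(1 for x, y in data if predict(x) == y)
    return correct / len(data)


def memorizer(x):
    return nearest_neighbor_predict(x, train)


memorizer_train_acc = accuracy(memorizer, train)
memorizer_test_acc = accuracy(memorizer, test)
threshold_train_acc = accuracy(threshold_predict, train)
threshold_test_acc = accuracy(threshold_predict, test)

print(memorizer_train_acc, memorizer_test_acc)
print(threshold_train_acc, threshold_test_acc)
```

On this toy data the memorizing model is perfect on the training set (1.0) but reaches only 0.5 on the new observations, while the simpler threshold model scores 0.875 on training and 1.0 on the new observations.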

Question 9

Q

Predictor

Answer

A

A variable used as an input to a predictive model. Similar Terms: Feature, Input Variable, Independent Variable

Question 10

Q

Profile

Answer

A

A select set of measurements on an object, person, entity, unit, or transaction (e.g., the height, weight, and age of a person).

Question 11

Q

Response Variable

Answer

A

An attribute of an object, person, entity, unit, or transaction that needs to be predicted or estimated by the model. It is the label of an observation in supervised learning. Similar terms: Target, Output Variable, Outcome Variable, and Dependent Variable.
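
As a rough illustration of how predictors, the response variable, and data dimension relate, the sketch below stores a few labeled observations as dictionaries. The field names and values are invented for the example (they borrow the height, weight, and age profile and the accept/reject classes used elsewhere in this deck).

```python
# Invented labeled observations: three predictors plus one response.
labeled_observations = [
    {"height_cm": 170.0, "weight_kg": 65.0, "age": 30, "decision": "accept"},
    {"height_cm": 160.0, "weight_kg": 72.0, "age": 45, "decision": "reject"},
    {"height_cm": 182.0, "weight_kg": 80.0, "age": 27, "decision": "accept"},
]

response_name = "decision"  # the attribute the model should predict

# Every attribute other than the response is treated as a predictor.
predictor_names = [name for name in labeled_observations[0]
                   if name != response_name]

# Each row of X is one observation's vector of predictor values.
X = [[obs[name] for name in predictor_names] for obs in labeled_observations]
# y holds the label (response) of each observation.
y = [obs[response_name] for obs in labeled_observations]

# Data dimension: the number of predictors used as model inputs.
data_dimension = len(predictor_names)

print(predictor_names, data_dimension)
print(X[0], y[0])
```

Dropping the `decision` field from a record would turn it into an unlabeled observation: the predictors remain, but there is no response to learn from.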

Question 12

Q

Score

Answer

A

To predict or classify the response for observations that are not present in the model training and validation datasets.
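
A minimal sketch of scoring, assuming a single numeric predictor and a very simple nearest-centroid classifier. The helper names `fit_centroids` and `score` and all of the data are hypothetical; any fitted model could take the classifier's place.

```python
def fit_centroids(labeled):
    """Compute the mean predictor value for each class in the training data."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def score(model, new_xs):
    """Assign each new observation to the class with the nearest centroid."""
    return [min(model, key=lambda label: abs(model[label] - x))
            for x in new_xs]


# Made-up labeled training observations: (predictor value, class).
training = [(1.0, "reject"), (2.0, "reject"), (8.0, "accept"), (9.0, "accept")]
model = fit_centroids(training)

# New observations that were not used to build the model.
new_observations = [1.5, 7.0, 4.0]
predictions = score(model, new_observations)
print(predictions)  # ['reject', 'accept', 'reject']
```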

Question 13

Q

Success Class

Answer

A

The class of interest for a categorical response variable. Usually, it is the class that demands higher predictive accuracy than the other classes. For example, if accept, reject, and waitlist are the possible classes for the response variable, “accept” could be the _____ _____.

Question 14

Q

Supervised Learning

Answer

A

The training process in which a machine-learning algorithm learns from a set of labeled observations.

Question 15

Q

Test Data

Answer

A

A portion of the dataset that is set aside to evaluate the performance of a model once it is in its final form.

Question 16

Q

Training Data

Answer

A

A portion of the dataset that is used to build a machine learning model that captures the relationship between the predictors and the target variable.

Question 17

Q

Validation Data

Answer

A

A portion of the dataset that is set aside for evaluating model performance during training, tuning hyperparameters, selecting a model, and defining a stopping criterion.
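
The training, validation, and test portions can be sketched as a simple random partition. The function `split_dataset` below is a hypothetical helper, and the 60/20/20 proportions and fixed seed are assumptions chosen for illustration, not a recommended standard.

```python
import random


def split_dataset(observations, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle the observations and cut them into three disjoint portions."""
    shuffled = list(observations)          # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains is test data
    return train, valid, test


observations = list(range(10))  # stand-in for ten observations
train, valid, test = split_dataset(observations)
print(len(train), len(valid), len(test))  # 6 2 2
```

The validation and test portions together make up the holdout data: neither is used to fit the model, and the test portion is consulted only after tuning is finished.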

Question 18

Q

Unsupervised Learning

Answer

A

It refers to the training process in which a machine-learning algorithm operates on unlabeled observations to discover underlying patterns, associations and clusters.
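
As a toy illustration of discovering clusters without labels, the sketch below groups one-dimensional values by cutting at the largest gap between sorted neighbors. This is a deliberately simple stand-in rather than a standard clustering algorithm, and the function name and data are hypothetical.

```python
def split_at_largest_gap(values):
    """Split values into two groups at the widest gap between neighbors."""
    ordered = sorted(values)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1  # first index of the upper group
    return ordered[:cut], ordered[cut:]


# Made-up unlabeled observations: no response variable is provided.
unlabeled = [1.0, 9.0, 1.5, 8.5, 2.0, 10.0]
low_cluster, high_cluster = split_at_largest_gap(unlabeled)
print(low_cluster)   # [1.0, 1.5, 2.0]
print(high_cluster)  # [8.5, 9.0, 10.0]
```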