Basic Terminology Flashcards

1
Q

Algorithm

A

A specific procedure that implements a data mining technique, such as linear regression, decision trees, or association rules.

2
Q

Attribute

A

A descriptor or measurement that characterizes an object, person, transaction, or record (e.g. length, annual income, or color).

3
Q

Data Dimension

A

The number of predictors that serve as inputs to a model.

4
Q

Holdout Data

A

A portion of the dataset that is set aside for model validation and testing. It is used to tune hyperparameters and assess model performance.

5
Q

Input space

A

The collection and position of all observations in the dataset. Similar terms: Feature Space, Domain of Dataset

6
Q

Labeled Observation

A

An observation that has both predictors and the corresponding target response.
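The relationship between an observation, its predictors, its response, and the data dimension can be sketched in a few lines of Python. This is only an illustration: the class name Observation, its field names, and the example values are assumptions made here, not part of the deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One record: a set of predictors plus an optional response (hypothetical class)."""
    predictors: dict                 # attribute name -> measured value
    response: Optional[str] = None   # None means the observation is unlabeled

    @property
    def is_labeled(self) -> bool:
        # A labeled observation carries both predictors and a target response.
        return self.response is not None

    @property
    def dimension(self) -> int:
        # Data dimension: the number of predictors fed to a model.
        return len(self.predictors)


# Illustrative values only.
labeled = Observation({"height_cm": 172, "weight_kg": 68, "age": 41}, response="accept")
unlabeled = Observation({"height_cm": 180, "weight_kg": 75, "age": 29})
```

Here labeled.is_labeled is true, unlabeled.is_labeled is false, and both observations have a data dimension of 3.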

7
Q

Observation

A

The unit on which measurements are made; a set or vector of predictors or attributes that collectively describe an object, person, entity, or unit.

8
Q

Overfitting

A

A condition in which a model fits the training data so closely that it picks up the noise in the data along with the predictive signal. An overfit model will not generalize well to new data.
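A minimal sketch of overfitting, assuming a made-up signal y = 2x with Gaussian noise and a deliberately extreme "memorizer" model that returns the response of the closest training point. The data, the function names, and the choice of model are all illustrative assumptions.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def noisy_line(x):
    # Assumed true signal y = 2x, plus Gaussian noise.
    return 2 * x + rng.gauss(0, 1)


train = [(float(i), noisy_line(i)) for i in range(20)]
test = [(i + 0.25, noisy_line(i + 0.25)) for i in range(20)]


def memorizer(x):
    # Predict the response of the closest training point, noise included.
    nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def mse(data):
    # Mean squared error of the memorizer on the given (x, y) pairs.
    return sum((memorizer(x) - y) ** 2 for x, y in data) / len(data)


train_error = mse(train)  # exactly 0.0: every training point is reproduced
test_error = mse(test)    # larger: the memorized noise does not carry over
```

A training error of zero alongside a clearly larger error on unseen data is the typical signature of a model that has fit the noise.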

9
Q

Predictor

A

A variable used as an input to a predictive model. Similar Terms: Feature, Input Variable, Independent Variable

10
Q

Profile

A

A select set of measurements on an object, person, entity, unit, or transaction (e.g. the height, weight, and age of a person).

11
Q

Response Variable

A

An attribute of an object, person, entity, unit, or transaction that needs to be predicted or estimated by the model. It is the label of an observation in supervised learning. Similar terms: Target, Output Variable, Outcome Variable, Dependent Variable

12
Q

Score

A

To predict or classify the response for observations that were not present in the model training and validation datasets.
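Scoring can be sketched as follows, assuming a toy nearest-centroid classifier with a single numeric predictor. The functions fit_centroids and score, and the sample values, are hypothetical names and data chosen for illustration, not a method the deck prescribes.

```python
def fit_centroids(training_rows):
    """Learn one centroid per class from (predictor_value, class_label) pairs."""
    sums, counts = {}, {}
    for x, label in training_rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def score(centroids, new_values):
    """Assign each unseen value the class whose centroid is closest."""
    return [min(centroids, key=lambda label: abs(centroids[label] - x))
            for x in new_values]


# Labeled observations used for training (illustrative values).
training_rows = [(1.0, "reject"), (2.0, "reject"), (8.0, "accept"), (9.0, "accept")]
centroids = fit_centroids(training_rows)     # reject -> 1.5, accept -> 8.5

# Scoring: the model is applied to observations it has never seen.
predictions = score(centroids, [0.5, 7.0])   # ["reject", "accept"]
```

The fitting step uses labeled observations, while the scoring step needs only the predictors of the new observations.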

13
Q

Success Class

A

The class of interest for a categorical response variable. Usually, it is the class that demands higher predictive accuracy than the other classes. For example, if accept, reject, and waitlist are the possible classes for the response variable, “accept” could be the _____ _____.

14
Q

Supervised Learning

A

The training process in which a machine-learning algorithm learns from a set of labeled observations.

15
Q

Test Data

A

A portion of the dataset that is set aside to evaluate the performance of a model once it is in its final form.

16
Q

Training Data

A

A portion of the dataset that is used to build a machine learning model that captures the relationship between the predictors and the target variable.

17
Q

Validation Data

A

A portion of the dataset that is set aside for evaluating model performance during training, tuning hyperparameters, selecting a model, and defining a stopping criterion.
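The training, validation, and test partitions can be produced with a simple shuffled split. The sketch below assumes a 60/20/20 split and a hypothetical helper named split_dataset; the fractions are an illustrative choice, not a rule from the deck. The validation and test partitions together make up the holdout data.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]                       # used to build the model
    validation = shuffled[n_train:n_train + n_val]   # used for tuning and model selection
    test = shuffled[n_train + n_val:]                # remainder, kept for the final evaluation
    return train, validation, test


train, validation, test = split_dataset(range(100))
```

With 100 rows this gives partitions of 60, 20, and 20 rows, and no row appears in more than one partition.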

18
Q

Unsupervised Learning

A

It refers to the training process in which a machine-learning algorithm operates on unlabeled observations to discover underlying patterns, associations and clusters.
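As a small illustration of discovering clusters without labels, the sketch below runs a basic k-means loop with two clusters on one-dimensional values. The function name two_means_1d, the starting centers, and the sample values are assumptions made for this example; k-means is just one of many unsupervised techniques.

```python
def two_means_1d(values, iterations=10):
    """Group 1-D values into two clusters with a basic k-means loop."""
    centers = [min(values), max(values)]   # simple deterministic starting centers
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            # Assign each value to the nearer center; no labels are involved.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters, centers


clusters, centers = two_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
# clusters -> [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]], centers -> [2.0, 11.0]
```

The algorithm is never told which group a value belongs to; the two groups emerge from the distances between the values alone.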