Supervised Learning Flashcards

Question

Heat map

Answer 1

A type of chart that uses color to encode the values in a matrix, often used to show how strongly each pair of variables is correlated

Answer 2

The process of transforming a categorical variable into dichotomous indicator variables so that the data is numeric.
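
Example: a minimal sketch of this encoding using only the Python standard library. The function name `one_hot` and the sample color values are illustrative assumptions, not part of the card.

```python
def one_hot(values):
    """Turn a list of categorical values into rows of 0/1 indicator variables."""
    categories = sorted(set(values))  # one indicator column per distinct category
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

categories, encoded = one_hot(["red", "green", "red", "blue"])
print(categories)  # column order of the indicators
print(encoded)     # one row of indicators per original value
```

Each row contains exactly one 1, in the column of the category that the original value belonged to.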

Answer 3

Also known as a dummy variable, a dichotomous variable that indicates the presence or absence of a given category of a qualitative variable

Answer 4

A division between two mutually exclusive or contradictory groups. In data science, it often refers to a binary classification where there are only two possible categories (e.g., True/False, Yes/No, 0/1).

Answer 5

A transformation designed to make data more closely resemble a normal distribution

Answer 6

The process of rescaling variables into the [0,1] range
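
Example: a minimal sketch of min-max rescaling, assuming the usual formula (x - min) / (max - min). The function name and the handling of a constant column are illustrative assumptions.

```python
def min_max_normalize(xs):
    """Rescale a list of numbers into the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        # Assumption: a constant column carries no spread, so map it to 0.0.
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]

scaled = min_max_normalize([10, 20, 30, 50])
print(scaled)
```

The smallest value maps to 0 and the largest to 1; everything else lands proportionally in between.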

Answer 7

The process of rescaling a variable to have a mean of zero and a standard deviation of one
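
Example: a minimal sketch of this rescaling, assuming the population standard deviation (dividing by n) and a non-constant variable. The function name and sample data are illustrative.

```python
import math

def standardize(xs):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    n = len(xs)
    mean = sum(xs) / n
    # Population standard deviation; assumes the values are not all identical.
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```

Each result is a z-score: the number of standard deviations the original value lies above or below the mean.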

Answer 8

Adjusting a variable's values to fit within a specific range or scale. This process is crucial when variables have different units or magnitudes, because it keeps variables with large magnitudes from dominating the analysis.

Answer 9

A class of feature-selection methods that evaluate each feature separately and assign it a score that's used to rank the features; features that score above a chosen cutoff are retained and the rest are discarded

Answer 10

A class of feature-selection methods that construct sets of features, evaluate each set in terms of its predictive power in a model, and compare each set's performance to the performance of other sets

Answer 11

A class of feature-selection methods that select sets of features as an intrinsic part of the fitting method for the particular type of model being used

Answer 12

A complexity-reduction technique that tries to reduce a set of variables to a smaller set of components that represent most of the information in the variables

Answer 13

A nonzero vector whose direction doesn't change when a given linear transformation is applied to it; the transformation only scales it

Answer 14

A quantity with both magnitude and direction, represented as an array of numbers. In data science, vectors often represent the features of a dataset. (Example: in 2D space, a vector might look like [x, y], where x and y are the coordinates.)

Answer 15

The factor by which an eigenvector is scaled when the linear transformation is applied to it
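
Example: a minimal sketch that checks the eigenvector/eigenvalue relationship A·v = λ·v on a small matrix. The matrix `A`, the vectors, and the helper `mat_vec` are illustrative assumptions chosen so the arithmetic is easy to follow.

```python
def mat_vec(A, v):
    """Apply the linear transformation A (a list of rows) to the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 1],
     [1, 2]]
v = [1, 1]                      # candidate eigenvector (no zero components)
Av = mat_vec(A, v)              # result of applying the transformation

# If v is an eigenvector, every component is scaled by the same factor.
ratios = [out / inp for out, inp in zip(Av, v)]
is_eigenvector = len(set(ratios)) == 1
eigenvalue = ratios[0]

# A vector that is not an eigenvector changes direction instead.
other = mat_vec(A, [1, 0])

print(Av, is_eigenvector, eigenvalue, other)
```

Here [1, 1] is only stretched, so it is an eigenvector, while [1, 0] is turned toward a new direction, so it is not.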

Answer 16

Eigenvectors that have been divided by the square roots of their eigenvalues

Answer 17

A simplified mathematical representation of the data scientist's best guess about the underlying processes that created the data

Answer 18

An element with information that explains a large amount of variance in the outcome of interest

Answer 19

Also known as AI, the study of systems that perform tasks that normally require human intelligence, such as understanding natural language, recognizing objects, or driving a car

Answer 20

Processed data that is ready to be used in models

Answer 21

The vector space of all instances of the data

Answer 22

A machine-learning approach where the computer is presented with a set of features and their corresponding targets, and then asked to learn what the pattern in the dataset is

Answer 23

A machine-learning approach where the learning algorithm is given features without labels, meaning that it needs to discover the pattern in the data

Answer 24

A machine-learning approach where the computer is given a partially complete feature-target set, in which the targets are missing for many instances

Answer 25

A machine-learning approach where feedback is given to the learning agent (or algorithm) in a dynamic environment in the form of rewards and punishments

Answer 26

How well a learning agent can apply the concepts that it's learned to new instances that it didn't see during training

Answer 27

A scenario where the model can't adequately fit any data, including training, test, and unseen data

Answer 28

A phenomenon that occurs in machine learning when a model becomes so complex, or fits the training data so closely, that it cannot perform well on new data

Answer 29

The process of determining categories for objects and then predicting which category previously unseen objects belong to

Answer 30

Data that is already associated with a target value

Answer 31

A table showing the number of instances for every combination of predicted and actual class
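
Example: a minimal sketch that tallies the cells of such a table for a binary problem. The label lists are made-up illustrative data, and the function name `confusion_counts` is an assumption.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)

tp = counts[(1, 1)]  # actual positive, predicted positive
fn = counts[(1, 0)]  # actual positive, predicted negative
fp = counts[(0, 1)]  # actual negative, predicted positive
tn = counts[(0, 0)]  # actual negative, predicted negative
print(tp, fn, fp, tn)
```

The four counts always add up to the number of instances, since each instance falls into exactly one cell.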

Answer 32

Data that when graphed in two dimensions can be separated into two classes by a straight line

Answer 33

An algorithm that aims to predict the labeled class to which each observation belongs

Answer 34

An algorithm that classifies objects based on a linear combination of the characteristics

Answer 35

A line or surface that separates different predicted classes

Answer 36

An optimization algorithm that repeatedly updates the parameters of the hypothesis function in the direction that reduces the error, measuring the error after each update, until the error is as small as possible
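
Example: a minimal sketch of the update loop, fitting a single parameter `w` in the hypothesis y = w * x by minimizing mean squared error. The learning rate, step count, starting value, and data are illustrative assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=200):
    """Fit w in the hypothesis y = w * x by minimizing mean squared error."""
    w = 0.0  # starting guess for the parameter
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Step against the gradient so that the error decreases.
        w -= learning_rate * gradient
    return w

w = gradient_descent([1, 2, 3], [2, 4, 6])
print(w)
```

The data was generated from y = 2x, so the parameter should settle very close to 2. A learning rate that is too large would make the updates overshoot and diverge instead.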

Answer 37

A dataset with a fairly even distribution of instances across the classes

Answer 38

A dataset with a skewed distribution of instances across the classes, thus creating a challenge for predictive modeling

Answer 39

The probability that a negative instance will be incorrectly predicted as positive

Answer 40

The probability that a positive instance will be correctly predicted as positive
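
Example: a minimal sketch that estimates both rates from confusion-matrix counts, assuming the standard definitions FPR = FP / (FP + TN) and TPR = TP / (TP + FN). The counts are made-up illustrative values.

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were predicted as positive."""
    return fp / (fp + tn)

def true_positive_rate(tp, fn):
    """Share of actual positives that were predicted as positive."""
    return tp / (tp + fn)

# Hypothetical counts: 3 true positives, 1 false negative,
# 1 false positive, 3 true negatives.
fpr = false_positive_rate(fp=1, tn=3)
tpr = true_positive_rate(tp=3, fn=1)
print(fpr, tpr)
```

Note that each rate is computed within one actual class: FPR only looks at actual negatives and TPR only at actual positives.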

Answer 41

A parameter that determines when to convert a predicted probability into a class label
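
Example: a minimal sketch of applying such a parameter. Treating a probability equal to the threshold as positive (using `>=`) is an assumed convention, and the probabilities are illustrative.

```python
def to_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities into 0/1 class labels."""
    # Assumed convention: a probability equal to the threshold counts as positive.
    return [1 if p >= threshold else 0 for p in probabilities]

probabilities = [0.1, 0.4, 0.5, 0.8]
default_labels = to_labels(probabilities)
lenient_labels = to_labels(probabilities, threshold=0.3)
print(default_labels, lenient_labels)
```

Lowering the threshold labels more instances as positive, which tends to catch more true positives at the cost of more false positives.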

Answer 42

The proportion of positive predictions that are correct

Answer 43

The proportion of instances in the positive class that were correctly predicted as positive
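
Example: a minimal sketch that computes this proportion (recall) together with precision, the proportion of positive predictions that are correct, from two label lists. The labels are made-up illustrative data, and the sketch assumes at least one positive prediction and one actual positive.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp)  # correct share of the positive predictions
    recall = tp / (tp + fn)     # share of the actual positives that were found
    return precision, recall

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 1, 0, 0, 1]
precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

The two measures share the same numerator (true positives) and differ only in the denominator: predicted positives for precision, actual positives for recall.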

Answer 44

A visualization created by plotting precision against recall while varying the threshold from 0 to 1, which is useful for class-imbalanced data

Answer 45

A strategy for transforming a multiclass problem into several binary problems by training a single classifier per class

Answer 46

The process of estimating the relationship between one or more observed features and some continuous target variable

Answer 47

Unexplained variability within a target variable or data

Answer 48

An optimization method that computes the sum of squared distances between each data point and a candidate line, and chooses the line that minimizes this sum
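
Example: a minimal sketch for a single feature, assuming the standard closed-form solution (slope = covariance of x and y divided by variance of x). The function name `ols_fit` and the data are illustrative.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The sample points lie exactly on y = 2x + 1, so the fitted line reproduces it and the sum of squared distances is zero.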

Answer 49

A regression model that aims to model a linear relationship between the target variable and the coefficients of the features

Answer 50

A measure of the degree of asymmetry of a distribution

Answer 51

A measure of the sharpness of a distribution's peak and the heaviness of its tails
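
Example: a minimal sketch of both shape measures, assuming the moment-based population formulas (third and fourth central moments divided by powers of the variance). The kurtosis here is not the "excess" version, so a normal distribution would score about 3. Function names and data are illustrative.

```python
def central_moment(xs, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Asymmetry: 0 for symmetric data, positive for a long right tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 10]
print(skewness(symmetric), skewness(right_tailed), kurtosis(symmetric))
```

The symmetric sample has zero skewness, while the sample with one large value on the right has positive skewness.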

Answer 52

The process of finding the optimal values of the unknown coefficients

Answer 53

Also known as the residual, the information in the target variable that isn't explained by the features

Answer 54

Creating a model based on known data (past observations) to understand the underlying relationship between variables. (Example: using a linear regression model to estimate the relationship between features and a target variable.)

Answer 55

Using the estimated model to forecast unknown outcomes or future data points. This means applying the regression model to new data to predict the target variable's value.

(81 cards)