ISYE 6501 Glossary Flashcards

Question 1

Q

Algorithm

Answer

A

Step-by-step procedure designed to carry out a task.

Question 2

Q

Change detection

Answer

A

Identifying when a significant change has taken place in a process.

Question 3

Q

Classification

Answer

A

The separation of data into two or more categories, or (a point’s
classification) the category a data point is put into.

Question 4

Q

Classifier

Answer

A

A boundary that separates the data into two or more categories. Also
(more generally) an algorithm that performs classification.

Question 5

Q

Cluster

Answer

A

A group of points identified as near/similar to each other.

Question 6

Q

Cluster center

Answer

A

In some clustering algorithms (like 𝑘-means clustering), the central
point (often the centroid) of a cluster of data points.

Question 7

Q

Clustering

Answer

A

Separation of data points into groups (“clusters”) based on
nearness/similarity to each other. A common form of unsupervised
learning.

Question 8

Q

CUSUM

Answer

A

Change detection method that compares the observed distribution mean
with a threshold level of change. Short for “cumulative sum”.
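
One common way to write CUSUM for detecting an increase is a running total of how far each observation sits above the expected mean, reset at zero, with a change flagged once the total reaches a threshold. The sketch below assumes that form; the names `mu` (expected mean), `c` (slack allowance), and `threshold` are illustrative and do not come from the card.

```python
def cusum_increase(xs, mu, c, threshold):
    """Return the index of the first observation where an upward
    change is flagged, or None if the threshold is never reached."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate evidence of values above the expected mean,
        # minus a slack c; never let the running total go below zero.
        s = max(0.0, s + (x - mu - c))
        if s >= threshold:
            return t
    return None


data = [10, 10, 10, 10, 13, 13, 13, 13]
print(cusum_increase(data, mu=10, c=1, threshold=5))  # prints 6
```

A larger `c` or `threshold` makes the detector slower to react but less prone to false alarms.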

Question 9

Q

Deep learning

Answer

A

Neural network-type model with many hidden layers.

Question 10

Q

Dimension

Answer

A

A feature of the data points (for example, height or credit score). (Note
that there is also a mathematical definition for this word.)

Question 11

Q

EM algorithm

Answer

A

Expectation-maximization algorithm.

Question 12

Q

Expectation-maximization
algorithm (EM algorithm)

Answer

A

General description of an algorithm with two steps (often iterated),
one that finds the function for the expected likelihood of getting the
response given current parameters, and one that finds new parameter
values to maximize that probability.
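
As a rough illustration of the two alternating steps, the sketch below fits the two means of a one-dimensional Gaussian mixture. It is deliberately simplified: unit variances and equal component weights are assumptions made here to keep the example short, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_two_gaussians(xs, mean_a, mean_b, iterations=50):
    """Estimate the two component means (unit variance, equal weights assumed)."""
    for _ in range(iterations):
        # Expectation step: probability that each point came from
        # component A, given the current means.
        resp_a = []
        for x in xs:
            pa = normal_pdf(x, mean_a, 1.0)
            pb = normal_pdf(x, mean_b, 1.0)
            resp_a.append(pa / (pa + pb))
        # Maximization step: new means are responsibility-weighted averages.
        total_a = sum(resp_a)
        total_b = len(xs) - total_a
        mean_a = sum(r * x for r, x in zip(resp_a, xs)) / total_a
        mean_b = sum((1 - r) * x for r, x in zip(resp_a, xs)) / total_b
    return mean_a, mean_b


xs = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
print(em_two_gaussians(xs, mean_a=1.0, mean_b=9.0))
```

With these well-separated points the estimates settle near 0 and 10 after the first pass.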

Question 13

Q

Heuristic Algorithm

Answer

A

Algorithm that is not guaranteed to find the absolute best (optimal)
solution.

Question 14

Q

𝑘-means algorithm

Answer

A

Clustering algorithm that defines 𝑘 clusters of data points, each
corresponding to one of 𝑘 cluster centers selected by the algorithm.
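
A minimal sketch of the usual assign-then-update loop, assuming one-dimensional points and caller-supplied starting centers (both simplifications made here for brevity):

```python
def k_means_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center
    and moving each center to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


centers, clusters = k_means_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers)   # prints [2.0, 11.0]
print(clusters)  # prints [[1, 2, 3], [10, 11, 12]]
```

The result can depend on the starting centers, which is one reason 𝑘-means is usually run from several initializations.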

Question 15

Q

𝑘-Nearest-Neighbor (KNN)

Answer

A

Classification algorithm that defines a data point’s category as a
function of the nearest 𝑘 data points to it.
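
The "function of the nearest 𝑘 data points" is often a simple majority vote. The sketch below assumes that choice, two-dimensional points, and squared Euclidean distance; the data and names are made up for illustration.

```python
from collections import Counter


def knn_classify(train, query, k):
    """train is a list of ((x, y), label) pairs; returns the majority
    label among the k training points closest to query."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((0, 0), "red"), ((1, 0), "red"), ((0, 1), "red"),
    ((5, 5), "blue"), ((6, 5), "blue"),
]
print(knn_classify(train, (1, 1), k=3))  # prints red
print(knn_classify(train, (5, 6), k=3))  # prints blue
```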

Question 16

Q

Kernel

Answer

A

A type of function that computes the similarity between two inputs;
thanks to what’s (really!) sometimes known as the “kernel trick”,
nonlinear classifiers can be found almost as easily as linear ones.
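
One widely used example of such a similarity function is the radial basis function (RBF) kernel. The sketch below assumes that particular kernel and a hypothetical `gamma` width parameter; the card itself does not single out any one kernel.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Similarity in (0, 1]: equals 1 for identical inputs and
    decays toward 0 as the inputs move apart."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel((0, 0), (0, 0)))  # prints 1.0
print(rbf_kernel((0, 0), (3, 4)))  # very close to 0
```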

Question 17

Q

Learning

Answer

A

Finding/discovering patterns (or rules) in data, often that can be
applied to new data.

Question 18

Q

Machine

Answer

A

Apparatus that can do something; in “machine learning”, it often refers to both an algorithm and the computer it’s run on. (Fun fact: before
computers were developed, the term “computers” referred to people
who did calculations quickly in their heads or on paper!)

Question 19

Q

Margin

Answer

A

For a single point, the distance between the point and the classification
boundary; for a set of points, the minimum distance between a point
in the set and the classification boundary. Also called the separation.
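
For a linear boundary written as w·x + b = 0 (an assumption made here; the card does not fix the boundary's form), both versions of the definition can be sketched as:

```python
import math


def distance_to_boundary(point, w, b):
    """Perpendicular distance from point to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def margin(points, w, b):
    """Margin of a set of points: the smallest single-point distance."""
    return min(distance_to_boundary(p, w, b) for p in points)


w, b = (3, 4), -5
print(distance_to_boundary((3, 4), w, b))      # prints 4.0
print(margin([(3, 4), (1, 3), (0, 0)], w, b))  # prints 1.0
```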

Question 20

Q

Machine learning

Answer

A

Use of computer algorithms to learn and discover patterns or structure
in data, without being programmed specifically for them.

Question 21

Q

Misclassified

Answer

A

Put into the wrong category by a classifier.

Question 22

Q

Neural network

Answer

A

A machine learning model that itself is modeled after the workings of
neurons in the brain.

Question 23

Q

Supervised learning

Answer

A

Machine learning where the “correct” answer is known for each data
point in the training set.

Question 24

Q

Support vector

Answer

A

In SVM models, the closest point to the classifier, among those in a
category. (Note that there is a more-technical mathematical definition
too.)

Question 25

Q

Support vector machine (SVM)

Answer

A

Classification algorithm that uses a boundary to separate the data into
two or more categories (“classes”).

Question 26

Q

Unsupervised learning

Answer

A

Machine learning where the “correct” answer is not known for the data
points in the training set.

Question 27

Q

Accuracy

Answer

A

Fraction of data points correctly classified by a model; equal to
(TP + TN) / (TP + FP + TN + FN).
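
A quick worked example with made-up counts, to show where the parentheses matter:

```python
def accuracy(tp, tn, fp, fn):
    """Correct classifications divided by all classifications."""
    return (tp + tn) / (tp + fp + tn + fn)


print(accuracy(tp=40, tn=50, fp=5, fn=5))  # prints 0.9
```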

Question 28

Q

Confusion matrix

Answer

A

Visualization of classification model performance.
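
For a two-category problem, the matrix is typically built from four counts (TP, FP, TN, FN). A minimal sketch of tallying them, assuming hypothetical label lists and a chosen "positive" category:

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, FN relative to the chosen positive category."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Model said "in the category": true or false positive.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Model said "not in the category": false or true negative.
            counts["FN" if a == positive else "TN"] += 1
    return counts


actual = ["spam", "spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham"]
print(confusion_counts(actual, predicted, positive="spam"))
# prints {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```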

Question 29

Q

Diagnostic odds ratio

Answer

A

Ratio of the odds that a data point in a certain category is correctly
classified by a model, to the odds that a data point not in that category
is incorrectly classified by the model; equal to (TP / FN) / (FP / TN) = (TP × TN) / (FN × FP).
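
A small check, using made-up counts, that the ratio-of-odds form and the product form give the same value:

```python
def diagnostic_odds_ratio(tp, fn, fp, tn):
    """(TP / FN) / (FP / TN), computed in the equivalent product form."""
    return (tp * tn) / (fn * fp)


tp, fn, fp, tn = 40, 10, 5, 45
print(diagnostic_odds_ratio(tp, fn, fp, tn))  # prints 36.0
print((tp / fn) / (fp / tn))                  # same value up to rounding
```

The ratio is undefined when FN or FP is zero, so real code would need to handle that case.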

Question 30

Q

Fall out

Answer

A

Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN + FP). Also called false positive rate.

Question 31

Q

False negative (FN)

Answer

A

Data point that a model incorrectly classifies as not being in a certain
category. (“Negative” means the model classified it as not being in the
category, and “False” means the model’s classification is incorrect.)
Sometimes abbreviated as “FN”.

Question 32

Q

False negative rate

Answer

A

Fraction of data points in a certain category that are incorrectly
classified by a model; equal to FN / (TP + FN). Also called miss rate.

Question 33

Q

False positive (FP)

Answer

A

Data point that a model incorrectly classifies as being in a certain category. (“Positive” means the model classified it as being in the category, and “False” means the model’s classification is incorrect.) Sometimes abbreviated as “FP”.

Question 34

Q

False positive rate

Answer

A

Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN + FP). Also called fall out.

Question 35

Q

False omission rate

Answer

A

Fraction of data points the model classifies as not in a certain category that are really in the category; equal to FN / (TN + FN).

Question 36

Q

Hit rate

Answer

A

Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called true positive rate, sensitivity, and recall.

Question 37

Q

Miss rate

Answer

A

Fraction of data points in a certain category that are incorrectly classified by a model; equal to FN / (TP + FN). Also called false negative rate.

Question 38

Q

Negative likelihood ratio

Answer

A

Ratio of the fraction of data points in a certain category that are misclassified as not in the category, to the fraction of data points not in the category that are correctly classified as not being in the category; equal to (1 - sensitivity) / specificity = (FN / (FN + TP)) / (TN / (TN + FP)).

Question 39

Q

Negative predictive value

Answer

A

Fraction of data points classified as not in a certain category that are really not in that category; equal to TN / (TN + FN).

Question 40

Q

Positive likelihood ratio

Answer

A

Ratio of the fraction of data points in a certain category that are correctly classified as being in that category, to the fraction of data points not in the category that are incorrectly classified as being in the category; equal to sensitivity / (1 - specificity) = (TP / (TP + FN)) / (FP / (FP + TN)).
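
The positive and negative likelihood ratios are both built from sensitivity and specificity. A minimal sketch with made-up counts (function names are illustrative):

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """sensitivity / (1 - specificity), written with raw counts."""
    return (tp / (tp + fn)) / (fp / (fp + tn))


def negative_likelihood_ratio(tp, fn, fp, tn):
    """(1 - sensitivity) / specificity, written with raw counts."""
    return (fn / (fn + tp)) / (tn / (tn + fp))


counts = dict(tp=40, fn=10, fp=5, tn=45)  # sensitivity 0.8, specificity 0.9
print(positive_likelihood_ratio(**counts))  # about 8
print(negative_likelihood_ratio(**counts))  # about 0.22
```

A positive likelihood ratio well above 1 and a negative likelihood ratio well below 1 both indicate a model that separates the categories.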

Question 41

Q

Positive predictive value

Answer

A

Fraction of data points classified as being in a certain category that are really in that category; equal to TP / (TP + FP). Also called precision.

Question 42

Q

Precision

Answer

A

In analytics, the fraction of data points classified as being in a certain category that are really in that category; equal to TP / (TP + FP). Also called positive predictive value.

Question 43

Q

Recall

Answer

A

Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called sensitivity, hit rate, and true positive rate.
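
Precision and recall share a numerator (TP) but differ in the denominator, which is easy to mix up. A small sketch with made-up counts:

```python
def precision(tp, fp):
    """Of the points the model put in the category, the fraction that belong there."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of the points that belong in the category, the fraction the model found."""
    return tp / (tp + fn)


tp, fp, fn = 40, 10, 20
print(precision(tp, fp))  # prints 0.8
print(recall(tp, fn))     # about 0.667
```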

Question 44

Q

Sensitivity

Answer

A

Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called true positive rate, hit rate, and recall.

Question 45

Q

Specificity

Answer

A

Fraction of data points not in a certain category that are correctly classified by a model; equal to TN / (TN + FP). Also called the true negative rate.

Question 46

Q

True negative (TN)

Answer

A

Data point that a model correctly classifies as not being in a certain category. (“Negative” means the model classified it as not being in the category, and “True” means the model’s classification is correct.) Sometimes abbreviated as “TN”.

Question 47

Q

True negative rate

Answer

A

Fraction of data points not in a certain category that are correctly classified by a model; equal to TN / (TN + FP); also called specificity.

Question 48

Q

True positive (TP)

Answer

A

Data point that a model correctly classifies as being in a certain category. (“Positive” means the model classified it as being in the category, and “True” means the model’s classification is correct.) Sometimes abbreviated as “TP”.

Question 49

Q

True positive rate

Answer

A

Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN); also called sensitivity, hit rate, and recall.

Question 50

Q