ISYE 6501 Glossary Flashcards
Algorithm
Step-by-step procedure designed to carry out a task.
Change detection
Identifying when a significant change has taken place in a process.
Classification
The separation of data into two or more categories, or (a point’s
classification) the category a data point is put into.
Classifier
A boundary that separates the data into two or more categories. Also
(more generally) an algorithm that performs classification.
Cluster
A group of points identified as near/similar to each other.
Cluster center
In some clustering algorithms (like 𝑘-means clustering), the central
point (often the centroid) of a cluster of data points.
Clustering
Separation of data points into groups (“clusters”) based on
nearness/similarity to each other. A common form of unsupervised
learning.
CUSUM
Change detection method that compares observed distribution mean
with a threshold level of change. Short for “cumulative sum”.
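One common way to write CUSUM for detecting an increase is the running statistic S_t = max(0, S_(t-1) + (x_t - mu - C)), with a change flagged once S_t reaches a threshold T. The sketch below assumes that form; the names `mu` (expected mean), `c` (slack), and `threshold`, and the sample data, are illustrative choices rather than values from this glossary.

```python
def cusum_increase(xs, mu, c, threshold):
    """Return the index of the first observation at which the running
    CUSUM statistic reaches the threshold, or None if it never does."""
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate how far each observation sits above the expected
        # mean, less the slack c; the statistic never drops below zero.
        s = max(0.0, s + (x - mu - c))
        if s >= threshold:
            return t
    return None

# Hypothetical data: the process mean shifts upward partway through.
data = [10, 10, 11, 9, 10, 13, 14, 13, 15, 14]
first_alarm = cusum_increase(data, mu=10, c=1, threshold=5)
```

A larger `c` or `threshold` makes the detector slower to react but less prone to false alarms.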
Deep learning
Neural network-type model with many hidden layers.
Dimension
A feature of the data points (for example, height or credit score). (Note
that there is also a mathematical definition for this word.)
EM algorithm
Expectation-maximization algorithm.
Expectation-maximization algorithm (EM algorithm)
General description of an algorithm with two steps (often iterated),
one that finds the function for the expected likelihood of getting the
response given current parameters, and one that finds new parameter
values to maximize that probability.
Heuristic algorithm
Algorithm that is not guaranteed to find the absolute best (optimal)
solution.
𝑘-means algorithm
Clustering algorithm that defines 𝑘 clusters of data points, each
corresponding to one of 𝑘 cluster centers selected by the algorithm.
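A minimal sketch of one standard way to run 𝑘-means: alternate between assigning each point to its nearest center and moving each center to the mean of its assigned points. The random initialization, the helper `squared_distance`, the parameter names, and the sample points are assumptions made for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iterations=100, seed=0):
    """Return (centers, clusters) after alternating assignment and update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

Because the starting centers are chosen at random, this procedure is a heuristic: on harder data, different seeds can lead to different final clusters.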
𝑘-Nearest-Neighbor (KNN)
Classification algorithm that defines a data point’s category as a
function of the nearest 𝑘 data points to it.
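As an illustration, the sketch below uses a majority vote among the 𝑘 nearest training points, which is one common choice for the "function" in this definition. Euclidean distance, the function name `knn_classify`, and the sample data are assumptions.

```python
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs, with points given as tuples.
    """
    # Sort training points by squared Euclidean distance to the query.
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    # Count the labels of the k closest points and return the most frequent.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled data with two categories.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
label_near_origin = knn_classify(train, (0.5, 0.5), k=3)
label_near_five = knn_classify(train, (5.5, 5.5), k=3)
```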
Kernel
A type of function that computes the similarity between two inputs;
thanks to what’s (really!) sometimes known as the “kernel trick”,
nonlinear classifiers can be found almost as easily as linear ones.
Learning
Finding/discovering patterns (or rules) in data, often that can be
applied to new data.
Machine
Apparatus that can do something; in “machine learning”, it often refers to both an algorithm and the computer it’s run on. (Fun fact: before
computers were developed, the term “computers” referred to people
who did calculations quickly in their heads or on paper!)
Margin
For a single point, the distance between the point and the classification
boundary; for a set of points, the minimum distance between a point
in the set and the classification boundary. Also called the separation.
Machine learning
Use of computer algorithms to learn and discover patterns or structure
in data, without being programmed specifically for them.
Misclassified
Put into the wrong category by a classifier.
Neural network
A machine learning model that itself is modeled after the workings of
neurons in the brain.
Supervised learning
Machine learning where the “correct” answer is known for each data
point in the training set.
Support vector
In SVM models, the closest point to the classifier, among those in a
category. (Note that there is a more-technical mathematical definition
too.)
Support vector machine (SVM)
Classification algorithm that uses a boundary to separate the data into
two or more categories (“classes”).
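Unsupervised learning
Machine learning where the “correct” answer is not known for the data
points in the training set.
Voronoi diagram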
Graphical representation of splitting a plane with two or more special
points into regions with one special point each, where each region’s
points are closer to the region’s special point than to any other special
point.
Accuracy
Fraction of data points correctly classified by a model; equal to
(TP + TN) / (TP + FP + TN + FN).
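A small worked example of accuracy, with the numerator and denominator grouped explicitly: correct classifications (TP and TN) divided by all classifications. The counts below are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all data points that the model classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical counts: 85 of 100 points are classified correctly.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
```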
Confusion matrix
Table that visualizes classification model performance by counting true
positives, false positives, true negatives, and false negatives.
Diagnostic odds ratio
Ratio of the odds that a data point in a certain category is correctly
classified by a model, to the odds that a data point not in that category
is incorrectly classified by the model; equal to (TP / FN) / (FP / TN) = (TP × TN) / (FN × FP).
Fall out
Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN + FP). Also called false positive rate.
False negative (FN)
Data point that a model incorrectly classifies as not being in a certain
category. (“Negative” means the model classified it as not being in the
category, and “False” means the model’s classification is incorrect.)
Sometimes abbreviated as “FN”.
False negative rate
Fraction of data points in a certain category that are incorrectly
classified by a model; equal to FN / (TP + FN). Also called miss rate.
False positive (FP)
Data point that a model incorrectly classifies as being in a certain category. (“Positive” means the model classified it as being in the category, and “False” means the model’s classification is incorrect.) Sometimes abbreviated as “FP”.
False positive rate
Fraction of data points not in a certain category that are incorrectly classified by a model; equal to FP / (TN + FP). Also called fall out.
False omission rate
Fraction of data points the model classifies as not in a certain category that are really in the category; equal to FN / (TN + FN).
Hit rate
Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called true positive rate, sensitivity, and recall.
Miss rate
Fraction of data points in a certain category that are incorrectly classified by a model; equal to FN / (TP + FN). Also called false negative rate.
Negative likelihood ratio
Ratio of the fraction of data points in a certain category that are misclassified as not in the category, to the fraction of data points not in the category that are correctly classified as not being in the category; equal to (1 - sensitivity) / specificity = (FN / (FN + TP)) / (TN / (TN + FP)).
Negative predictive value
Fraction of data points classified as not in a certain category that are really not in that category; equal to TN / (TN + FN).
Positive likelihood ratio
Ratio of the fraction of data points in a certain category that are correctly classified as being in that category, to the fraction of data points not in the category that are incorrectly classified as being in the category; equal to sensitivity / (1 - specificity) = (TP / (TP + FN)) / (FP / (FP + TN)).
Positive predictive value
Fraction of data points classified as being in a certain category that are really in that category; equal to TP / (TP + FP). Also called precision.
Precision
In analytics, the fraction of data points classified as being in a certain category that are really in that category; equal to TP / (TP + FP). Also called positive predictive value.
Recall
Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called true positive rate, hit rate, and sensitivity.
Sensitivity
Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called true positive rate, hit rate, and recall.
Specificity
Fraction of data points not in a certain category that are correctly classified by a model; equal to TN / (TN + FP). Also called the true negative rate.
True negative (TN)
Data point that a model correctly classifies as not being in a certain category. (“Negative” means the model classified it as not being in the category, and “True” means the model’s classification is correct.) Sometimes abbreviated as “TN”.
True negative rate
Fraction of data points not in a certain category that are correctly classified by a model; equal to TN / (TN + FP). Also called specificity.
True positive (TP)
Data point that a model correctly classifies as being in a certain category. (“Positive” means the model classified it as being in the category, and “True” means the model’s classification is correct.) Sometimes abbreviated as “TP”.
True positive rate
Fraction of data points in a certain category that are correctly classified by a model; equal to TP / (TP + FN). Also called sensitivity, hit rate, and recall.
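The rate definitions in this glossary all derive from the same four confusion-matrix counts. As a sketch, the hypothetical helper below (`classification_rates`, with illustrative counts) computes them side by side so the synonyms and complements are easy to compare; it assumes none of the denominators is zero.

```python
def classification_rates(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate, hit rate, recall
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),                    # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "false_positive_rate": fp / (tn + fp),          # fall out
        "false_negative_rate": fn / (tp + fn),          # miss rate
        "false_omission_rate": fn / (tn + fn),
        "positive_likelihood_ratio": sensitivity / (1 - specificity),
        "negative_likelihood_ratio": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fn * fp),
    }

# Hypothetical counts for a single category.
rates = classification_rates(tp=40, fp=5, tn=45, fn=10)
```

With these counts, the false positive rate is the complement of specificity, the false negative rate is the complement of sensitivity, and the diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio.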