Chapter 4 Key Terms Flashcards
algorithm
A step-by-step search in which improvement is made at every step until the best solution is found.
adaptive resonance theory (ART)
An unsupervised learning method created by Stephen Grossberg.
ART is a neural network architecture designed to learn in a brain-like way without supervision.
Apriori algorithm
The most commonly used algorithm to discover association rules by recursively identifying frequent itemsets.
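A minimal sketch of the frequent-itemset step may help fix the idea. It assumes support is expressed as a fraction of transactions, and it grows itemsets level by level in a loop rather than by literal recursion. The function name `frequent_itemsets` and the sample baskets are illustrative only, not taken from the chapter.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets meeting min_support (a fraction)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result

# Hypothetical market-basket data for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
found = frequent_itemsets(baskets, min_support=0.5)
```

With these baskets and a 50% threshold, all four single items and four of the pairs qualify, while no three-item set does.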
area under the ROC curve
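A graphical assessment technique for binary classification models in which the true positive rate is plotted on the Y-axis and the false positive rate is plotted on the X-axis. The area under the resulting curve summarizes model performance: 1.0 indicates a perfect classifier and 0.5 indicates performance no better than chance.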
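As a rough illustration, the curve can be traced by sweeping a decision threshold over the model's scores, and the area estimated with the trapezoidal rule. The helper names `roc_points` and `auc` and the sample labels and scores are assumptions made for this sketch, which also assumes both classes are present in the data.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical data: 1 = positive class, 0 = negative class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
```

For this sample the area works out to 8/9, which matches the fraction of positive-negative pairs in which the positive case receives the higher score.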
artificial neural network (ANN)
Computer technology that attempts to build computers that operate like a human brain. The machines possess simultaneous memory storage and work with ambiguous information. Sometimes called, simply, a neural network.
See neural computing.
association
A category of data mining algorithm that establishes relationships among items that occur together in a given record.
axon
An outgoing connection (i.e., terminal) from a biological neuron.
backpropagation
The best-known learning algorithm in neural computing where the learning is done by comparing computed outputs to desired outputs of training cases.
bootstrapping
A sampling technique in which a fixed number of instances from the original data are sampled (with replacement) for training and the rest of the data set is used for testing.
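The split described above can be sketched as follows. The function name `bootstrap_split`, the `seed` parameter, and the sample records are assumptions for illustration; the sketch treats "the rest" as every instance that was never drawn.

```python
import random

def bootstrap_split(data, sample_size, seed=0):
    """Draw sample_size instances with replacement for training;
    instances that were never drawn form the test set."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(data)) for _ in range(sample_size)]
    chosen_set = set(chosen)
    train = [data[i] for i in chosen]  # may contain duplicates
    test = [x for i, x in enumerate(data) if i not in chosen_set]
    return train, test

# Hypothetical data set of ten record identifiers.
records = list(range(10))
train, test = bootstrap_split(records, sample_size=10)
```

Because sampling is done with replacement, some records typically appear more than once in `train`, and the records that never appear there end up in `test`.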
business analyst
An individual whose job is to analyze business processes and the support they receive (or need) from information technology.
categorical data
Data that represent the labels of multiple classes used to divide a variable into specific groups.
chromosome
A candidate solution for a genetic algorithm.
classification
Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior.
clustering
Partitioning a database into segments in which the members of a segment share similar qualities.
confidence
In association rules, the conditional probability that a transaction containing the left-hand side (LHS) of the rule also contains the right-hand side (RHS).
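A small sketch, assuming confidence is estimated from transaction counts as the support of LHS and RHS together divided by the support of the LHS alone. The function names and the sample baskets are illustrative, and the sketch assumes the LHS occurs in at least one transaction.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Estimate P(RHS | LHS) as support(LHS and RHS) / support(LHS)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)

# Hypothetical market-basket data for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
conf = confidence(baskets, {"diapers"}, {"beer"})
```

Here diapers appear in three baskets and two of those also contain beer, so the rule "diapers, then beer" has a confidence of 2/3.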
connection weight
The weight associated with each link in a neural network model.
Neural network learning algorithms assess connection weights.
CRISP-DM
The Cross-Industry Standard Process for Data Mining: a sequence of six steps for conducting data mining projects that starts with a good understanding of the business and the need for the data mining project (i.e., the application domain) and ends with the deployment of a solution that satisfies the specific business need.
data mining
A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.
decision trees
A graphical presentation of a sequence of interrelated decisions to be made under assumed risk. This technique classifies specific entities into particular classes based upon the features of the entities. The tree consists of a root followed by internal nodes; each node (including the root) is labeled with a question, and the arcs associated with each node cover all possible responses.
dendrite
The part of a biological neuron that provides inputs to the cell.
discovery-driven data mining
A form of data mining that finds patterns, associations, and relationships among data in order to uncover facts that were previously unknown or not even contemplated by an organization.
distance measure
A method used to calculate the closeness between pairs of items in most cluster analysis methods. Popular distance measures include Euclidean distance (the ordinary distance between two points that one would measure with a ruler) and Manhattan distance (also called the rectilinear distance, or taxicab distance, between two points).
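The two measures named above can be compared with a short sketch. It assumes points are given as equal-length sequences of numbers; the function names are illustrative.

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Rectilinear distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

a, b = (0, 0), (3, 4)
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7
```

The same pair of points is 5 units apart in a straight line but 7 units apart when travel is restricted to axis-parallel moves, as on a city street grid.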
entropy
A metric that measures the extent of uncertainty or randomness in a data set. If all the data in a subset belong to just one class, then there is no uncertainty or randomness in that data set, and therefore the entropy is zero.
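A brief sketch, assuming the Shannon form of entropy with base-2 logarithms that is commonly used when building decision trees (the definition above does not give a formula). The function name and labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum over classes of -p * log2(p)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

pure = entropy(["yes", "yes", "yes", "yes"])   # single class: no uncertainty
mixed = entropy(["yes", "yes", "no", "no"])    # evenly split between two classes
```

A subset containing a single class yields zero, as the definition states, while an even two-class split yields one bit, the maximum for two classes.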
fuzzy logic
A logically consistent way of reasoning that can cope with uncertain or partial information. Fuzzy logic is characteristic of human thinking and expert systems.
genetic algorithm
A software program that learns in an evolutionary manner, similar to the way biological systems evolve.
Gini index
A metric that is used in economics to measure the diversity of the population. The same concept can be used to determine the purity of a specific class as a result of a decision to branch along a particular attribute/variable.
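For the decision-tree use, a common formulation (assumed here, since the definition above gives no formula) is the Gini impurity, one minus the sum of squared class proportions, with a split scored by the size-weighted impurity of its branches. The function names and labels are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure group."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(groups):
    """Size-weighted average impurity of the groups produced by a branch."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_impurity(g) for g in groups)

before = gini_impurity(["a", "a", "b", "b"])        # 0.5: evenly mixed
after = gini_of_split([["a", "a"], ["b", "b"]])     # 0.0: both branches pure
```

A branching decision that lowers the weighted impurity, as in this example, produces purer classes and is therefore preferred.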