Chapter 4 Flashcards
What’s the most common splitting criterion?
information gain
What’s the role of Decision Trees?
Create a formula/algorithm that evaluates how well each attribute splits a set of examples into segments, with respect to a chosen target variable
To what does disorder correspond to?
to how mixed (impure) the segment is with respect to the values of the attribute of interest
Formula of Entropy
entropy = -p1 log(p1) - p2 log(p2) - ...
Define Pi
probability of value i within the set (relative percentage/share)
When is Pi = 1?
when all members of the set have attribute i
When is Pi = 0?
when no members of the set have attribute i
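The entropy formula above can be sketched directly in code; the class labels below are made-up examples:

```python
import math

def entropy(labels):
    """Entropy of a set: -sum(p_i * log2(p_i)) over each value i in the set."""
    n = len(labels)
    probs = [labels.count(v) / n for v in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# a pure set (p_i = 1 for a single value) has entropy 0;
# a 50/50 split is maximally impure (entropy 1 with log base 2)
pure = entropy(["yes", "yes", "yes", "yes"])
mixed = entropy(["yes", "yes", "no", "no"])
```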
What is the parent set?
the original set of examples
What does an attribute do?
It segments a set of instances into k subsets.
What are K children sets?
The result of splitting on the attribute values.
What does information gain measure?
- how much an attribute improves (decreases) entropy
- the change in entropy due to new information added
Formula IG(parent)
IG(parent) = entropy(parent) - p(c1) entropy(c1) - p(c2) entropy(c2) - ...
Formula Entropy (HS = square)
Formula Entropy (HS = circle)
Formula IG = entropy (Write-off)..
What reduces entropy substantially?
splitting the parent data set by the body-shape attribute
- select the attribute that reduces entropy the most
How do you find the best attribute to partition the sets?
recursively apply attribute selection
Disadvantages of ID3
- tends to prefer splits that result in large numbers of small but pure partitions
- overfitting, less generalization capacity
- cannot handle numeric values or missing values
List the parts of an ANN (artificial neural network)
- neurons
- nucleus
- dendrite
- axon
- synapse
Define neurons
cells (processing elements) of a biological or artificial neural network
Define the nucleus
the central processing portion of a neuron
Define the dendrite
the part of a biological neuron that provides inputs to the cell
Define the axon
an outgoing connection (i.e., terminal) from a biological neuron
Define synapse
the connection (where the weights are) between processing elements in a neural network
Define Learning
- the establishment of interneuron connections
- classical conditioning
What is ANN?
computer technology that attempts to build computers that operate like a human brain
- machines that process simultaneous memory storage and work with ambiguous information
What is a single perceptron?
early neural network structure that uses no hidden layer
What is the input of ANN
consists of the output of the sending unit and the weight between the sending and receiving units
What are connection weights of ANN associated with?
with each link in a neural network model
What do connection weights of ANN express?
the relative strength of the input data
By what are connection weights of ANN assessed?
neural network learning algorithms
What does the Propagation (summation) function determine?
how the new input is computed
What type of combination is used in the propagation (summation) function?
linear
Formula netinput i
netinput_i = sum over j of (w_ij * x_j), i.e., the weighted (linear) sum of the neuron's inputs
What does the activation function do?
computes the internal stimulation (activity level) of the neuron
- neuron may or may not produce an output (fire)
What else is the activation function called?
- transformation function
- transfer function
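The propagation, activation, and threshold functions above can be sketched together; the weights, inputs, and threshold value are made-up numbers:

```python
import math

def net_input(weights, inputs):
    """Propagation (summation) function: netinput_i = sum_j w_ij * x_j."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(net):
    """A common activation (transformation/transfer) function."""
    return 1.0 / (1.0 + math.exp(-net))

def threshold_output(net, theta=0.0):
    """Threshold output function: the neuron fires (1) only above theta."""
    return 1 if net > theta else 0

net = net_input([0.5, -0.2, 0.1], [1.0, 2.0, 3.0])  # 0.5 - 0.4 + 0.3 = 0.4
fired = threshold_output(net)     # 1: the neuron fires
activation = sigmoid(net)         # internal activity level, about 0.6
```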
Output ANN
- sometimes a threshold function is used
- most software packages do not distinguish between activation level and output function
How is learning done in ANN?
by comparing computed (predicted) outputs to the desired (true target) outputs of historical cases
Define learning in ANN
a change of weights between units
Describe the three tasks of the process of learning in ANN
- compute temporary outputs
- compare outputs with desired targets
- adjust the weights and repeat the process
What is the Delta rule?
a special form of steepest gradient descent approach
What is the Delta rule also called?
- Widrow-Hoff rule
- Least Mean Square rule
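A minimal sketch of a delta-rule update for a linear unit; the weights, inputs, and learning rate are made-up values:

```python
def delta_rule(weights, inputs, target, output, eta=0.1):
    """Widrow-Hoff / LMS update: w_i <- w_i + eta * (target - output) * x_i."""
    return [w + eta * (target - output) * x for w, x in zip(weights, inputs)]

w = [0.2, -0.5]
x = [1.0, 1.0]
output = sum(wi * xi for wi, xi in zip(w, x))   # linear output: -0.3
w = delta_rule(w, x, target=1.0, output=output)
# error = 1.0 - (-0.3) = 1.3, so each weight moves up by eta * error * x = 0.13
```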
Linear separability:what does a single neuron represent?
a hyperplane in instance space
Linear separability: What can be represented using a perceptron?
Three operations
AND
OR
NOT
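The three operations can be demonstrated with hand-picked weights and biases (one of many possible choices):

```python
def perceptron(weights, bias):
    """Single perceptron with a step activation: fires iff w.x + bias > 0."""
    return lambda *x: 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

AND = perceptron([1, 1], -1.5)   # fires only when both inputs are 1
OR  = perceptron([1, 1], -0.5)   # fires when at least one input is 1
NOT = perceptron([-1], 0.5)      # inverts its single input

truth_and = [AND(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]  # [0, 0, 0, 1]
truth_or  = [OR(a, b)  for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]  # [0, 1, 1, 1]
```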
Linear separability: what is needed for problems that are not linearly separable?
multilayer perceptron
Into what can any expression from propositional calculus be converted?
a multilayer perceptron
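XOR, the classic non-linearly-separable case, illustrates this: a sketch with hand-picked weights (one possible construction, not unique), where the hidden layer computes OR and AND and the output fires for "OR and not AND":

```python
def unit(weights, bias):
    """A step-activation unit, as in a perceptron."""
    return lambda *x: 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

h_or  = unit([1, 1], -0.5)    # hidden unit 1: OR
h_and = unit([1, 1], -1.5)    # hidden unit 2: AND
out   = unit([1, -1], -0.5)   # output: h_or AND NOT h_and

def xor(a, b):
    return out(h_or(a, b), h_and(a, b))

table = [xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]  # [0, 1, 1, 0]
```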
Multilayer perceptrons: Topologies
the way neurons are organized in a neural network
Multilayer perceptrons: How many layers does the network structure have?
3
1. Input Layer
2. Hidden Layers
3. Output layer
Describe the Input layer of the Multilayer perceptrons
- each input corresponds to a single attribute
- several types of data can be used
- preprocessing may be needed to convert the data into meaningful inputs
Describe the hidden layers of the Multilayer perceptrons
- the middle layer of an artificial neural network
- a network may have one or more hidden layers
- each layer increases the training effort exponentially
Describe the output layer of the Multilayer perceptrons
- contains solution to a problem
- the purpose of the network is to compute the output values
Flow diagram of the development process of an ANN
- Collect Data
- Separate into training & testing set
- Define a network structure
- Select a learning algorithm
- Set parameters and values, initialize weights
- Transform data into network inputs
- Start training and determine and revise weights
- Stop and test
- Implementation: use the network with new cases
How can the relationship between the internal activation level and the output be?
- linear
- nonlinear
What are the types of learning?
- supervised
- unsupervised
- reinforced
- direct design methods (hardwired systems)
What are the times of learning?
incremental training
vs
batch training
What are the learning rules in ANN
- Delta rule
- Gradient descent
- Backpropagation
- Hebbian rule
- Competitive learning
To which type of ANN does the delta rule apply?
without hidden layers
For what are ANN with hidden layers needed?
some problems, like training an XOR classifier
Define Backpropagation
- the error (similar to the delta rule) is propagated back
- also possible: calculating the weight changes for hidden layers
List the steps of Backpropagation
- Initialize weights with random values and set other parameters
- Read the input vector and the desired output
- Compute the actual output via the calculations, working forward through the layers (forward pass)
- Compute the error
- Change the weights by working backward from the output layer through the hidden layers (backward pass)
What is the forward pass?
computing the actual output via the calculations, working forward through the layers
What is the backward pass?
changing the weights by working backward from the output layer through the hidden layers
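The forward pass and backward pass above can be sketched for a tiny network; the 2-2-1 topology, random weights, single training example, and learning rate are all assumed values for illustration, not a full training setup:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# assumed 2-2-1 network: two inputs, two hidden units, one output, no biases
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(2)]
eta = 0.5                      # learning rate (assumed value)
x, t = [1.0, 0.0], 1.0         # one training example: input vector, desired output

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x))) for j in range(2)]
    o = sigmoid(sum(w * hj for w, hj in zip(w_o, h)))
    return h, o

errors = []
for _ in range(50):
    h, o = forward(x)                                  # forward pass
    errors.append((t - o) ** 2)
    delta_o = (t - o) * o * (1 - o)                    # output error term
    delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j])    # backward pass: propagate
               for j in range(2)]                      # the error to the hidden layer
    for j in range(2):
        w_o[j] += eta * delta_o * h[j]                 # adjust output-layer weights
        for i in range(2):
            w_h[j][i] += eta * delta_h[j] * x[i]       # adjust hidden-layer weights

# errors[0] > errors[-1]: the squared error shrinks as training proceeds
```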
Define the gradient descent
find the combination of all weights w so that the sum of the squared errors F is minimized
Gradient Descent: Problem
high computational complexity
Gradient Descent: Solution
steepest gradient descent method
- the negative gradient gives the direction where to move in next iteration
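A minimal sketch of steepest descent on a sum of squared errors F; the data, initial weight, and learning rate are made-up values:

```python
# fit a single weight w in y = w * x by steepest descent on the
# sum of squared errors F (toy data, generated from y = 2 * x)
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def F(w):
    """Sum of squared errors over all examples."""
    return sum((y - w * x) ** 2 for x, y in data)

def dF(w):
    """Derivative of F; the negative gradient points downhill."""
    return sum(-2 * x * (y - w * x) for x, y in data)

w, eta = 0.0, 0.05    # initial weight and learning rate (assumed values)
for _ in range(100):
    w -= eta * dF(w)  # move in the direction of the negative gradient

# w converges toward the true weight 2.0
```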
Gradient Descent: Premise for usage
- differentiable propagation, activation, and output functions
Gradient Descent: Workaround for limitations
change:
- initial weights
- starting point of the gradient approach
- type of initialization
- learning parameters
- define different learning rates for different layers
- insert a momentum (inertia) parameter
- apply a decay parameter
How do we change learning parameters as a workaround for limitations in gradient descent?
- increase learning rate
- decrease learning rate
- vary learning rates
What is A Self-Organizing Map?
a smart map that takes complex information and organizes it neatly
How does a Self-Organizing Map organize information neatly?
by placing similar things close to each other on the map
How does a Self-Organizing Map adjust its map?
so that it can recognize and regroup similar patterns in data
another name of Self-Organizing Maps
Kohonen’s self-organizing maps (SOM)
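A minimal 1-D SOM sketch of "placing similar things close together": the map size, input values, iteration count, and learning rate are all made-up, and only the winner and its immediate map neighbours are moved toward each input:

```python
import random

random.seed(1)
# assumed 1-D Kohonen map: 5 units, each holding one weight in [0, 1]
units = [random.random() for _ in range(5)]
data = [0.05, 0.1, 0.5, 0.9, 0.95]   # inputs form low, middle, and high groups

for _ in range(200):
    x = random.choice(data)
    # the winner is the unit whose weight is closest to the input
    win = min(range(len(units)), key=lambda i: abs(units[i] - x))
    # move the winner and its map neighbours a little toward the input,
    # so neighbouring units come to represent similar inputs
    for i in range(len(units)):
        if abs(i - win) <= 1:
            units[i] += 0.1 * (x - units[i])
```

After training, the unit weights spread out to cover the input groups, with nearby units representing nearby values.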
What are Hopfield networks?
smart memory systems that can remember and recall patterns
How do Hopfield networks work?
- they connect all their “brain cells” together
- when they learn something, the connections get adjusted
What do Hopfield networks do when you give them a partial or noisy pattern?
they can fill in the blanks and remember the closest thing they learned
What are Hopfield networks used for?
- remembering faces
- solving certain types of problems
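The "fill in the blanks" behaviour can be sketched with a single stored pattern (the pattern here is made up) using the Hebbian outer-product rule for the connection weights:

```python
# store one +1/-1 pattern, then recall it from a noisy copy
stored = [1, -1, 1, -1, 1]
n = len(stored)
# the connection between cells i and j is strengthened when the
# pattern agrees there (Hebbian rule); no self-connections
W = [[stored[i] * stored[j] if i != j else 0 for j in range(n)] for i in range(n)]

def recall(state, sweeps=5):
    state = list(state)
    for _ in range(sweeps):
        for i in range(n):
            # each cell takes the sign of its weighted input from the other cells
            s = sum(W[i][j] * state[j] for j in range(n))
            state[i] = 1 if s >= 0 else -1
    return state

noisy = [1, -1, -1, -1, 1]   # stored pattern with one bit flipped
result = recall(noisy)        # recovers [1, -1, 1, -1, 1]
```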
Advantages of ANN
- able to deal with highly nonlinear relationships
- not prone to restricting normality and/or independence assumptions
- can handle variety of problem types
- often provides better results compared to its statistical counterparts
- handles both numerical and categorical variables (transformation needed)
What are the limitations of ANN
- black-box solutions lacking explainability
- hard to find optimal values for large number of network parameters
- optimal design is hard to achieve
- a large number of variables is hard to handle
- training may take a long time for large datasets
What helps with the long training required for large datasets in ANN?
case sampling