Chapter 4 Flashcards

1
Q

What’s the most common splitting criterion?

A

information gain

1
Q

What’s the role of Decision Trees?

A

Create a formula/algorithm that evaluates how well each attribute splits a set of examples into segments, with respect to a chosen target variable

2
Q

To what does disorder correspond?

A

to how mixed (impure) the segment is with respect to the values of the attribute of interest

3
Q

Formula of Entropy

A

entropy = −p1 log(p1) − p2 log(p2) − …

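As a sanity check, the entropy formula can be sketched in Python (the base-2 logarithm is assumed here, which measures entropy in bits):

```python
import math

def entropy(probabilities):
    """entropy = -p1*log2(p1) - p2*log2(p2) - ...; 0*log(0) is treated as 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1.0]))        # 0.0  (pure segment: no disorder)
print(entropy([0.5, 0.5]))   # 1.0  (maximally mixed two-class segment)
```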
4
Q

Define p_i

A

probability of value i within the set (relative percentage/share)

5
Q

When is p_i = 1?

A

when all members of the set have attribute i

6
Q

When is p_i = 0?

A

when no members of the set have attribute i

7
Q

What is the parent set?

A

the original set of examples

8
Q

What does an attribute do?

A

It segments a set of instances into k subsets.

9
Q

What are the k children sets?

A

The result of splitting on the attribute values.

10
Q

What does information gain measure?

A
  • how much an attribute improves (decreases) entropy
  • the change in entropy due to new information added
11
Q

Formula IG(parent)

A

IG(parent) = entropy(parent) − p(c1) · entropy(c1) − p(c2) · entropy(c2) − …

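A minimal Python sketch of this formula (the `entropy` helper and the example labels are illustrative, not from the chapter):

```python
import math

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def information_gain(parent, children):
    """IG(parent) = entropy(parent) - sum of p(c_i) * entropy(c_i) over children."""
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

# Splitting a mixed parent into two pure children yields the maximum gain.
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(information_gain(parent, children))  # 1.0
```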
12
Q

Formula Entropy (HS = square)

A
13
Q

Formula Entropy (HS = circle)

A
14
Q

Formula IG = entropy (Write-off)..

A
15
Q

What reduces entropy substantially?

A

splitting the parent data set by the body-shape attribute

  • select the attribute that reduces entropy the most
16
Q

How do you find the best attribute to partition the sets?

A

recursively apply attribute selection

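A sketch of one selection step, assuming a toy dataset of dictionaries (the attribute and class names are hypothetical); ID3 would then recurse on each child subset with the chosen attribute removed:

```python
import math

def entropy(labels):
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def best_attribute(rows, target, attributes):
    """Pick the attribute whose split reduces entropy the most (highest IG)."""
    parent = [r[target] for r in rows]
    def gain(attr):
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        return entropy(parent) - sum(
            len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(attributes, key=gain)

# Hypothetical toy data: "shape" separates the classes perfectly, "color" does not.
rows = [
    {"shape": "square", "color": "red",  "write_off": "yes"},
    {"shape": "square", "color": "blue", "write_off": "yes"},
    {"shape": "circle", "color": "red",  "write_off": "no"},
    {"shape": "circle", "color": "blue", "write_off": "no"},
]
print(best_attribute(rows, "write_off", ["shape", "color"]))  # shape
```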
17
Q

Disadvantages of ID3

A
  • tends to prefer splits that result in large numbers of small but pure partitions
  • prone to overfitting, with less generalization capacity
  • cannot handle numeric values or missing values
18
Q

List the components of ANN (artificial neural networks)

A
  • neurons
  • nucleus
  • dendrite
  • axon
  • synapse
19
Q

Define neurons

A

cells (processing elements) of a biological or artificial neural network

20
Q

Define the nucleus

A

the central processing portion of a neuron

21
Q

Define the dendrite

A

the part of a biological neuron that provides inputs to the cell

22
Q

Define the axon

A

an outgoing connection (i.e., terminal) from a biological neuron

23
Q

Define synapse

A

the connection (where the weights are) between processing elements in a neural network

24
Q

Define Learning

A
  • the establishment of interneuron connections
  • classical conditioning
25
Q

What is ANN?

A

computer technology that attempts to build computers that operate like the human brain
- machines that combine simultaneous memory storage and processing and can work with ambiguous information

26
Q

What is a single perceptron?

A

early neural network structure that uses no hidden layer

27
Q

What is the input of an ANN?

A

consists of the output of the sending unit and the weight between the sending and receiving units

28
Q

What are connection weights of ANN associated with?

A

with each link in a neural network model

29
Q

What do connection weights of ANN express?

A

the relative strength of the input data

30
Q

By what are the connection weights of an ANN assessed?

A

neural network learning algorithms

31
Q

What does the Propagation (summation) function determine?

A

how the new input is computed

32
Q

What type of combination is used in the propagation (summation) function?

A

linear

33
Q

Formula for netinput_i

A

netinput_i = Σ_j (w_ij · x_j), the linear weighted sum of the incoming outputs x_j and their weights w_ij
34
Q

What does the activation function do?

A

computes the internal stimulation (activity level) of the neuron
- the neuron may or may not produce an output (fire)
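A sketch of propagation plus activation, assuming a sigmoid as the transfer function (a common but not the only choice):

```python
import math

def net_input(weights, inputs):
    """Propagation (summation) function: a linear combination of inputs and weights."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(net):
    """A common activation (transfer) function: maps the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

print(sigmoid(net_input([0.5, -0.5], [1.0, 1.0])))  # 0.5 (net input is 0)
```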

35
Q

What else is the activation function called?

A
  • transformation function
  • transfer function
36
Q

What’s the range of human hearing?

A

20 Hz to 20 kHz

37
Q

Output of an ANN

A
  • sometimes a threshold function is used
  • most software packages do not distinguish between activation level and output function
38
Q

How is learning done in ANN?

A

by comparing computed (predicted) outputs to desired (true target) outputs of historical cases

39
Q

Define learning in ANN

A

a change of weights between units

40
Q

Describe the three tasks of the process of learning in ANN

A
  1. compute temporary outputs
  2. compare outputs with desired targets
  3. adjust the weights and repeat the process
41
Q

What is the Delta rule?

A

a special form of the steepest gradient descent approach
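A minimal sketch of the delta rule for a single linear unit with no hidden layer, assuming the update w_i += lr · (target − output) · x_i (the learning rate and example are illustrative):

```python
def delta_rule_step(weights, x, target, lr=0.1):
    """One delta-rule update for a linear unit:
    w_i += lr * (target - output) * x_i."""
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output
    return [w + lr * error * xi for w, xi in zip(weights, x)]

# Repeated updates shrink the error on a single (made-up) example.
w = [0.0, 0.0]
for _ in range(50):
    w = delta_rule_step(w, [1.0, 1.0], target=1.0)
output = sum(wi * xi for wi, xi in zip(w, [1.0, 1.0]))
print(round(output, 3))  # 1.0
```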

42
Q

What is the Delta rule also called?

A
  • Widrow-Hoff rule
  • Least Mean Square rule
43
Q

Linear separability: what does a single neuron represent?

A

a hyperplane in instance space

44
Q

Linear separability: What can be represented using a perceptron?

A

Three operations
AND
OR
NOT
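These three operations can be realized with hand-picked (illustrative) weights and thresholds:

```python
def perceptron(weights, bias, inputs):
    """A single threshold unit: fires (1) when the weighted sum exceeds 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Illustrative weight choices realizing the three operations:
AND = lambda a, b: perceptron([1, 1], -1.5, [a, b])
OR  = lambda a, b: perceptron([1, 1], -0.5, [a, b])
NOT = lambda a:    perceptron([-1], 0.5, [a])

print([AND(1, 1), OR(0, 1), NOT(1)])  # [1, 1, 0]
```

XOR, by contrast, has no such single-unit weight assignment, which is why a hidden layer is needed.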

45
Q

Linear separability: what is needed when classes are not linearly separable?

A

multilayer perceptron

46
Q

Into what can any expression from propositional calculus be converted?

A

a multilayer perceptron

47
Q

Multilayer perceptrons: Topologies

A

the way neurons are organized in a neural network

48
Q

Multilayer perceptrons: How many layers does the network structure have?

A

3
1. Input Layer
2. Hidden Layers
3. Output layer

49
Q

Describe the Input layer of the Multilayer perceptrons

A
  • each input corresponds to a single attribute
  • several types of data can be used
  • preprocessing may be needed to convert the data into meaningful inputs
50
Q

Describe the hidden layers of the Multilayer perceptrons

A
  • the middle layer of an artificial neural network
  • a network with hidden layers has three or more layers in total
  • each layer increases the training effort exponentially
51
Q

Describe the output layer of the Multilayer perceptrons

A
  • contains the solution to a problem
  • the purpose of the network is to compute the output values
52
Q

Flow diagram of the development process of an ANN

A
  1. Collect Data
  2. Separate into training & testing set
  3. Define a network structure
  4. Select a learning algorithm
  5. Set parameters and values, initialize weights
  6. Transform data into network inputs
  7. Start training and determine and revise weights
  8. Stop and test
  9. Implementation: use the network with new cases
53
Q

How can the relationship between the internal activation level and the output be?

A
  • linear
  • nonlinear
54
Q

What are the types of learning?

A
  • supervised
  • unsupervised
  • reinforced
  • direct design methods (hardwired systems)
55
Q

What are the times of learning?

A

incremental training
vs
batch training

56
Q

What are the learning rules in ANN

A
  • Delta rule
  • Gradient descent
  • Backpropagation
  • Hebbian rule
  • Competitive learning
57
Q

To which type of ANN does the delta rule apply?

A

without hidden layers

58
Q

For what are ANN with hidden layers needed?

A

some problems, like training an XOR classifier

59
Q

Define Backpropagation

A
  • the error (similar to the delta rule) is propagated back

also possible: the calculation of the weight changes for hidden layers

60
Q

List the steps of Backpropagation

A
  1. Initialize weights with random values and set other parameters
  2. Read the input vector and the desired output
  3. Compute the actual output via the calculations, working forward through the layers (forward pass)
  4. Compute the error
  5. Change the weights by working backward from the output layer through the hidden layers (backward pass)
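The five steps can be sketched for a tiny 2-2-1 network with sigmoid units; the initial weights, input, and learning rate below are illustrative assumptions:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w_hidden, w_out, x):
    """Forward pass: compute hidden activations, then the output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, o

def backprop_step(w_hidden, w_out, x, target, lr=0.5):
    """Backward pass: compute the output error, then propagate it back
    to adjust both output-layer and hidden-layer weights."""
    h, o = forward(w_hidden, w_out, x)
    delta_o = (target - o) * o * (1 - o)               # output error term
    new_out = [w + lr * delta_o * hi for w, hi in zip(w_out, h)]
    new_hidden = [
        [w + lr * (delta_o * w_out[j] * h[j] * (1 - h[j])) * xi
         for w, xi in zip(row, x)]
        for j, row in enumerate(w_hidden)
    ]
    return new_hidden, new_out

# Steps 1-5 repeated on one training example: the output moves toward the target.
w_h, w_o = [[0.1, 0.2], [0.3, 0.4]], [0.5, 0.6]   # step 1: initialize weights
x, target = [1.0, 0.0], 1.0                       # step 2: input and desired output
for _ in range(100):                              # steps 3-5, repeated
    w_h, w_o = backprop_step(w_h, w_o, x, target)
_, out = forward(w_h, w_o, x)                     # output is now closer to 1.0
```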
61
Q

What is the forward pass?

A

computing the actual output via the calculations, working forward through the layers

62
Q

What is the backward pass?

A

changing the weights by working backward from the output layer through the hidden layers

63
Q

Define the gradient descent

A

find the combination of all weights w so that the sum of the squared errors F is minimized
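A one-weight sketch of this idea, fitting w to minimize F (the data points are made up for illustration):

```python
def squared_error(w, data):
    """F = sum of squared errors for a one-weight model predicting w * x."""
    return sum((t - w * x) ** 2 for x, t in data)

def gradient_step(w, data, lr=0.05):
    """Steepest descent: the negative gradient of F gives the downhill direction."""
    grad = sum(-2 * x * (t - w * x) for x, t in data)
    return w - lr * grad

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up points, perfectly fit by w = 2
w = 0.0
for _ in range(100):
    w = gradient_step(w, data)
print(round(w, 3))  # 2.0
```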

64
Q

Gradient Descent: Problem

A

high computational complexity

65
Q

Gradient Descent: Solution

A

steepest gradient descent method
- the negative gradient gives the direction in which to move in the next iteration

66
Q

Gradient Descent: Premise for usage

A
  • differentiable propagation, activation, and output functions
67
Q

Gradient Descent: Workaround for limitations

A

change:
- initial weights
- starting point of the gradient approach
- type of initialization
- learning parameters

define different learning rates for different layers

insert momentum (inertia) parameter

apply decay parameter

68
Q

How do we change learning parameters as a workaround for limitations in gradient descent?

A
  • increase learning rate
  • decrease learning rate
  • vary learning rates
69
Q

What is A Self-Organizing Map?

A

a smart map that takes complex information and organizes it neatly

70
Q

How does a Self-Organizing Map organize information neatly?

A

by placing similar things close to each other on the map

71
Q

How does a Self-Organizing Map adjust its map?

A

so that it can recognize and regroup similar patterns in data

72
Q

another name of Self-Organizing Maps

A

Kohonen’s self-organizing maps
(SOM)

73
Q

What are Hopfield networks?

A

smart memory systems that can remember and recall patterns

74
Q

How do Hopfield networks work?

A
  • they connect all their “brain cells” together
  • when they learn something, the connections get adjusted
75
Q

What do Hopfield networks do when you give them a partial or noisy pattern?

A

they can fill in the blanks and remember the closest thing they learned
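A minimal sketch of this pattern completion, assuming Hebbian (outer-product) weights and one synchronous update; the stored pattern is made up for illustration:

```python
def train_hopfield(pattern):
    """Hebbian (outer-product) weights for one stored +-1 pattern, zero diagonal."""
    n = len(pattern)
    return [[pattern[i] * pattern[j] if i != j else 0 for j in range(n)]
            for i in range(n)]

def recall(weights, state):
    """One synchronous update: every unit takes the sign of its weighted input."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights]

stored = [1, -1, 1, 1, -1, -1, 1, -1]   # illustrative pattern
noisy = list(stored)
noisy[0] = -noisy[0]                    # corrupt one unit (noise)
W = train_hopfield(stored)
print(recall(W, noisy) == stored)       # True: the network fills in the blank
```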

76
Q

What are Hopfield networks used for?

A
  • remembering faces
  • solving certain types of problems
77
Q

Advantages of ANN

A
  • able to deal with highly nonlinear relationships
  • not prone to restricting normality and/or independence assumptions
  • can handle variety of problem types
  • often provides better results than its statistical counterparts
  • handles both numerical and categorical variables (transformation needed)
78
Q

What are the limitations of ANN

A
  • black-box solutions lacking explainability
  • hard to find optimal values for large number of network parameters
  • optimal design is hard to achieve
  • a large number of variables is hard to handle
  • training may take a long time for large datasets
79
Q

What can be used when ANN training takes too long on large datasets?

A

case sampling