Exam Flashcards

1
Q

Whats the difference between DFS and BFS?

A

BFS = guaranteed shortest path from start to goal
DFS = may get lost in search and depth-bound may be imposed
BFS = bad branching factor
DFS = may be long and not optimal
DFS = more efficient for big search spaces (high branching factor)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

DFS Alg

A

def dfs (in Start, out State)
open = [Start];
closed = [];
State = failure;
while (open <> []) AND (State <> success)
begin
remove the leftmost state from open, call it X;
if X is the goal, then
State = success
else begin
generate children of X;
put X on closed
eliminate the children of X on open or closed
put remaining children on left end of open
end else
endwhile
return State;
enddef

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

BFS alg

A

def bfs (in Start, out State)
open = [Start];
closed = [];
State = failure;
while (open <> []) AND (State <> success)
begin
remove the leftmost state from open, call it X;
if X is the goal, then
State = success
else begin
generate children of X;
put X on closed
eliminate the children of X on open or closed
put remaining children on right end of open
end else
endwhile
return State;
enddef

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does Best FS work?

A

Min cost nodes are expanded 1st.
shortest path not guaranteed.
Try to minimize the cost of finding a solution.
Combination of DFS and BFS with heuristics

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Whats the difference between Hill climbing and BFS?

A

BFS = global states, Hill climbing = local states

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does the HC alg generate?

A

Partial tree/graph

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the problem with Greedy HC?

A

can get stuck in local optima

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the 4 steps of local search?

A

Generate initial solution
Local search
Perturbation
Acceptance cirteria

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the 2 ways to get an initial solution in ILS?

A

random solution or one returned by greedy construction heuristc
(LS is already available for most solutions)
Perturbation: random move in a neighborhood of higher irder than current can be very effective

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

ILS Alg:

A

ILS(){
generate initial solution
perform local search
while(stopping cond not met){
perturb;
local search
check acceptance cirterion
}
}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

ILS Alg 2 (Chapter 2A)

A

generate init solution
apply LS Alg
Obtain local optimum (local search)
perturb local optimum to obtain new solution
apply LS Alg to new solution
If new solution is better than current solution update current

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What happens to the temperature in SA?

A

Temp T starts high and is lowered with each iteration.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What happens with a new candidate solution at each iteration?

A

the new candidate solution’s distance from the ideal is proportional to the temperature

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What does the temperature influence?

A

The acceptance of lower values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

SA Alg:

A

Best = Current = {}
Set initial T in the same magnitude as the typical differences between adjacent losses (costs)
t = 1;
while (stopping condition not met){
set NEXT to adjacent state referred to as Current;
cost = cost(NEXT) - cost(Current)
Set Current = NEXT with prob 1/exp(-cost/T)
if cost(Current) < cost(BEST) then Best = Current;
t = t+1
T = T0/log(t+1)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What methods are needed for SA?

A

1 Method to generate initial solution
2 Generation function to find neighbours in order to select a NEXT candidate
3 Cost function
4 Eval criterion
5 Stop criterion

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

SA steps:

A

1 Generate current solution
2 Eval solution
3 Generate 3 solutions from neighbourhoods
4 let next be best of neighbours
5 if(f(Vc) <f(Vn)) Vc = Vn
5.1 if If f(Vc) >f(Vn) we will evaluate the Metropolis function given by
= e((Vc-Vn)/Temperature)< random(0,1)
6 Update T

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

SA adv:

A

Can be applied to large number of problems.
Tuning parameters are easy
Can take time but solutions are usually good
Can find the best solution

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Disadv of SA

A

Can take time. Can leave optimal solution and not find it. (Keep track of the best solution)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What principle does Evolutionary Algs follow?

A

survival of the fittest

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Generic Evolutionary Alg:

A

Init population with random individuals
Eval each indiv
while(termination condition not met){
Select parents
Genetic manipulation
Evaluate new individuals
Select individuals for next generation}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the key stages in Generic EA?

A

Initial pop generation
Evaluation of every individual in the population
Selection of parents
Application of genetic operators
Evaluation of every individual in the population
Population update (generational/steady state)
Check for the stopping cirteria

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What does a fitness function do?

A

Evaluates the suitability of an individual

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What do Genetic Operators do?

A

Creates offspring of each gen

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

GA Alg:

A

Create initial pop
Calc fitness of all individuals
while (termination cond not met){
Select fitter individuals for reproduction
Recombine individuals
Mutate individuals
Eval fitness of all individuals
Generate a new population}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How does Tournament selection work?

A

Select random individuals according to the tournament size.
Select fittest to be parent 1 and the same procedure for parent 2

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What happens if Tournament selection size is equal to the size of the population vs 1

A

1 = completely random
Size = Only the fittest indiv will become the parents

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What are the different crossovers?

A

One-point, Two-point, Uniform

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Whats the difference between the types of crossovers?

A

1 point: select 1 point to crossover at. 1st part upto that point is of 1 parent and after that point is of other parent.
2 point: Same but the section between the points are swapped between parents

Uniform: Each bit is considered individually, and a random process decides which parent will be chosen

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

How does one decide if a bit needs to be mutated?

A

Use a probability alg.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What are the parameters for a GA?

A

Population size
Selection type
Cross over type + rate
Mutation type + rate
Stopping criteria

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

What are the advantages of a GA?

A

Easy to understand
modular, separate from application
Support multi-objective optimization
Easy to exploit prev/alt solutions
Flexible building blocks for hybrid applications

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Study the GA example in TSP- Genetic Algorithm file

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Where does GP search for a solution?

A

In program space?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

GP Alg:

A

Create initial population of programs
Execute each program and establish the fitness.
while(termination condition not met){
Select fitter programs to participate in reproduciton
Create new programs using genetic operators and update the population
Execute each new program and establish the fitness
}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

How are genetic programs represented?

A

As Syntax trees

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

What are the tree generation methods in GPs?

A

Full, Grow and Ramped-half and half

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

What are the parameters for GP?

A

Initial tree depth
Max tree depth
Pop size

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

What are some rules for GP.

A

The function set is problem dependent(+-*/)
Terminal set is constants (a,b,c,d)
root and middle nodes obtain values from the function set.
Leaf nodes obtain values from terminal set
Fitness function is problem dependant

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

What are the Selection methods for GP?

A

Tournament selection
Fitness proportionate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

What are the Genetic Operators for GP?

A

Subtree crossover
Grow mutation
Reproduction (move parents into next population)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

What are the 2 types of population update in Gp?

A

Steady state: Population is updated in such a manner that one offspring replaces a member of the current population based on fitness
Generational:

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

What are the 2 termination conditions of GPs?

A

Objective met
Number of generations achieved

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
44
Q

What are some applications of GPs?

A

Symbolic regression
Robotics
Cyber-security
Finance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
45
Q

Basic approach of GP:

A

Create init pop randomly
Each program is constructed from building blocks needed to solve the problem GP is being applied to.
Evaluate the fitness of each program
Select good programs to act as parents
generate new programs

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
46
Q

GP algorithm:

A

Create initial pop
Establish fitness
while (termination condition not met){
select fitter programs to reproduce
Create new programs and update pop
establish fitness}
return best

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
47
Q

By what is the number of programs specified in GP?

A

The user through a population size parameter

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
48
Q

How does the full method work in GP?

A

it constructs trees in such a manner that all nodes up to a depth of (max depth -1) are functions and max depth = terminals

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
49
Q

How does the grow method work in GP?

A

Create trees of variable length, Nodes between root and max depth -1 may be random between terminal/function. max depth = terminals

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
50
Q

How does the ramped 1/2-1/2 method work in GP?

A

It combines the full and grow methods

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
51
Q

GE alg:

A

Create initial population of variable length binary strings
Map via BNF grammar
eval fitness
while(termination cond not met){
Select fitter indiv for reproduction
Recombine selected individuals
mutate offspring
eval fitness
replace all individuals in the population with offspring}

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
52
Q

How is the initial population generated in GE?

A

Randomly generate a population of variable length binary strings (individuals)
Length is determined randomly from a lower/upper bound
Population size and variable length limits are user specified

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
53
Q

how does mapping work in GE?

A

Production rules needs to be specified
Must contain domain knowledge
Mapping involves converting binary string to decimal and using them to select production rules
Genotype to phenotype

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

How does the mapping equation look in GE?

A

Rule = (codon decimal value)%(No of production rules)

55
Q

What is wrapping in GE?

A

When you reach the end of a sequence of codons before the derivation tree is evolved the procedure continues by looping to the start of the codon sequence

56
Q

How is the fitness of a phenotype evaluated in GE?

A

By applying it to a problem

57
Q

What are the 2 selection type in GE?

A

Tournament or Fitness proportionate

58
Q

What are the different types of Genetic operators?

A

Crossover, mutation, reproduction and elitism

59
Q

What is most common crossover in GE?

A

Single point

60
Q

What does a regression problem do?

A

Seeks to predict a numeric output for a given input

61
Q

What is the only info the target functions has in Symbolic regression?

A

The fitness cases

62
Q

How do GE and GP differ?

A

GP = Syntax trees, GE = chromosomes

63
Q

Where is GE applied?

A

Circuit design,
Image processing
Game AI
Language Processing

64
Q

What is the formula for the probability that an ant will choose a certain path?

A

P = (pheromone * 1/distance)/(sum of all (pheromone * 1/distance)

65
Q

What is the formula for the next pheromone?

A

r = (1-p)current + psum of pheromones

66
Q

What are the parameters of ACO?

A

Number of ants
Pheromone evaporation rate
Pheromone intensity
Heuristic info
Ant decision rule
local search strategy
Termination criteria

67
Q

What are some application of ACO?

A

Routing and transportation
Telecom
Manufacturing and production
Bioinfo
Financial planning
Energy systems

68
Q

PSO alg:

A

Initialize all xi,vi and pbesti values
while (termination condition not met){
for (i = 1, i < N;i++_{
calculate F(xi)
if(F(xi) < F (pbesti)){
pbesti = xi}
if(F(xi) < F (gbesti)){
gbesti = xi}
update all vi and xi values}

69
Q

Velocity update equation of PSO

A

vi(t+1) = wvi(t) + l1r1[pbesti(t)-xi(t)] + l2*r2[pbesti(t)-xi(t)]
l1 + l2 = learning factors
r1 and r2 = random between 0 1

70
Q

Position update equation for PSO

A

xi(t+1) = xi(t) + vi(t+1)

71
Q

What are the parameters for PSO

A

Swarm size
Max velocity
inertia weight
Acceleration coefficients
Neighborhood topology
LS strategy
Termination criteria

72
Q

Applications of PSO

A

Engineering
Finance
Computer science
Machine learning
Renewable energy
Robotics
Health and medicine

73
Q

What does the ID3 Algorithm do?

A

Divides the attributes/features of the dataset into groups of 2. It uses entropy and gain to generate a decision tree.

74
Q

ID3 Algorithm

A

1 Start with entire data set.
2 Choose best attribute (information gain measures how much a specific feature reduces uncertainty)
3 Split the data(based on the chosen attribute’s values, the data is divided into subsets
4 Recursively build the tree.
For each sub set created by 3 repeat 2 and 3 until stopping criteria is met.
(example stopping criteria = max depth/all data points in subset belong to the same class/cant split anymore)
5 handle leaves (if subset isnt perfectly classified but no more features to split on

75
Q

Gain equation:

A

Entropy - SUM((number of instances containing j)/(total number of instances) * (entropy of that entry)

which ever entry gives the highest gain is the root
study ID3.pdf equations if still confused

76
Q

Entropy equation:

A

SUM(-p(i)(log2)p(i))
pi = probability of class i being an instance

77
Q

How is C4.5 different from ID3?

A

It handles continuous features(it finds a threshold that splits the data based on the target variable)
Addresses overfitting by prunning
Use Gain ratio instead of Info Gain

78
Q

C4.5 alg:

A

Start with entire dataset
Choose best Attr (Gain ratio)
Split the data
Recursively build the tree (2 and 3)
Handle leaves by assigning most frequent class (if perfect classification isnt achieved)

79
Q

Gain ratio equation

A

Gain(A)/Split Info(A)

Gain = info gain
Split info = -SUM(pi*log2(pi))

80
Q

How does the K-Means Algorithm look like?

A

Determine number of clusters
Randomly select centroids for each of the clusters
while (not converged){
for(1->number of instances){
for(1->number of klusters){
calc euclidean distance}
add instance to cluster with smallest distance
for(1->number of klusters){
for (1->m){
calc average distance of ith position (x or y)
}
update centroid to the averaged values for each dimension}
}

81
Q

How does the k-mediods alg look like?

A

Initialize (Randomly select initial medoids)
Iteration (Assign data points to closest medoids based on dissimilarity)
medoid reassignment (Check if swapping a data point with the current medoid improves clusters total distance)
Stop if no medoid swaps occur

82
Q

Whats the difference between k-medoids and k-means?

A

K-means = avg of point within a cluster as the centroid
K-medoids = actual data point from cluster as the medoid (more robust to outliers)
K-means = data point must have num values for distance calc
K-medoids = Can work with other dissimilarity measures (more flexible)

83
Q

What is KNN?

A

K-nearest neighbour

84
Q

How does KNN work?

A

Simply stores the labeled training examples during training phase.

85
Q

How does KNN get a prediction?

A

It finds the nearest neighbors of a query point and compute the class label based on the most similar point.

86
Q

How does Ensemble learning work

A

Combines multiple models to achieve better results than any single model alone.

it combines the prediction from the models and chooses based on certain criteria (Max voting, Average, weighted averaging…)

87
Q

How is the hopfield NN trained?

A

By determining a weight matrix

88
Q

What does the weighted matrix represent in HNN?

A

The weights between neurons

89
Q

Is the HNN weighted matrix symmetrical (weights are the same in both directions)?

A

Yes

90
Q

How is the HNN trained?

A

The weight matrix is calculated using matrix multiplication and addition

91
Q

What are the weights on the diagonal of a HNN weights matrix?

A

0

92
Q

How is each weight in HNN calculated if the training data consists of binary values

A

Wij = SUM([2pi -1][2pj -1])
i = row, j = column

93
Q

How is each weight in HNN calculated if the training data consists of bipolar values?

A

Wij = pi*pj
(takes the output product of the input and output vectors)

94
Q

How do we correct a input in HNN?

A

Randomly take a position in the input vector and take the same column in the weighted matrix. multiply the 2 out. if >0 = 0. if <0 = 1 else it stays the same

95
Q

How does the backprop alg look like?

A

Set weights + biases to 0(or small random value)
while(stopping cond not met){
perform feedforward learning
Back prop of error
Update weights}

96
Q

How does feed forward learning work?

A

Calc n1 for each node in the hidden layer
Calc activation function for each node in the hidden layer
calculate n2 for each node in the output layer
Calc activation function for each node in the output layer

97
Q

How does Backpropagation of Error work?

A

Calc error info term for each node in the output layer
Calc weight correction term output layer
Calc bias correction term
Calc sum of delta input for each node in the hidden layer
Calc weight error term for each node in the hidden layer
Calc bias for error term for each node in the hidden layer

98
Q

What is a common activation function used in back prop and whats its derivative for binary?

A

f(n) = 1/(1+exp(-n))
f(n)’ = f(n)(1-f(n))

99
Q

What is the error info term equation?

A

deltak = (tk-f(n2k))*f’(n2k)
k = current node in output layer
2 = output layer
t = target in training instance

100
Q

What is the weight correction term formula?

A

Wik = a* d * f(n1i)

a = learning rate
d = error info term

101
Q

What is the bias correction term equation?

A

w0k = a*d
a = learning rate
d = error info term

102
Q

What is the sum of delta inputs equation?

A

deltani = SUM(d*wik)
d = eror info

103
Q

What does CNN stand for?

A

Convolution Neural Network

104
Q

How are images represented in CNN?

A

Grayscale = single 2D matrix 0-255
Color = 3 stacked 2D matrices 0-255 (each layer = RGB respectively)

105
Q

On what does the size of the color matrix depend on in CNN?

A

The resolution

106
Q

How can we reduce the size of the color matrix in CNN?

A

Pre process the data

107
Q

What are the different types of layers in CNN?

A

Convolutional layer
Nonlinear/ReLU layer
Pooling layer
Fully connected layer

108
Q

Whats the difference between fully connected and sparse connected?

A

Fully = every node is connected to every node
Sparse = Some connected to some

109
Q

What does the CNN architecture look like?

A

Input-> Convolutional->RELU->POOL->flatten(FC)->output(softmax)

109
Q

How connected is the convolution layer?

A

Sparsely connected

110
Q

Why is the convolution layer sparsely connected?

A

Efficiency, locality(neighbors are more similar than the ones far away)

111
Q

What is the core functionality of sparse connectivity?

A

Parameter sharing

112
Q

What are the different convolution operators?

A

Identity and Edge detection

113
Q

Do different convolutions use the same kernels/filters?

A

No

114
Q

How are values for kernels/filters selected?

A

It starts with an initial value and through backpropagation they are changed

115
Q

What is stride in CNN?

A

Number of pixels by which a filter/kernel moves across the input data (during convolution)

116
Q

Whats the difference between a larger stride and a smaller stride?

A

Larger = faster but less detail
Smaller = slower but more detail

Stride > 1 = smaller output feature map

117
Q

What does padding affect in CNN?

A

The output size of the feature maps after a convolution operation (0 padding = adding 0s around the border of the data)

118
Q

Why do we need padding in CNN?

A

To preserve spatial info (without the size of feature map usually shrinks after each convolution due to the filter “sliding off the edge”)

119
Q

How do we determine the feature map in CNN?

A

Decide on kernel size
Determine kernel vals
apply conv to produce the feature map
map kernel accress pixel matrix to produce the feature map

120
Q

What does the nonlinear RELU layer do in CNN?

A

Incorporates nonlinearity
Uses the RELU activation function
f(x) = max(0,x)

121
Q

What does the pooling layer do in CNN?

A

reduces the number of parameters (reduce the size of the feature map

122
Q

What is in the fully connected layer of CNN?

A

the output layer
Last/last 2 layers
uses the softmax activation function
Transfers learning

123
Q

What are the advantages of multitask learning?

A

Improved generalization
Data efficiency
Regularization

124
Q

What are some concepts that underpin deep learning?

A

ANN
Activation funcitons
Gradient descent
Backpropagation
CNN
RNN
Long Short-Term memory
Autoencoders
Generative adversarial Networks (GAN)

125
Q

What is ANN and what is it inspired by?

A

Processing of info and learning through math functions.
inspired by brain

126
Q

What are activation functions and what do they do?

A

Functions that determine how a neuron transforms the received input into an output signal

127
Q

What is Gradient descent?

A

Optimization alg that help the network learn by iteratively adjusting the weights

128
Q

What is CNN and what is it used for?

A

ANN that is used for image recognition and analysis

129
Q

What are RNNs?

A

Recurrent neural network designed to handle sequential data like text or speech

130
Q

What is LSTM and what is it used for?

A

RNN used to learn long term dependencies in sequential data. Used for t=machine translation and speech recognition

131
Q

What do autoencoders do?

A

reconstruct input data at the output layer

They learn compressed representation of the data

132
Q

What is GANs?

A

Uses 2 neural networks(generator and a discriminator)
Generator learn to create new data samples that resemble the training data. Discriminator tries to distinguish real data from generated data.