Exam Flashcards

Question 1

Q

What's the difference between DFS and BFS?

Answer

A

BFS = guaranteed to find the shortest path (fewest steps) from start to goal
DFS = may get lost down a deep path, so a depth bound may be imposed
BFS = memory use grows quickly when the branching factor is high
DFS = the path found may be long and not optimal
DFS = more memory-efficient for big search spaces (high branching factor)

Question 2

Q

DFS Alg

Answer

A

def dfs (in Start, out State)
open = [Start];
closed = [];
State = failure;
while (open <> []) AND (State <> success)
begin
remove the leftmost state from open, call it X;
if X is the goal, then
State = success
else begin
generate children of X;
put X on closed
discard any children of X that are already on open or closed
put remaining children on left end of open
end else
endwhile
return State;
enddef

Question 3

Q

BFS alg

Answer

A

def bfs (in Start, out State)
open = [Start];
closed = [];
State = failure;
while (open <> []) AND (State <> success)
begin
remove the leftmost state from open, call it X;
if X is the goal, then
State = success
else begin
generate children of X;
put X on closed
discard any children of X that are already on open or closed
put remaining children on right end of open
end else
endwhile
return State;
enddef
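
The DFS and BFS procedures above differ only in which end of open receives the new children. A minimal Python sketch of that idea, assuming a hypothetical `children` callback and a small made-up graph (neither is from the cards):

```python
from collections import deque

def graph_search(start, goal, children, depth_first=False):
    """Return (found, visited) where visited is the order states were removed from open.

    children(state) is an assumed helper returning the successors of a state.
    """
    open_list = deque([start])   # states waiting to be examined
    closed = set()               # states already examined
    visited = []
    while open_list:
        x = open_list.popleft()  # always remove the leftmost state
        visited.append(x)
        if x == goal:
            return True, visited
        closed.add(x)
        # Discard children already on open or closed.
        new = [c for c in children(x) if c not in closed and c not in open_list]
        if depth_first:
            open_list.extendleft(reversed(new))  # left end of open: DFS
        else:
            open_list.extend(new)                # right end of open: BFS
    return False, visited

# Hypothetical graph used only for illustration.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
bfs_result = graph_search("A", "D", lambda s: GRAPH[s])
dfs_result = graph_search("A", "D", lambda s: GRAPH[s], depth_first=True)
```

On this graph BFS examines A, B, C, D while DFS examines A, B, D, which shows DFS diving down one branch first.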

Question 4

Q

How does Best FS work?

Answer

A

Min cost nodes are expanded 1st.
shortest path not guaranteed.
Try to minimize the cost of finding a solution.
Combination of DFS and BFS with heuristics

Question 5

Q

What's the difference between Hill climbing and BFS?

Answer

A

BFS = global states, Hill climbing = local states

Question 6

Q

What does the HC alg generate?

Answer

A

Partial tree/graph

Question 7

Q

What is the problem with Greedy HC?

Answer

A

can get stuck in local optima

Question 8

Q

What are the 4 steps of iterated local search (ILS)?

Answer

A

Generate initial solution
Local search
Perturbation
Acceptance criteria

Question 9

Q

What are the 2 ways to get an initial solution in ILS?

Answer

A

random solution or one returned by a greedy construction heuristic
(LS is already available for most solutions)
Perturbation: a random move in a neighborhood of higher order than the current one can be very effective

Question 10

Q

ILS Alg:

Answer

A

ILS(){
generate initial solution
perform local search
while(stopping cond not met){
perturb;
local search
check acceptance criterion
}
}
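
A minimal Python sketch of the ILS loop above, under stated assumptions: solutions are integers, the neighbourhood is x-1 and x+1, the perturbation is a random jump of up to `kick`, and `bumpy_cost` is a made-up cost function with many local minima. None of these choices come from the cards.

```python
import random

def bumpy_cost(x):
    # Hypothetical cost: every multiple of 4 is a local minimum, x = 20 is the global one.
    return abs(x - 20) + (0 if x % 4 == 0 else 5)

def local_search(x, cost, step=1):
    """Greedy descent: move to the better neighbour until no neighbour improves."""
    while True:
        best = min((x - step, x + step), key=cost)
        if cost(best) >= cost(x):
            return x
        x = best

def iterated_local_search(cost, start, iterations=200, kick=10, seed=0):
    rng = random.Random(seed)
    current = local_search(start, cost)                  # initial solution + local search
    for _ in range(iterations):                          # stopping condition
        candidate = current + rng.randint(-kick, kick)   # perturbation
        candidate = local_search(candidate, cost)        # local search
        if cost(candidate) < cost(current):              # acceptance criterion
            current = candidate
    return current

stuck = local_search(0, bumpy_cost)            # plain local search stays at the start
result = iterated_local_search(bumpy_cost, 0)  # ILS can escape via perturbation
```

The acceptance criterion here only accepts strict improvements; other criteria (for example, accepting equal or slightly worse solutions) are also possible.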

Question 11

Q

ILS Alg 2 (Chapter 2A)

Answer

A

generate init solution
apply LS Alg
Obtain local optimum (local search)
perturb local optimum to obtain new solution
apply LS Alg to new solution
If new solution is better than current solution update current

Question 12

Q

What happens to the temperature in SA?

Answer

A

Temp T starts high and is lowered with each iteration.

Question 13

Q

What happens with a new candidate solution at each iteration?

Answer

A

the new candidate solution's distance from the current solution is proportional to the temperature

Question 14

Q

What does the temperature influence?

Answer

A

The acceptance of worse (lower-quality) solutions: the higher the temperature, the more likely they are accepted.

Question 15

Q

SA Alg:

Answer

A

Best = Current = {}
Set initial T in the same magnitude as the typical differences between adjacent losses (costs)
t = 1;
while (stopping condition not met){
set NEXT to a state adjacent to Current;
delta = cost(NEXT) - cost(Current)
if delta <= 0 set Current = NEXT, otherwise set Current = NEXT with prob exp(-delta/T)
if cost(Current) < cost(Best) then Best = Current;
t = t+1
T = T0/log(t+1) (T0 = initial T)
}
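
A minimal Python sketch of the SA loop above for a minimisation problem. The cost function, the `step` neighbour function and the default parameter values are assumptions made for illustration; a worse move is accepted with probability exp(-delta/T), the usual Metropolis rule.

```python
import math
import random

def simulated_annealing(cost, neighbour, start, t0=10.0, iterations=2000, seed=0):
    rng = random.Random(seed)
    current = best = start
    for t in range(1, iterations + 1):
        temperature = t0 / math.log(t + 1)      # T = T0 / log(t + 1)
        nxt = neighbour(current, rng)           # adjacent state
        delta = cost(nxt) - cost(current)
        # Improvements are always accepted; worse moves with prob exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = nxt
        if cost(current) < cost(best):          # keep track of the best seen so far
            best = current
    return best

def step(x, rng):
    # Hypothetical neighbour function: move one unit left or right.
    return x + rng.choice((-1, 1))

def quadratic(x):
    # Hypothetical cost with its minimum at x = 7.
    return (x - 7) ** 2

sa_best = simulated_annealing(quadratic, step, start=0)
```

Because `best` is stored separately from `current`, the search can wander away from a good solution without losing it.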

Question 16

Q

What methods are needed for SA?

Answer

A

1 Method to generate initial solution
2 Generation function to find neighbours in order to select a NEXT candidate
3 Cost function
4 Eval criterion
5 Stop criterion

Question 17

Q

SA steps:

Answer

A

1 Generate current solution
2 Eval solution
3 Generate 3 solutions from neighbourhoods
4 let next be best of neighbours
5 if (f(Vc) < f(Vn)) Vc = Vn
5.1 if f(Vc) > f(Vn), evaluate the Metropolis function and accept Vn (Vc = Vn) only if
random(0,1) < e^((f(Vn) - f(Vc))/Temperature)
6 Update T

Question 18

Q

SA adv:

Answer

A

Can be applied to large number of problems.
Tuning parameters are easy
Can take time but solutions are usually good
Can find the best solution

Question 19

Q

Disadv of SA

Answer

A

Can take time. Can leave optimal solution and not find it. (Keep track of the best solution)

Question 20

Q

What principle do Evolutionary Algs follow?

Answer

A

survival of the fittest

Question 21

Q

Generic Evolutionary Alg:

Answer

A

Init population with random individuals
Eval each indiv
while(termination condition not met){
Select parents
Genetic manipulation
Evaluate new individuals
Select individuals for next generation}

Question 22

Q

What are the key stages in Generic EA?

Answer

A

Initial pop generation
Evaluation of every individual in the population
Selection of parents
Application of genetic operators
Evaluation of every individual in the population
Population update (generational/steady state)
Check for the stopping criteria

Question 23

Q

What does a fitness function do?

Answer

A

Evaluates the suitability of an individual

Question 24

Q

What do Genetic Operators do?

Answer

A

Create the offspring of each generation

Question 25

Q

GA Alg:

Answer

A

Create initial pop
Calc fitness of all individuals
while (termination cond not met){
Select fitter individuals for reproduction
Recombine individuals
Mutate individuals
Eval fitness of all individuals
Generate a new population}

Question 26

Q

How does Tournament selection work?

Answer

A

Select random individuals according to the tournament size.
Select fittest to be parent 1 and the same procedure for parent 2

Question 27

Q

What happens if the Tournament selection size is equal to the size of the population vs 1?

Answer

A

1 = completely random
Size = Only the fittest indiv will become the parents
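
A short Python sketch of tournament selection, showing the two extremes above. The population, the fitness function and sampling without replacement are assumptions for illustration.

```python
import random

def tournament_select(population, fitness, size, rng):
    """Pick `size` individuals at random and return the fittest of them."""
    contestants = rng.sample(population, size)   # sampled without replacement (assumption)
    return max(contestants, key=fitness)

def value_fitness(individual):
    # Hypothetical fitness: the individual's own value, higher is fitter.
    return individual

rng = random.Random(1)
population = [3, 9, 1, 7, 5]
parent_full = tournament_select(population, value_fitness, len(population), rng)
parent_random = tournament_select(population, value_fitness, 1, rng)
```

With size equal to the population size the fittest individual (9 here) always wins; with size 1 the parent is a purely random pick.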

Question 28

Q

What are the different crossovers?

Answer

A

One-point, Two-point, Uniform

Question 29

Q

What's the difference between the types of crossovers?

Answer

A

1 point: select 1 point to cross over at. The part up to that point comes from one parent and the part after it comes from the other parent.
2 point: same, but the section between the two points is swapped between the parents

Uniform: Each bit is considered individually, and a random process decides which parent will be chosen
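
The three crossover types can be sketched in Python as follows, assuming parents are equal-length lists of bits and that each function returns two children. The function names and the `swap_prob` parameter are illustrative choices, not from the cards.

```python
import random

def one_point(p1, p2, point):
    # Everything after `point` is exchanged between the parents.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, a, b):
    # Only the section between positions a and b is exchanged.
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2, rng, swap_prob=0.5):
    # Each bit is considered individually; a random draw decides the source parent.
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < swap_prob:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return c1, c2

zeros, ones = [0] * 6, [1] * 6
one_pt = one_point(zeros, ones, 2)
two_pt = two_point(zeros, ones, 2, 4)
uni = uniform(zeros, ones, random.Random(0))
```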

Question 30

Q

How does one decide if a bit needs to be mutated?

Answer

A

Use a probability: generate a random number for each bit and mutate the bit if the number is below the mutation rate.
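
A small Python sketch of per-bit mutation, assuming a bit-string individual and a hypothetical `rate` parameter for the mutation probability:

```python
import random

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

rng = random.Random(0)
unchanged = mutate([0, 1, 0], 0.0, rng)   # rate 0: no bit is ever flipped
flipped = mutate([0, 1, 0], 1.0, rng)     # rate 1: every bit is flipped
```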

Question 31

Q

What are the parameters for a GA?

Answer

A

Population size
Selection type
Cross over type + rate
Mutation type + rate
Stopping criteria

Question 32

Q

What are the advantages of a GA?

Answer

A

Easy to understand
modular, separate from application
Support multi-objective optimization
Easy to exploit prev/alt solutions
Flexible building blocks for hybrid applications

Question 33

Q

Study the GA example in TSP- Genetic Algorithm file

Question 34

Q

Where does GP search for a solution?

Answer

A

In program space (the space of possible programs), rather than directly in the solution space

Question 35

Q

GP Alg:

Answer

A

Create initial population of programs
Execute each program and establish the fitness.
while(termination condition not met){
Select fitter programs to participate in reproduction
Create new programs using genetic operators and update the population
Execute each new program and establish the fitness
}

Question 36

Q

How are genetic programs represented?

Answer

A

As Syntax trees

Question 37

Q

What are the tree generation methods in GPs?

Answer

A

Full, Grow and Ramped-half and half

Question 38

Q

What are the parameters for GP?

Answer

A

Initial tree depth
Max tree depth
Pop size

Question 39

Q

What are some rules for GP.

Answer

A

The function set is problem dependent(+-*/)
Terminal set = variables and constants (a,b,c,d)
root and middle nodes obtain values from the function set.
Leaf nodes obtain values from terminal set
Fitness function is problem dependent

Question 40

Q

What are the Selection methods for GP?

Answer

A

Tournament selection
Fitness proportionate

Question 41

Q

What are the Genetic Operators for GP?

Answer

A

Subtree crossover
Grow mutation
Reproduction (move parents into next population)

Question 42

Q

What are the 2 types of population update in GP?

Answer

A

Steady state: Population is updated in such a manner that one offspring replaces a member of the current population based on fitness
Generational: The whole population is replaced by the offspring at each generation

Question 43

Q

What are the 2 termination conditions of GPs?

Answer

A

Objective met
Number of generations achieved

Question 44

Q

What are some applications of GPs?

Answer

A

Symbolic regression
Robotics
Cyber-security
Finance

Question 45

Q

Basic approach of GP:

Answer

A

Create init pop randomly
Each program is constructed from building blocks needed to solve the problem GP is being applied to.
Evaluate the fitness of each program
Select good programs to act as parents
generate new programs

Question 46

Q

GP algorithm:

Answer

A

Create initial pop
Establish fitness
while (termination condition not met){
select fitter programs to reproduce
Create new programs and update pop
establish fitness}
return best

Question 47

Q

By what is the number of programs specified in GP?

Answer

A

The user through a population size parameter

Question 48

Q

How does the full method work in GP?

Answer

A

it constructs trees in such a manner that all nodes up to a depth of (max depth -1) are functions and max depth = terminals

Question 49

Q

How does the grow method work in GP?

Answer

A

Create trees of variable length, Nodes between root and max depth -1 may be random between terminal/function. max depth = terminals
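
A Python sketch contrasting the full and grow methods. The function set, terminal set, tuple representation and the 50% terminal probability in grow are all assumptions for illustration; the root is forced to be a function here, which is one common convention.

```python
import random

FUNCTIONS = ["+", "-", "*"]   # hypothetical binary function set
TERMINALS = ["a", "b", "c"]   # hypothetical terminal set

def build(depth, max_depth, method, rng):
    """Return a tree as a terminal string or a (function, left, right) tuple."""
    at_limit = depth == max_depth
    # full: terminals only at max depth; grow: terminals may appear earlier.
    if at_limit or (method == "grow" and depth > 0 and rng.random() < 0.5):
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            build(depth + 1, max_depth, method, rng),
            build(depth + 1, max_depth, method, rng))

def depth_of(tree):
    if isinstance(tree, str):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))

rng = random.Random(0)
full_tree = build(0, 3, "full", rng)
grow_tree = build(0, 3, "grow", rng)
```

A full tree always reaches the maximum depth on every branch, while a grow tree can be shallower and irregular.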

Question 50

Q

How does the ramped 1/2-1/2 method work in GP?

Answer

A

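It combines the full and grow methods: half of the initial population is built with full and half with grow, over a range of depth limits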

Question 51

Q

GE alg:

Answer

A

Create initial population of variable length binary strings
Map via BNF grammar
eval fitness
while(termination cond not met){
Select fitter indiv for reproduction
Recombine selected individuals
mutate offspring
eval fitness
replace all individuals in the population with offspring}

Question 52

Q

How is the initial population generated in GE?

Answer

A

Randomly generate a population of variable length binary strings (individuals)
Length is determined randomly from a lower/upper bound
Population size and variable length limits are user specified

Question 53

Q

how does mapping work in GE?

Answer

A

Production rules need to be specified
Must contain domain knowledge
Mapping involves converting binary string to decimal and using them to select production rules
Genotype to phenotype

Question 54

Q

How does the mapping equation look in GE?

Answer

A

Rule = (codon decimal value)%(No of production rules)

Question 55

Q

What is wrapping in GE?

Answer

A

When you reach the end of the sequence of codons before the derivation tree is complete, the procedure continues by looping back to the start of the codon sequence
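
A Python sketch of the mapping rule (codon value % number of production rules) with wrapping. The grammar, the `max_wraps` limit and the use of already-decoded integer codons (instead of binary strings) are assumptions made to keep the example short.

```python
GRAMMAR = {   # hypothetical BNF grammar
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}

def map_genotype(codons, start="<expr>", max_wraps=2):
    """Expand the leftmost non-terminal with rule = codon % number_of_rules."""
    symbols, output, used = [start], [], 0
    limit = len(codons) * (max_wraps + 1)    # give up after max_wraps wraps
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:               # terminal symbol: emit it
            output.append(sym)
            continue
        if used >= limit:
            return None                      # mapping failed (invalid individual)
        rules = GRAMMAR[sym]
        codon = codons[used % len(codons)]   # wrapping: loop back to the start
        symbols = rules[codon % len(rules)] + symbols
        used += 1
    return " ".join(output)

phenotype = map_genotype([4, 3, 7, 2, 5])
failed = map_genotype([0])
```

For the codons 4, 3, 7, 2, 5 the derivation needs six choices, so the sixth one wraps around and reuses the first codon, giving `y + x`. The single codon 0 keeps choosing the recursive rule, so the mapping gives up and returns `None`.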

Question 56

Q

How is the fitness of a phenotype evaluated in GE?

Answer

A

By applying it to a problem

Question 57

Q

What are the 2 selection types in GE?

Answer

A

Tournament or Fitness proportionate

Question 58

Q

What are the different types of Genetic operators?

Answer

A

Crossover, mutation, reproduction and elitism

Question 59

Q

What is most common crossover in GE?

Answer

A

Single point

Question 60

Q

What does a regression problem do?

Answer

A

Seeks to predict a numeric output for a given input

Question 61

Q

What is the only info available about the target function in Symbolic regression?

Answer

A

The fitness cases

Question 62

Q

How do GE and GP differ?

Answer

A

GP = Syntax trees, GE = chromosomes

Question 63

Q

Where is GE applied?

Answer

A

Circuit design,
Image processing
Game AI
Language Processing

Question 64

Q

What is the formula for the probability that an ant will choose a certain path?

Answer

A

P = (pheromone * 1/distance) / (sum over all paths of (pheromone * 1/distance))
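
A worked Python sketch of this probability for three candidate paths. The pheromone and distance values are made up, and the formula is used exactly as on the card, without exponent parameters.

```python
def path_probabilities(pheromone, distance):
    """P(path) = pheromone * (1/distance), normalised over all candidate paths."""
    weights = [tau * (1.0 / d) for tau, d in zip(pheromone, distance)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: weights are 0.5, 0.25 and 0.5, which sum to 1.25.
probs = path_probabilities([1.0, 1.0, 2.0], [2.0, 4.0, 4.0])
```

The probabilities come out as 0.4, 0.2 and 0.4: the third path is as long as the second but carries twice the pheromone, so it is chosen twice as often.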

Answer 64

A

r = (1-p)*current + p*(sum of pheromones)

Answer 65

A

Number of ants
Pheromone evaporation rate
Pheromone intensity
Heuristic info
Ant decision rule
local search strategy
Termination criteria

Answer 66

A

Routing and transportation
Telecom
Manufacturing and production
Bioinfo
Financial planning
Energy systems

Answer 67

A

Initialize all xi, vi and pbesti values
while (termination condition not met){
for (i = 1; i <= N; i++){
calculate F(xi)
if(F(xi) < F(pbesti)){
pbesti = xi}
if(F(xi) < F(gbest)){
gbest = xi}
}
update all vi and xi values}

Answer 68

A

vi(t+1) = w*vi(t) + l1*r1*[pbesti(t)-xi(t)] + l2*r2*[gbest(t)-xi(t)]
w = inertia weight
l1 and l2 = learning factors
r1 and r2 = random numbers between 0 and 1

Answer 69

A

xi(t+1) = xi(t) + vi(t+1)
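
A compact Python sketch of PSO using the velocity and position updates above, for a minimisation problem. The swarm size, search range, parameter values and the `sphere` test function are assumptions for illustration.

```python
import random

def sphere(x):
    # Hypothetical objective: minimum value 0 at the origin.
    return sum(v * v for v in x)

def pso(f, dim=2, n=20, iterations=100, w=0.7, l1=1.5, l2=1.5, seed=0):
    rng = random.Random(seed)
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    v = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in x]
    gbest = min(pbest, key=f)[:]
    for _ in range(iterations):
        for i in range(n):                     # update personal and global bests
            if f(x[i]) < f(pbest[i]):
                pbest[i] = x[i][:]
            if f(x[i]) < f(gbest):
                gbest = x[i][:]
        for i in range(n):                     # update all velocities and positions
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + l1 * r1 * (pbest[i][d] - x[i][d])
                           + l2 * r2 * (gbest[d] - x[i][d]))
                x[i][d] += v[i][d]
    return gbest

swarm_best = pso(sphere)
```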

Answer 70

A

Swarm size
Max velocity
inertia weight
Acceleration coefficients
Neighborhood topology
LS strategy
Termination criteria

Answer 71

A

Engineering
Finance
Computer science
Machine learning
Renewable energy
Robotics
Health and medicine

Answer 72

A

Divides the attributes/features of the dataset into groups of 2. It uses entropy and gain to generate a decision tree.

Answer 73

A

1 Start with entire data set.
2 Choose best attribute (information gain measures how much a specific feature reduces uncertainty)
3 Split the data (based on the chosen attribute's values, the data is divided into subsets)
4 Recursively build the tree.
For each subset created by 3, repeat 2 and 3 until a stopping criterion is met.
(example stopping criteria = max depth / all data points in subset belong to the same class / can't split anymore)
5 Handle leaves (if a subset isn't perfectly classified but there are no more features to split on, assign the most frequent class)

Answer 74

A

Gain = Entropy - SUM((number of instances containing j)/(total number of instances) * (entropy of that entry))

whichever entry gives the highest gain is the root
study ID3.pdf equations if still confused

Answer 75

A

SUM(-p(i) * log2(p(i)))
p(i) = proportion of instances that belong to class i
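
A Python sketch of the entropy formula and the information gain built on it. The tiny dataset and the attribute names `windy` and `hot` are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """-SUM(p_i * log2(p_i)) over the classes present in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the whole set minus the weighted entropy of each subset."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical data: "windy" separates the classes perfectly, "hot" not at all.
rows = [{"windy": True, "hot": True}, {"windy": True, "hot": False},
        {"windy": False, "hot": True}, {"windy": False, "hot": False}]
labels = ["yes", "yes", "no", "no"]
gain_windy = information_gain(rows, labels, "windy")
gain_hot = information_gain(rows, labels, "hot")
```

Here the set has entropy 1, splitting on `windy` gives a gain of 1 and splitting on `hot` gives a gain of 0, so `windy` would be chosen as the root.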

Answer 76

A

It handles continuous features(it finds a threshold that splits the data based on the target variable)
Addresses overfitting by pruning
Use Gain ratio instead of Info Gain

Answer 77

A

Start with entire dataset
Choose best Attr (Gain ratio)
Split the data
Recursively build the tree (2 and 3)
Handle leaves by assigning most frequent class (if perfect classification isn't achieved)

Answer 78

A

Gain(A)/Split Info(A)

Gain = info gain
Split info = -SUM(pi*log2(pi))

Answer 79

A

Determine number of clusters
Randomly select centroids for each of the clusters
while (not converged){
for(1->number of instances){
for(1->number of clusters){
calc euclidean distance to the cluster's centroid}
add instance to cluster with smallest distance}
for(1->number of clusters){
for (1->m){
calc average value of the ith dimension (x or y) over the cluster's instances
}
update centroid to the averaged values for each dimension}
}
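
A Python sketch of the k-means loop above. The initial centroids are passed in explicitly (rather than chosen at random) so the run is reproducible; the points and the `max_iter` cap are assumptions for illustration.

```python
import math

def k_means(points, centroids, max_iter=100):
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each instance joins the cluster with the nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid becomes the per-dimension average of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:   # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 2), (10, 10), (10, 12)]   # hypothetical 2D data
final_centroids, final_clusters = k_means(points, [(0, 0), (10, 10)])
```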

Answer 80

A

Initialize (Randomly select initial medoids)
Iteration (Assign data points to closest medoids based on dissimilarity)
Medoid reassignment (Check if swapping a data point with the current medoid improves the cluster's total distance)
Stop if no medoid swaps occur

Answer 81

A

K-means = avg of the points within a cluster as the centroid
K-medoids = actual data point from cluster as the medoid (more robust to outliers)
K-means = data point must have num values for distance calc
K-medoids = Can work with other dissimilarity measures (more flexible)

Answer 82

A

K-nearest neighbour

Answer 83

A

Simply stores the labeled training examples during training phase.

Answer 84

A

It finds the k nearest neighbours of a query point and computes the class label from them (typically by majority vote).
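
A Python sketch of k-nearest-neighbour prediction, assuming Euclidean distance, majority voting and a made-up training set:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (point, label) pairs; returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled examples, simply stored as-is (no training step).
train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
label_near_origin = knn_predict(train, (0.5, 0.5))
label_far = knn_predict(train, (5.5, 5), k=1)
```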

Answer 85

A

Combines multiple models to achieve better results than any single model alone.

it combines the prediction from the models and chooses based on certain criteria (Max voting, Average, weighted averaging…)

Answer 86

A

By determining a weight matrix

Answer 87

A

The weights between neurons

Answer 88

A

The weight matrix is calculated using matrix multiplication and addition

Answer 89

A

Wij = SUM([2*pi - 1]*[2*pj - 1])
i = row, j = column

Answer 90

A

Wij = pi*pj
(takes the outer product of the input and output vectors)

Answer 91

A

Randomly take a position in the input vector and take the same column in the weight matrix. Multiply the 2 out. If the result is > 0, the value becomes 1; if it is < 0, the value becomes 0; otherwise it stays the same

Answer 92

A

Set weights + biases to 0(or small random value)
while(stopping cond not met){
perform feedforward learning
Back prop of error
Update weights}

Answer 93

A

Calc n1 for each node in the hidden layer
Calc activation function for each node in the hidden layer
calculate n2 for each node in the output layer
Calc activation function for each node in the output layer

Answer 94

A

Calc error info term for each node in the output layer
Calc weight correction term output layer
Calc bias correction term
Calc sum of delta input for each node in the hidden layer
Calc weight error term for each node in the hidden layer
Calc bias for error term for each node in the hidden layer

Answer 95

A

f(n) = 1/(1+exp(-n))
f'(n) = f(n)*(1-f(n))
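
A quick Python check of the sigmoid and its derivative, comparing the closed form against a numerical slope. The test point 0.3 and the step size `h` are arbitrary choices.

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

def sigmoid_prime(n):
    # Closed form: f'(n) = f(n) * (1 - f(n))
    s = sigmoid(n)
    return s * (1.0 - s)

h = 1e-6
numeric_slope = (sigmoid(0.3 + h) - sigmoid(0.3 - h)) / (2 * h)   # central difference
```

At n = 0 the sigmoid is 0.5 and its derivative is 0.25, the largest slope the function reaches.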

Answer 96

A

deltak = (tk-f(n2k))*f’(n2k)
k = current node in output layer
2 = output layer
t = target in training instance

Answer 97

A

Wik = a* d * f(n1i)

a = learning rate
d = error info term

Answer 98

A

w0k = a*d
a = learning rate
d = error info term

Answer 99

A

deltani = SUM(d*wik)
d = error info term

Answer 100

A

Convolutional Neural Network

Answer 101

A

Grayscale = single 2D matrix 0-255
Color = 3 stacked 2D matrices 0-255 (each layer = RGB respectively)

Answer 102

A

The resolution

Answer 103

A

Pre process the data

Answer 104

A

Convolutional layer
Nonlinear/ReLU layer
Pooling layer
Fully connected layer

Answer 105

A

Fully = every node is connected to every node
Sparse = Some connected to some

Answer 106

A

Input-> Convolutional->RELU->POOL->flatten(FC)->output(softmax)

Answer 107

A

Sparsely connected

Answer 108

A

Efficiency, locality(neighbors are more similar than the ones far away)

Answer 109

A

Parameter sharing

Answer 110

A

Identity and Edge detection

Answer 111

A

They start with initial values, which are then changed through backpropagation

Answer 112

A

Number of pixels by which a filter/kernel moves across the input data (during convolution)

Answer 113

A

Larger = faster but less detail
Smaller = slower but more detail

Stride > 1 = smaller output feature map

Answer 114

A

The output size of the feature maps after a convolution operation (0 padding = adding 0s around the border of the data)

Answer 115

A

To preserve spatial info (without it, the size of the feature map usually shrinks after each convolution due to the filter “sliding off the edge”)

Answer 116

A

Decide on kernel size
Determine kernel vals
apply conv to produce the feature map
map kernel across pixel matrix to produce the feature map
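
A plain-Python sketch of sliding a kernel across a pixel matrix, with optional stride and zero padding. It assumes a square kernel and, as is usual in CNNs, does not flip the kernel; the 4x4 image and the identity kernel are made-up examples.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D list of pixels and return the feature map."""
    if padding:
        width = len(image[0]) + 2 * padding
        image = ([[0] * width for _ in range(padding)]
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    k = len(kernel)
    out = []
    for r in range(0, len(image) - k + 1, stride):
        row = []
        for c in range(0, len(image[0]) - k + 1, stride):
            # Element-wise multiply the window by the kernel and sum.
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
plain = conv2d(image, identity)                # 4x4 input shrinks to 2x2
padded = conv2d(image, identity, padding=1)    # zero padding keeps the 4x4 size
strided = conv2d(image, identity, stride=2)    # larger stride, smaller feature map
```

The identity kernel just copies the centre pixel of each window, which makes it easy to see how padding preserves the size and how stride reduces it.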

Answer 117

A

Incorporates nonlinearity
Uses the RELU activation function
f(x) = max(0,x)

Answer 118

A

reduces the number of parameters (reduces the size of the feature map)

Answer 119

A

the output layer
Last/last 2 layers
uses the softmax activation function
Transfers learning

Answer 120

A

Improved generalization
Data efficiency
Regularization

Answer 121

A

ANN
Activation functions
Gradient descent
Backpropagation
CNN
RNN
Long Short-Term memory
Autoencoders
Generative adversarial Networks (GAN)

Answer 122

A

Processing of info and learning through math functions.
inspired by brain

Answer 123

A

Functions that determine how a neuron transforms the received input into an output signal

Answer 124

A

Optimization alg that helps the network learn by iteratively adjusting the weights

Answer 125

A

ANN that is used for image recognition and analysis

Answer 126

A

Recurrent neural network designed to handle sequential data like text or speech

Answer 127

A

RNN used to learn long-term dependencies in sequential data. Used for machine translation and speech recognition

Answer 128

A

reconstruct input data at the output layer

They learn compressed representation of the data

Answer 129

A

Uses 2 neural networks(generator and a discriminator)
Generator learns to create new data samples that resemble the training data. Discriminator tries to distinguish real data from generated data.
