AI Flashcards

1
Q

Cons of Hill Climbing:

A
  • No Guarantee to be COMPLETE or OPTIMAL
  • Can get stuck on local maxima & plateaus
    (RUN FOREVER IF NOT PROPERLY FORMULATED)
2
Q

Pros of Hill Climbing:

A
  • Rapidly finds good solutions by improving on a bad initial state (GREEDY)
  • Lower Time & Space Complexity compared to search algorithms
  • No requirement for problem-specific heuristics
    (UNINFORMED)
  • Starts from a Candidate Solution, instead of building up step-by-step
    (UNLIKELY BUT POSSIBLE: the randomly picked initial state may already be the SOLUTION)

Run the algorithm for a maximum number of iterations m

Variants of Hill Climbing can mitigate getting stuck on local maxima / plateaus
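The points above can be sketched as a short loop; a minimal sketch of greedy (steepest-ascent) hill climbing capped at m iterations, assuming a hypothetical numeric objective `f` and `neighbours` function:

```python
import random

def hill_climbing(f, initial, neighbours, m=1000):
    """Steepest-ascent hill climbing: greedily move to the best neighbour,
    stopping at a local maximum/plateau or after m iterations."""
    current = initial
    for _ in range(m):                        # cap at m iterations so it never runs forever
        best = max(neighbours(current), key=f, default=current)
        if f(best) <= f(current):
            break                             # local maximum or plateau: stop
        current = best
    return current

# Toy problem: maximise f(x) = -(x - 3)^2 over the integers
f = lambda x: -(x - 3) ** 2
result = hill_climbing(f, initial=random.randint(-10, 10),
                       neighbours=lambda x: [x - 1, x + 1])
```

On this single-peaked toy objective any start reaches the global maximum at x = 3; on multi-peaked objectives the same loop can get stuck, which is exactly the con listed above.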

3
Q

Variants of Hill Climbing:

A
  • Stochastic Hill Climbing
  • First-Choice Hill Climbing
  • Random-Restart Hill Climbing
4
Q

Stochastic Hill Climbing:

A
  • Randomly selects a neighbour that involves an uphill move
  • Probability of picking a specific move can depend on the steepness
  • Converges more slowly than steepest ascent but can find better solutions
5
Q

First-Choice Hill Climbing

A

  • Randomly generates a single SUCCESSOR (neighbour) solution & moves to it if it's better than the current solution
  • If there is no UPHILL move, it keeps randomly generating successors until one gives an uphill move
  • If after a MAX number of tries OR after generating all neighbours it hasn't found an UPHILL move, it gives up & assumes it is now at the Optimal solution
  • Time complexity is lower, as not all neighbours need to be generated before one is picked
    (GOOD WHEN EACH SOLUTION HAS LOADS OF NEIGHBOURS)
6
Q

Random-Restart Hill Climbing:

A
  • Runs a series of hill climbing searches on the same problem from randomly generated initial states
  • Stops when a goal is found
  • Can be parallelised / threaded easily, so does not take much time on modern computers
  • RARE to have to wait long for this to happen
7
Q

What is the only Complete variant of Hill Climbing?

A

Random-Restart Hill Climbing
It is complete: if a solution exists it will eventually be generated, as some random start state will eventually lead to the OPTIMAL SOLUTION

8
Q

Pros of Simulated Annealing:

A
  • Finds near-Optimal Solutions in reasonable time
    (High chance of reaching the GLOBAL MAX, but only as
    good as the formulation of the Optimisation Problem)
  • Avoids getting stuck in poor LOCAL MAXIMA & PLATEAUS by combining EXPLORATION & EXPLOITATION
    (explores widely early on, then exploits as the temperature decreases)
9
Q

Cons of Simulated Annealing:

A
  • Not Guaranteed to be COMPLETE or OPTIMAL
    (SENSITIVE TO FORMULATION)
  • NOT reliable = can't guarantee completeness
  • Time & Space Complexity is problem- & representation-dependent
10
Q

Formulating Optimisation Problems:

A

min / max f(x) => objective function(s) to minimise/maximise
s.t. g_i(x) <= 0, i = 1,…,m
     h_j(x) = 0, j = 1,…,n => Feasibility Constraints

==> x is the DESIGN VARIABLE (can be anything)

SEARCH SPACE is the space of all possible x values

11
Q

What is Search Space of a Problem:

A

The space of all possible x values (the CANDIDATE SOLUTIONS) ALLOWED by the constraints

12
Q

Define Explicit Constraint:

A

A constraint explicitly stated in the problem
CANNOT BE ASSUMED

13
Q

Define Implicit Constraint:

A

Rules that must hold by the problem definition in order for solutions to be CONSIDERED FEASIBLE

e.g: x,y > 0

14
Q

Define A*:

A

A heuristic pathfinding algorithm & the most used example of Informed Search

15
Q

What is the Evaluation Function:

A

f(n) = g(n) + h(n)
g(n) => cost to reach node n
h(n) => heuristic estimate of the cost from node n to the goal

It determines which node to expand next

16
Q

Steps of A*:

BASIC

A

1) Expand the node in the frontier with the SMALLEST EVALUATION FUNCTION f(n)

2) If a node is already in the list of visited nodes, DON'T ADD it TO the FRONTIER

3) Stop when a GOAL node is visited
(KEEP EXPANDING OTHERWISE)

17
Q

What is the Final Cost of A*?

A

Sum of g(n) along the path

  • DON'T INCLUDE HEURISTICS
18
Q

Dijkstra’s:

A
  • UNINFORMED
  • Goes Through all the nodes of the graph
  • No Heuristics / ADDITIONAL INFO
  • Basic Attempt to find shortest distance from the nodes on the graph to the destination
19
Q

what is h(n)?

A

Euclidean (straight-line) distance from each node to the destination

A heuristic measure of cost based on physical closeness to the destination

Good assumption: if a node is physically closer to the destination, the cost to reach it will be lower

20
Q

Pros & Cons of A* :

A

Pros:
- Doesn't go through all nodes

  • It is safer to expand the node with the lower straight-line distance (OTHER NODES CAN BE LEFT OUT)

Cons:
- Can lead to non-optimal paths (UNLIKELY)

21
Q

Uninformed Search ?

A

Defines the set of strategies having no additional information about the State Space beyond what is given during Problem Formulation

22
Q

What does Uninformed Search do?

A

- Can only generate successors
- Can only distinguish a goal state from a non-goal state

23
Q

What is Breadth First Search ?

A

-An Uninformed Searching Strategy

FRONTIER NODES ARE EXPANDED LAYER BY LAYER

Expanding shallowest unexpanded node in frontier

Similar to a QUEUE (FIFO)

  • Unless stated otherwise, never add children that are already in the Frontier
24
Q

Steps for BFS?

A

- The root node is expanded
- Then the successors of the root
- Repeatedly expand all the successors of each node, layer by layer, until the goal node is reached
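The steps above can be sketched with a FIFO queue of paths; a minimal sketch, assuming a hypothetical `neighbours` function over a toy tree:

```python
from collections import deque

def bfs(start, goal, neighbours):
    """BFS: expand the shallowest node in the frontier (FIFO queue),
    never adding children whose state has already been seen."""
    frontier = deque([[start]])     # queue of paths, shallowest first
    seen = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for succ in neighbours(node):
            if succ not in seen:    # skip children already in frontier/visited
                seen.add(succ)
                frontier.append(path + [succ])
    return None

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
        "D": [], "E": [], "F": []}
# bfs("A", "F", lambda n: tree[n]) expands layer by layer: A, then B & C, then D, E, F
```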

25
What are the 3 Quantities used to measure the Performance?
- Branching Factor (b): Max no. of successors of any node
- Depth (d): Depth of the shallowest goal node
- Max Length (m): Max length of any path in the state space
26
4 quantities that Determine the Performance?
Completeness => Guarantees finding a solution, if one exists?
Optimality => Capability of finding the Optimal Solution?
Time Complexity => How long to find a solution?
Space Complexity => How much memory space is required?
27
Performance of BFS
Complete: Yes, if the goal node is at some finite depth d (b must also be finite)
Optimal: Yes, if path cost is a non-decreasing function of the depth of the node (ALL ACTIONS HAVE THE SAME COST)
Time: O(b^d), assuming a uniform tree where each node has b successors
Space: O(b^d), stores all expanded nodes; the frontier is O(b^d) and the explored set in memory is O(b^(d-1))
28
What does I Always Take Goal Path stand for?
I = Initial State (where the agent starts its search)
A = Action Set (actions that can be executed in any state)
T = Transition Model (mapping between states and actions)
G = Goal Test (determines if a state is a goal state)
P = Path Cost Function (assigns a cost value to each path)
29
What is the Solution?
Sequence of Actions from initial state to Goal
30
What is Cost of Solution?
Sum of the cost of actions from initial to goal
31
What is the Path?
Sequence of states connected by a sequence of actions
32
Does goal node appear in Order of nodes visited in BFS?
NO
33
Does goal node appear in Order of nodes visited in DFS?
YES
34
What is Depth First search?
- An Uninformed Searching Strategy ==> expands the deepest unexpanded node in the frontier
- A Stack (LIFO) is used for expansion
- The most recently generated node is expanded
- Usually just expand the left-most node first
35
Steps for DFS?
- Expand the root first
- Then expand the first successor of the root node (CAN BE CHOSEN RANDOMLY)
- Repeat expanding the deepest node until the goal is found; otherwise go back to try alternative paths
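These steps can be sketched with a LIFO stack; a minimal sketch with an optional depth limit L (giving DLS), assuming a hypothetical `neighbours` function over a toy tree:

```python
def dfs(start, goal, neighbours, limit=None):
    """DFS: expand the deepest (most recently generated) node via a LIFO stack.
    An optional depth limit L (in edges) gives Depth-Limited Search (DLS)."""
    stack = [[start]]
    while stack:
        path = stack.pop()                       # deepest path first
        node = path[-1]
        if node == goal:
            return path
        if limit is not None and len(path) - 1 >= limit:
            continue                             # depth limit L reached: backtrack
        for succ in reversed(neighbours(node)):  # reversed so the left-most child pops first
            if succ not in path:                 # avoid loops along the current path
                stack.append(path + [succ])
    return None

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
```

With limit=1 the goal "E" at depth 2 is never reached, illustrating why DLS is incomplete when L < d.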
36
Performance of DFS?
Completeness: Not complete if the search space is infinite or we don't avoid infinite loops; Yes, only if the search space is finite
Optimality: Not optimal, because it will expand the entire left subtree even if the goal is at the first level of the right subtree
Time: O(b^m), depends on the max length m of any path in the search space
Space: O(b^m), storing all the nodes on each path from root to leaf
37
Variants of DFS: b) what performance attribute stays the same for all of these variants?
- Less Memory Usage (LMU)
- Depth-Limited Search (DLS)
- DLS with LMU
b) Optimality is ALWAYS NO
38
What is LMU?
If you reach the leaf node of the left subtree and there is no GOAL node, you can remove the entire subtree from memory
- Space complexity becomes O(bm): store a single path, with the siblings of each node on the path
- COMPLETE IF the search space is finite
39
What is DLS?
- Has a DEPTH LIMIT L; once the limit is reached we go and find an alternate path
- If L < d it is not complete; if L > d it is not optimal
- Time complexity reduces to O(b^L), when L < m
40
What is DLS with LMU?
Once we reach our DEPTH LIMIT L, if we have found NO goal node we remove the subtree from memory
- Space complexity = O(bL)
- COMPLETE if L >= d
41
What are Informed Search Strategies?
Use problem-specific knowledge beyond the problem definition
Can find solutions more efficiently compared to uninformed search
42
General Approach for Informed?
Best-First Search:
- Determines which node to expand based on an evaluation function f(n)
- f(n) acts as a cost estimate => lowest cost expanded first
43
What is Best First Search?
- Determines which node to expand based on an evaluation function f(n)
- Most include a heuristic h(n): the estimated cost of the cheapest path from the current node to the goal
- At the goal node, h(n) = 0
- Known as GREEDY, as it will always pick the cheapest-looking path
44
What is the f(n) for A*?
f(n) = g(n) + h(n)
g(n) => cost to reach node n
h(n) => heuristic from n to goal
45
Steps for A*? | COMPLEX
- Expand the node in the frontier with the smallest f(n)
- Handle repeated states & loopy paths: if a node has already been visited, don't add it to the frontier
- If the state of a given child is already in the frontier: keep whichever has the smaller g(n), replacing the entry with the larger g(n)
- Stop when the goal is visited
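These steps can be sketched with a priority queue keyed on f(n) = g(n) + h(n); a minimal sketch on a hypothetical toy graph with a consistent heuristic:

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A*: expand the frontier node with the smallest f(n) = g(n) + h(n).
    `neighbours(n)` yields (successor, step_cost) pairs; `h` is the heuristic."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}                          # smallest g(n) seen per state
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g                       # final cost = sum of step costs, no heuristic
        for succ, cost in neighbours(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):   # keep only the smaller-g entry
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)], "D": []}
h = {"A": 2, "B": 2, "C": 1, "D": 0}             # consistent: h(n) <= cost(n,n') + h(n')
path, cost = a_star("A", "D", lambda n: graph[n], lambda n: h[n])
```

Note the returned cost is the sum of g(n) along the path only — the heuristic is never part of the final cost.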
46
Performance of A*?
A* is Complete & Optimal if h(n) is consistent
- Time: A* is exponential in the length of the solution; with CONSTANT STEP COSTS it is O(b^(Σd)), where Σ = (h* - h)/h* is the RELATIVE ERROR and h* is the actual cost from root to goal
- Space: O(b^d) is the main issue with A*: keeps all generated nodes in memory (ALL expanded nodes & ALL nodes in the frontier)
- NOT suitable for LARGE-SCALE PROBLEMS
47
What does Consistent mean in terms of h(n)?
h(n) is consistent if the estimate is no greater than the cost of reaching any neighbouring node n' plus the estimated distance from n' to the goal:
h(n) <= cost(n, n') + h(n')
where cost(n, n') is just the step cost from n to n'
48
What do design variables represent?
- Candidate solutions of a problem
- Variables belonging to pre-defined domains
- There may be one or more design variables in a given optimisation problem
- They can represent the decisions to be made in the problem
49
What is the objective function?
- Takes design variables as an input
- Outputs a NUMERICAL value that the problem aims to MINIMISE or MAXIMISE
- There CAN be multiple objective functions in a formulation
- Defines the cost or quality of a solution
50
What are Constraints in Formulating Operation Problems?
- Conditions that the design variables must satisfy for the solution to be FEASIBLE
- Usually depicted by functions that take the design variables as input and output a numeric value
- They specify the values these functions are allowed to take for the solution to be feasible
- There may be zero or more constraints in a problem
- DEFINES THE FEASIBILITY OF THE SOLUTION
51
Benefits of Local Search?
- Do NOT keep track of the paths or states that have been visited
- NOT systematic, but the PROS are:
  - Uses very little memory
  - Finds reasonable solutions in large or infinite state spaces
52
What are Local Search Algorithms?
Optimisation Algorithms that operate by searching from initial state to neighbouring states
53
What is The Aim Of a MAXIMISING problem?
Reaching the HIGHEST PEAK/ Global Maximum
54
What is The Aim Of a MINIMISING problem?
Reaching the LOWEST TROUGH/ Global Minimum
55
Purpose Of Hill Climbing?
To find & reach the Global Maximum
56
Purpose of Gradient Descent
To find & reach the Global Minimum
57
Why is Hill Climbing Greedy?
Does not look beyond the immediate neighbours of the current state
58
What are the 3 Components of Hill Climbing we must design?
-Representation -Initialisation Procedure -Neighbourhood Operator
59
What is Representation?
How to store the design variables of the problem(s)
Should facilitate the application of the Initialisation Procedure
60
What is Initialisation Procedure?
How to pick initial solution. USUALLY RANDOM, Can Be Heuristic
61
What is Neighbourhood Operator?
How to generate Neighbourhood Solutions (INCREMENT/STEP SIZE)
62
Performance of Hill Climbing:
Completeness: No, depends on the problem formulation & design of the algorithm (CAN GET STUCK ON LOCAL OPTIMA)
Optimality: Not optimal (CAN GET STUCK ON LOCAL OPTIMA)
Time: O(mnp), where m = MAX no. of iterations, n = MAX no. of neighbours, and each neighbour takes O(p) to generate
Space: O(nq + r) ==> r is a constant, so ==> O(nq), where n = MAX no. of neighbours, each variable takes O(q), and r is the space to generate the neighbours sequentially (NEGLIGIBLE COMPARED TO n & q)
63
What are the 3 Types of Machine Learning?
-Supervised Learning, -Unsupervised Learning -Reinforcement Learning
64
What is Supervised Learning?
- Most Prevalent Form - Learning with a teacher - Teacher: expected output, label, class, etc - Solve 2 Types of problems: Classification & Regression
65
How can AI be used to Solve Machine Learning Problem?
- Automatically create models from data to perform certain tasks through machine learning
- Not guaranteed a perfect model, but will find a good model depending on the difficulty of the problem
- Good for problems where it is difficult to create good models manually
- Good for problems that don't require perfect answers
66
AI in Optimisation Problems:
- Solve them in a reasonable amount of time through optimisation techniques
- No guarantee of finding the optimal solution in a reasonable amount of time, but a good solution
- Good for problems where no specific technique exists that guarantees the optimal solution can be found
67
Think Humanly?
-Can machine think humanly? - Can we consider machine as human? - How to define think humanly? ==> With AI we need to define things with mathematical forms
68
Act Humanly?
- What can humans do? - What if human's action is wrong? ==> Doesn't mean we shouldn't copy the wrong actions
69
Act Rationally?
==> Rationality: Doing the right thing can be mathematically defined & General enough, linked to human behaviour
70
Think Rationally?
- Think Logically? - Logical AI? - Too Narrow?
71
Define AI using 'Rational Agents':
Rational Agents are computer programs that perceive their environments & take actions that maximize their chances of achieving best EXPECTED outcome
72
What is Machine Learning?
An agent is learning if it improves its performance after making observations about the world. It is Machine Learning when the agent is a computer
73
What is Machine Learning Problem?
Problems that require a model to be built automatically from data e.g => to make classification
74
What is Supervised Learning?
- Most popular form in Real World - Learning with a Teacher - Teacher: expected output, labels, classes, etc. - Solve 2 types of Problems : Classification & Regression Problem
75
What is Classification Problem? {Give example}
- Predict categorical class Labels ==> Spam Detection
76
What is Regression Problem? {Give example}
- Prediction of a Real Value ==> Student Grades, Stock Price Prediction
77
Define Supervised Learning:
Agents Observe input-output pairs & learns a function that maps from input to output
78
Define Unsupervised Learning:
Agent Learns patterns in the input without any explicit feedback
79
What is Unsupervised Learning?
- Learning without a teacher
- Finds hidden structures
- Clustering ==> groups inputs based on similar properties
80
What is Reinforcement Learning?
- COMBO of Supervised & Unsupervised
- Learning with (delayed) feedback / reward --> we don't have instant labels
- Learns a series of actions => sequential decision making
The agent learns from a series of reinforcements: rewards & punishments. It decides which of the actions prior to the reinforcement were most responsible for it, and alters its actions towards more reward in the future.
81
Why is Overfitting a Problem?
- Fitting the training data too well is not helpful, as the data you want to classify or predict is not the same as the training data
- Learning every irrelevant detail in the training data does not generalise
- Occurs when the model is more complex than required
82
When does Underfitting occur?
When the model is simpler than required
It is bad because it will not classify all points into the correct classes
83
What is a Parametric Model?
Parametric models are learning models that summarise data with a set of parameters. e.g Logistic & Linear Regression
84
What is a NON-Parametric Model?
Non-parametric models are learning models that do not assume a fixed set of parameters, e.g. KNN
85
Limitations of K means?
- Outliers can influence the clusters that are found & increase the WCSS
- Problems when clusters are of differing sizes
86
Applications of K means?
A key tool in image and signal compression (ESPECIALLY VECTOR QUANTIZATION)
87
K means Steps?
- Pick k random elements as the initial centroids
- Assign each element to its closest centroid (EUCLIDEAN DISTANCE IF NOT SPECIFIED)
- Recalculate each centroid as the centre of its assigned elements & repeat until the centroids don't change and you get the minimum WCSS
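The steps above can be sketched directly (Lloyd's algorithm); a minimal sketch on hypothetical 2-D tuples:

```python
import random

def k_means(points, k, iters=100):
    """k-means: pick k random elements as centroids, assign each point to its
    closest centroid (Euclidean distance), recompute centroids, repeat."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                   # assignment step
            i = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[i].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]           # update step
        if new == centroids:                               # centroids unchanged: converged
            break
        centroids = new
    return centroids, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, clusters = k_means(pts, 2)
```

On these two well-separated pairs, any initialisation converges to two clusters of two points each; less separated data is where different initialisations give different local optima.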
88
Selecting K in K means?
Use prior information, such as the number of groups we want to cluster into
89
How to do Elbow Method?
Run k-means with different k values
See where the WCSS has its inflection point (THE KINK IN THE GRAPH BEFORE IT LEVELS OFF)
90
What is Elbow Method?
Finds the optimal K value
91
What is Agglomerative Clustering?
- Bottom-up
- Each item starts as its own cluster
- Merge the 2 most similar clusters (SMALLEST INTER-CLUSTER DISSIMILARITY) at each step, until there is one cluster
92
What is Divisive Clustering?
- Top-down
- All items start in ONE cluster
- Split a cluster into the 2 new clusters with the largest inter-cluster dissimilarity, until each item has its own cluster
93
Disadvantage of Single Linkage & Complete Linkage?
- Sensitive to outliers, which heavily affect which clusters are merged even though they are not representative of the whole cluster
94
Cons of SL?
- Can cause a 'chaining effect' where clusters are combined via close intermediate examples
- Clusters may not be as compact as required
95
Cons of CL?
- Provides clusters with a small diameter
- Clusters may end up being crowded, as items can be very close to items in other clusters
96
Pro of group average?
Attempts to produce relatively compact clusters that are relatively far apart
97
What is Constraint Handling?
Methods that ensure optimisation algorithms can effectively search the feasible regions,
avoiding or penalising INFEASIBLE solutions by modifying the Algorithm Operator OR the Objective Function
98
What is Modifying the Algorithm Operator?
- Modifying the search so that ONLY feasible solutions are generated
- The algorithm operator is a function that generates a new candidate solution based on the current solution
- AVOIDS generating invalid solutions, where the constraints dictate whether a solution fits
99
Pros of Modifying the Algorithm Operator
- Will not generate infeasible solutions, allowing the search to focus on optimal solutions
- Makes Hill Climbing & Simulated Annealing Complete
100
Cons of Modifying the Algorithm Operator
- May be difficult to design; problem-dependent
- May restrict the search space too much, making it harder to find optimal solutions
  (THE GLOBAL OPTIMUM MAY LIE BETWEEN FEASIBLE & INFEASIBLE SOLUTIONS)
101
What is Modifying the Objective Function?
- Incorporating the constraints into the objective function, often by adding a penalty term for constraint violation
- THE OBJECTIVE FUNCTION IS INCREASED/DECREASED IF A CONSTRAINT IS VIOLATED
102
What is the Death Penalty Method?
A method of Modifying the Objective Function (MIN PROBLEM)
Let x be a solution; the objective function becomes f(x) + Q(x) // Q(x) is the penalty term
- If x is FEASIBLE, Q(x) = 0
- Else Q(x) is a LARGE +VE constant C // so that feasible solutions have smaller values than infeasible ones
103
What's a Limitation/Problem with Death Penalty Method
- C is always the same constant, so EVERY INFEASIBLE SOLUTION has the same penalty
- Hard to find solutions in a region dominated by infeasible solutions
104
What is Levels of Infeasibility Approach?
A method of Modifying the Objective Function
- Distinguishes the objective values of infeasible solutions to help the algorithm find feasible ones
- The objective function still gets a penalty added/subtracted
105
Give an Example of Q(x) for Levels of Infeasibility Approach?
THE PROBLEM IS: MINIMISE f(x) + Q(x)

Q(x) = vg_1*C*g_1(x) + vg_2*C*g_2(x) + ... + vg_m*C*g_m(x)
     + vh_1*C*h_1(x) + vh_2*C*h_2(x) + ... + vh_n*C*h_n(x)

If g_i(x) / h_i(x) is VIOLATED, then vg_i / vh_i = 1; OTHERWISE vg_i / vh_i = 0
The more of the g_i & h_i that are violated, the higher the output of Q(x)
Only adds penalties corresponding to VIOLATED constraints
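This penalty scheme can be sketched directly; a minimal sketch for a MIN problem, where the example objective and constraint are hypothetical:

```python
def penalised_objective(f, gs, hs, C=1000.0):
    """Levels-of-infeasibility penalty for a MIN problem: Q(x) adds
    C * (violation amount) for each VIOLATED constraint only.
    gs: inequality constraints g_i(x) <= 0; hs: equality constraints h_j(x) = 0."""
    def F(x):
        Q = sum(C * g(x) for g in gs if g(x) > 0)          # vg_i = 1 only when violated
        Q += sum(C * abs(h(x)) for h in hs if h(x) != 0)   # vh_i = 1 only when violated
        return f(x) + Q
    return F

# Hypothetical example: minimise f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e. x >= 1)
F = penalised_objective(lambda x: x * x, gs=[lambda x: 1 - x], hs=[])
# F(2) = 4 (feasible, no penalty); F(0) = 0 + 1000*1 = 1000 (infeasible, penalised)
```

Solutions that violate the constraint by more receive a larger penalty, which is exactly what distinguishes this approach from the flat Death Penalty constant.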
106
What are g_i & h_i in the Levels of Infeasibility Approach?
The Constraints
107
What is C in the Levels of Infeasibility Approach?
A constant representing how important a specific constraint is.
e.g. vg_1*C*g_1(x) may be more important than vg_2*C*g_2(x), so C can take a different value for each
(SCALING EACH CONSTRAINT BY HOW IMPORTANT IT IS)
108
Why is Levels of Infeasibility better than Death Penalty?
It solves the issue with the Death Penalty by giving different penalties to different infeasible solutions:
- Smaller for solutions that are close to satisfying the constraints
- Larger for solutions that are further from satisfying the constraints
109
What does Squaring the values of the Constraints in the Penalty term do?
- Makes the distinction between objective values even larger
- Distinguishes different infeasible solutions more effectively
- Removes -VEs
- Gives a bigger span of solutions to distinguish effectively
110
Pros of Levels of Infeasibility?
Easy to Design
111
Cons of Levels of Infeasibility?
- The algorithm still has to search for feasible solutions, rather than the search space containing only feasible solutions
- (INFEASIBLE SOLUTIONS ARE STILL GENERATED, BUT GET IGNORED DUE TO THEIR LARGE/SMALL OBJECTIVE VALUE WHEN WE ARE MINIMISING/MAXIMISING)
- (THEY WILL JUST BE NEIGHBOURS, NOT SOLUTIONS WE MOVE TO)
112
How do Hill Climbing & Simulated Annealing become Complete?
If the strategy never enables infeasible solutions to be generated at all
113
Which of the CONSTRAINT HANDLING methods makes Hill Climbing & Simulated Annealing COMPLETE?
Modifying the Algorithm Operator, as it does not generate infeasible solutions
114
What are the Hyperparameters?
HIGH LEVEL free parameters
115
How is a Predictor Obtained?
By training the free parameters of the considered model using available data.
116
What is the Training set used for?
Estimate the free parameters
117
What is the Test set used for?
Evaluate the performance of the trained predictor before deploying it
Reserved for the evaluation of the predictor, so it can't be used for training (the model must learn nothing from it)
118
Holdout Validation Steps?
1) Randomly choose 30% of the data to form a validation set
2) The remaining data forms the training set
3) Train your model on the training set
4) Estimate the test performance on the validation set
5) Choose the model with the lowest validation error
6) Re-train the chosen model on the joined training & validation sets to obtain the predictor
7) Estimate the future performance of the obtained predictor on the TEST SET
8) Ready to deploy the predictor
119
What is the Mean Square Error (L2)?
(f(x)-y)^2
120
What are the 3 Models?
- Linear Model - Quadratic Model - Line Model (OVERFITTING)
121
For Holdout Validation how do we ESTIMATE THE TEST PERFORMANCE ON THE VALIDATION SET?
REGRESSION: compute the cost function (L2) on the examples of the validation set (INSTEAD OF THE TRAINING SET)
CLASSIFICATION: don't compute the cross-entropy cost on the validation set; COMPUTE the 0-1 error metric:
0-1 error metric = (num of wrong predictions) / (num of predictions) = 1 - accuracy
122
Steps for k-Fold Cross-Validation?
1) Split the training set randomly into k (equal-sized) disjoint sets
2) Use k-1 of those together for training
3) Use the remaining one for validation
4) Permute the k sets & repeat k times (CHANGE WHICH PARTITION IS THE VALIDATION SET & REPEAT k TIMES)
   EACH PARTITION WILL BE USED AS A VALIDATION SET
5) Average the performances on the k validation sets
6) Choose the model with the smallest average k-fold cross-validation error
7) Re-train the chosen model on the joined training & validation sets to obtain the predictor
8) Estimate the future performance of the obtained predictor on the TEST SET
9) Ready to deploy the predictor
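The splitting-and-averaging steps can be sketched generically; a minimal sketch, where the mean-predicting `train_fn` and squared-error `error_fn` are hypothetical stand-ins for a real model:

```python
import random

def k_fold_cv(data, k, train_fn, error_fn):
    """k-fold cross-validation: split into k disjoint folds, train on k-1,
    validate on the held-out fold, and average the k validation errors."""
    data = data[:]
    random.shuffle(data)                         # random disjoint partitions
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):                           # each fold is the validation set once
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)
        errors.append(error_fn(model, val))
    return sum(errors) / k                       # average validation error

# Hypothetical "model": predict the mean of the training labels
data = [(x, 2.0 * x) for x in range(12)]
avg_err = k_fold_cv(
    data, k=3,
    train_fn=lambda train: sum(y for _, y in train) / len(train),
    error_fn=lambda m, val: sum((y - m) ** 2 for _, y in val) / len(val))
```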
123
Steps for Leave-one-out Validation
- Leave out a single example & train on all the rest of the annotated data
- For a total of N examples, repeat this N times, leaving out a single example each time
- Take the average of the validation errors as measured on the left-out points
- Same as N-fold cross-validation where N is the number of labelled points (EVERY POINT WILL BE USED AS A VALIDATION POINT)
124
What do we have to do with the models for k-fold Cross Validation?
For each partition, fit a separate line/graph & then compute the validation error using the points in that partition
- Then take the mean of these errors across the graphs
125
Pros of Holdout Validation?
Computationally Cheapest (IDEAL FOR LARGE SAMPLES)
126
Cons of Holdout Validation?
Not reliable if sample size is not large enough
127
Pros of 3 fold?
Slightly more reliable than Hold out
128
Pros of 10-fold?
Only waste 10% - Fairly reliable
129
Cons of 3-fold?
- Wastes 1/3-rd annotated data - Computationally 3-times as expensive as holdout
130
Cons of 10-fold?
- Wastes annotated data - Computationally 10-times as expensive as holdout
131
Pros of Leave-one-out?
Doesn't waste data (IDEAL FOR SMALL SAMPLES)
132
Cons of Leave-one-out?
Computationally most Expensive
133
What is Storage Complexity of Dendrogram?
O(n^2)
- Storing the distance matrix requires storage for (n^2)/2 entries (don't need to store b-to-a if a-to-b is already there)
134
What is the Time Complexity of Dendrogram?
O(n^3)
- n iterations
- At every iteration the n^2-size distance matrix has to be updated and searched (GOING THROUGH ONCE IS n^2 & THROUGH n TIMES IS n^3)
- Complexity can be reduced to O(n^2 log(n)) using different algorithms
135
Downside of Dendrogram, regarding Complexity ?
- Limits the size of the dataset that can be processed
- Anything more than n^2/n^3 is hard to work with
- NOT IDEAL FOR LARGE DATASETS
136
Pros of KNN?
- VERSATILE: classification & regression, & non-parametric
- Easy to implement & interpret
- Can approximate complex functions, so it has very good accuracy
- Instance-based, so it defers all calculations until prediction time
137
Cons of KNN?
- Performance decreases as dimensionality increases
- Sensitive to noise / inaccurate data, especially when the value of k is small
- Must specify the distance function & pre-define the k value
- Needs ALL the training data, as it calculates the value of new points from the training data only
- Computationally expensive as the dataset grows
138
Pros of Linear Regression?
- Relatively low computational cost
- No k value to worry about
- Less space required: does not have to store the training set
139
Logistic Regression ?
- A Supervised Learning technique
- Takes inputs and fits them like Linear Regression
- Classifies the outputs into discrete, distinct categories using the sigmoid function
140
What is sigmoid function?
1/(1+e^-u), where u is the linear function w0 + w1*x1
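The sigmoid and the resulting class prediction can be sketched in a few lines; the 0.5 threshold and weight names below are illustrative assumptions:

```python
import math

def sigmoid(u):
    """Logistic (sigmoid) function: squashes any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def predict(x, w0, w1):
    """Logistic-regression prediction: sigmoid of the linear score w0 + w1*x,
    thresholded at 0.5 to give a discrete class."""
    p = sigmoid(w0 + w1 * x)
    return 1 if p >= 0.5 else 0

# sigmoid(0) == 0.5; large positive scores -> class 1, large negative -> class 0
```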
141
What is f00
Pairs don't have same cluster & class
142
what is f10
Pairs have same cluster, but not class
143
what is f01
Pairs don't have same cluster , but same class
144
What is f11
Pairs have same cluster & class
145
BFS is not Optimal when?
The cost function is NOT non-decreasing
146
Explain hypothesis function y = w_0 + w_1*x
- Creates a line in 2D space relating x values to predictions
- w_0 is the y-intercept/offset
- w_1 is the gradient
147
Explain Cost Function : L(x,y) = Σ (y_i - h(x_i))^2
Compares the difference between the actual label data & the model's prediction
- Squares to eliminate -VEs
- Done for every example in the training set
- SUMS the values to give the cost function value
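The hypothesis and cost above can be computed directly; a small worked example on a hypothetical toy training set that lies exactly on y = 1 + 2x:

```python
def h(x, w0, w1):
    """Hypothesis: a line with y-intercept w0 and gradient w1."""
    return w0 + w1 * x

def l2_cost(data, w0, w1):
    """Sum of squared differences between labels and predictions."""
    return sum((y - h(x, w0, w1)) ** 2 for x, y in data)

data = [(0, 1), (1, 3), (2, 5)]   # exactly y = 1 + 2x
# l2_cost(data, 1, 2) == 0 (perfect fit); l2_cost(data, 0, 2) == 3 (each point off by 1)
```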
148
Explain the Hypothesis Function: y = w_0 + w_1*x + w_2*x^2
- Takes in ONE x value
- Uses it raised to different powers to get a polynomial function
- THUS NON-LINEAR
149
What is an ideal Cluster Similarity matrix?
For n examples/data points it creates an n x n matrix, then for each pair of points puts 1 if they are in the same Cluster, otherwise 0
150
What is an ideal Class Similarity matrix?
For n examples/data points it creates an n x n matrix, then for each pair of points puts 1 if they are in the same Class, otherwise 0
151
How do we Classify in KNN for Regression Problems ?
- Take the average y_i over all k neighbours (THE VALUE YOU ARE PREDICTING FOR THE NEW DATA)
- The average value becomes the y_i value of the new data
e.g. we want to find the height of a new data point p:
- Find the closest k neighbours
- Find the average height of those neighbours
- That average is the height of p
152
How do we Classify in KNN for Classification Problems ?
Look at the y_i values of all k neighbours; whichever is most prevalent becomes the y_i value of our new data point
e.g. we want to find the class of a new data point p:
- Find the closest k neighbours
- Look at the class of each neighbour
- Pick the class that shows up the most among the neighbours
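Both cases share the same neighbour lookup; a minimal sketch on hypothetical 1-D (x, y) pairs, averaging for regression and taking the majority for classification:

```python
from collections import Counter

def knn(train, query, k, mode="classify"):
    """k-nearest neighbours on (x, y) pairs with 1-D Euclidean distance:
    regression averages the neighbours' y_i, classification takes the majority."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    ys = [y for _, y in neighbours]
    if mode == "regress":
        return sum(ys) / k                        # average y_i of the k neighbours
    return Counter(ys).most_common(1)[0][0]       # most prevalent y_i

train = [(1, "A"), (2, "A"), (3, "B"), (10, "B"), (11, "B")]
label = knn(train, 2.5, k=3)                      # 3 closest: (2,A), (3,B), (1,A)
heights = [(1, 150.0), (2, 160.0), (3, 170.0), (10, 190.0)]
value = knn(heights, 2.0, k=2, mode="regress")
```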
153
What is k means ++
- First pick a random initial centroid
- Then pick the next centroid via Euclidean distance, with the probability of picking a point proportional to (its distance to the nearest chosen centroid)^2, favouring far-away points
- Repeat until you have k centroids
- Then run k-means to find the clusters
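The distance-squared weighted seeding can be sketched on its own; a minimal sketch of the initialisation step only, on hypothetical 2-D tuples:

```python
import random

def kmeans_pp_init(points, k):
    """k-means++ seeding: first centroid uniformly at random, then each next
    centroid picked with probability proportional to (distance to the
    nearest already-chosen centroid)^2."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [random.choice(points)]
    while len(centroids) < k:
        # weight each point by its squared distance to its nearest chosen centroid
        weights = [min(dist2(p, c) for c in centroids) for p in points]
        centroids.append(random.choices(points, weights=weights)[0])
    return centroids

pts = [(0.0, 0.0), (0.0, 0.1), (9.0, 9.0)]
cs = kmeans_pp_init(pts, 2)   # already-chosen points have weight 0, so cs are distinct
```

Ordinary k-means then runs from these seeds; the spread-out initialisation makes bad local optima less likely.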
154
How is k means non deterministic?
Different initialisations lead to different local optima
155
If optimisation problem in k means is not convex ?
The algorithm may not converge to the global minimum, but rather to a local minimum
156
2 things that make k mean converge
- The WCSS monotonically decreases
- There is a finite number of partitions of the data