AI Flashcards

1
Q

Cons of Hill Climbing:

A
  • No Guarantee to be COMPLETE or OPTIMAL
  • Can get stuck on local maxima & plateaus
    (RUN FOREVER IF NOT PROPERLY FORMULATED)
2
Q

Pros of Hill Climbing:

A
  • Rapidly finds good solutions by improving on a bad initial state (GREEDY)
  • Lower Time & Space Complexity compared to search algorithms
  • No requirement for problem-specific heuristics
    (UNINFORMED)
  • Starts from a Candidate Solution, instead of building up step-by-step
    (UNLIKELY BUT POSSIBLE: the randomly picked initial state may already be the SOLUTION)

Run the algorithm for a maximum number of iterations m

Variants of Hill Climbing can mitigate getting stuck on local maxima / plateaus
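The points above can be sketched as a short loop; a minimal sketch of greedy (steepest-ascent) hill climbing capped at m iterations, assuming a hypothetical numeric objective `f` and `neighbours` function:

```python
import random

def hill_climbing(f, initial, neighbours, m=1000):
    """Steepest-ascent hill climbing: greedily move to the best neighbour,
    stopping at a local maximum/plateau or after m iterations."""
    current = initial
    for _ in range(m):                        # cap at m iterations so it never runs forever
        best = max(neighbours(current), key=f, default=current)
        if f(best) <= f(current):
            break                             # local maximum or plateau: stop
        current = best
    return current

# Toy problem: maximise f(x) = -(x - 3)^2 over the integers
f = lambda x: -(x - 3) ** 2
result = hill_climbing(f, initial=random.randint(-10, 10),
                       neighbours=lambda x: [x - 1, x + 1])
```

On this single-peaked toy objective any start reaches the global maximum at x = 3; on multi-peaked objectives the same loop can get stuck, which is exactly the con listed above.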

3
Q

Variants of Hill Climbing:

A
  • Stochastic Hill Climbing
  • First-Choice Hill Climbing
  • Random-Restart Hill Climbing
4
Q

Stochastic Hill Climbing:

A
  • Randomly selects a neighbour that involves an uphill move
  • Probability of picking a specific move can depend on the steepness
  • Converges more slowly than steepest ascent but can find better solutions
5
Q

First-Choice Hill Climbing

A

  • Randomly generates a single SUCCESSOR (neighbour) solution & moves to it if it's better than the current solution
  • If there is no UPHILL move, it keeps randomly generating successors until one gives an uphill move
  • If after a MAX number of tries OR after generating all neighbours it hasn't found an UPHILL move, it gives up & assumes it is now at the Optimal solution
  • Time complexity is lower, as not all neighbours need to be generated before one is picked
    (GOOD WHEN EACH SOLUTION HAS LOADS OF NEIGHBOURS)
6
Q

Random-Restart Hill Climbing:

A
  • Runs a series of hill climbing searches on the same problem from randomly generated initial states
  • Stops when a goal is found
  • Can be parallelised / threaded easily, so does not take much time on modern computers
  • RARE to have to wait long for this to happen
7
Q

What is the only Complete variant of Hill Climbing?

A

Random-Restart Hill Climbing
It is complete: if a solution exists it will eventually be generated, as some random start state will eventually lead to the OPTIMAL SOLUTION

8
Q

Pros of Simulated Annealing:

A
  • Finds near-Optimal Solutions in reasonable time
    (High chance of reaching the GLOBAL MAX, but only as
    good as the formulation of the Optimisation Problem)
  • Avoids getting stuck in poor LOCAL MAXIMA & PLATEAUS by combining EXPLORATION & EXPLOITATION
    (explores widely early on, then exploits as the temperature decreases)
9
Q

Cons of Simulated Annealing:

A
  • Not Guaranteed to be COMPLETE or OPTIMAL
    (SENSITIVE TO FORMULATION)
  • NOT reliable = can't guarantee completeness
  • Time & Space Complexity is problem- & representation-dependent
10
Q

Formulating Optimisation Problems:

A

min / max f(x) => objective function(s) to minimise/maximise
s.t. g_i(x) <= 0, i = 1,…,m
     h_j(x) = 0, j = 1,…,n => Feasibility Constraints

==> x is the DESIGN VARIABLE (can be anything)

SEARCH SPACE is the space of all possible x values

11
Q

What is Search Space of a Problem:

A

The space of all possible x values (the CANDIDATE SOLUTIONS) ALLOWED by the constraints

12
Q

Define Explicit Constraint:

A

A constraint explicitly stated in the problem
CANNOT BE ASSUMED

13
Q

Define Implicit Constraint:

A

Rules that must hold by the problem definition in order for solutions to be CONSIDERED FEASIBLE

e.g: x,y > 0

14
Q

Define A*:

A

A heuristic pathfinding algorithm & the most used example of Informed Search

15
Q

What is the Evaluation Function:

A

f(n) = g(n) + h(n)
g(n) => cost to reach node n
h(n) => heuristic estimate of the cost from node n to the goal

It determines which node to expand next

16
Q

Steps of A*:

BASIC

A

1) Expand the node in the frontier with the SMALLEST EVALUATION FUNCTION f(n)

2) If a node is already in the list of visited nodes, DON'T ADD it TO the FRONTIER

3) Stop when a GOAL node is visited
(KEEP EXPANDING OTHERWISE)

17
Q

What is the Final Cost of A*?

A

Sum of g(n) along the path

  • DON'T INCLUDE HEURISTICS
18
Q

Dijkstra’s:

A
  • UNINFORMED
  • Goes Through all the nodes of the graph
  • No Heuristics / ADDITIONAL INFO
  • Basic Attempt to find shortest distance from the nodes on the graph to the destination
19
Q

what is h(n)?

A

Euclidean (straight-line) distance from each node to the destination

A heuristic measure of cost based on physical closeness to the destination

Good assumption: if a node is physically closer to the destination, the cost to reach it will be lower

20
Q

Pros & Cons of A* :

A

Pros:
- Doesn't go through all nodes

  • It is safer to expand the node with the lower straight-line distance (OTHER NODES CAN BE LEFT OUT)

Cons:
- Can lead to non-optimal paths (UNLIKELY)

21
Q

Uninformed Search ?

A

Defines the set of strategies having no additional information about the State Space beyond what is given during Problem Formulation

22
Q

What does Uninformed Search do?

A

- Can only generate successors
- Can only distinguish a goal state from a non-goal state

23
Q

What is Breadth First Search ?

A

-An Uninformed Searching Strategy

FRONTIER NODES ARE EXPANDED LAYER BY LAYER

Expanding shallowest unexpanded node in frontier

Similar to a QUEUE (FIFO)

  • Unless stated otherwise, never add children that are already in the Frontier
24
Q

Steps for BFS?

A

- The root node is expanded
- Then the successors of the root
- Repeatedly expand all the successors of each node, layer by layer, until the goal node is reached
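The steps above can be sketched with a FIFO queue of paths; a minimal sketch, assuming a hypothetical `neighbours` function over a toy tree:

```python
from collections import deque

def bfs(start, goal, neighbours):
    """BFS: expand the shallowest node in the frontier (FIFO queue),
    never adding children whose state has already been seen."""
    frontier = deque([[start]])     # queue of paths, shallowest first
    seen = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for succ in neighbours(node):
            if succ not in seen:    # skip children already in frontier/visited
                seen.add(succ)
                frontier.append(path + [succ])
    return None

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
        "D": [], "E": [], "F": []}
# bfs("A", "F", lambda n: tree[n]) expands layer by layer: A, then B & C, then D, E, F
```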

25
What are the 3 Quantities used to measure the Performance?
- Branching Factor (b): Max no. of successors of any node
- Depth (d): Depth of the shallowest goal node
- Max Length (m): Max length of any path in the state space
26
4 quantities that Determine the Performance?
Completeness => Guarantees finding a solution, if one exists?
Optimality => Capability of finding the Optimal Solution?
Time Complexity => How long to find a solution?
Space Complexity => How much memory space is required?
27
Performance of BFS
Complete: Yes, if the goal node is at some finite depth d (b must also be finite)
Optimal: Yes, if path cost is a non-decreasing function of the depth of the node (ALL ACTIONS HAVE THE SAME COST)
Time: O(b^d), assuming a uniform tree where each node has b successors
Space: O(b^d), stores all expanded nodes; the frontier is O(b^d) and the explored set in memory is O(b^(d-1))
28
What does I Always Take Goal Path stand for?
I = Initial State (where the agent starts its search)
A = Action Set (actions that can be executed in any state)
T = Transition Model (mapping between states and actions)
G = Goal Test (determines if a state is a goal state)
P = Path Cost Function (assigns a cost value to each path)
29
What is the Solution?
Sequence of Actions from initial state to Goal
30
What is Cost of Solution?
Sum of the cost of actions from initial to goal
31
What is the Path?
Sequence of states connected by a sequence of actions
32
Does goal node appear in Order of nodes visited in BFS?
NO
33
Does goal node appear in Order of nodes visited in DFS?
YES
34
What is Depth First search?
- An Uninformed Searching Strategy ==> expands the deepest unexpanded node in the frontier
- A Stack (LIFO) is used for expansion
- The most recently generated node is expanded
- Usually just expand the left-most node first
35
Steps for DFS?
- Expand the root first
- Then expand the first successor of the root node (CAN BE CHOSEN RANDOMLY)
- Repeat expanding the deepest node until the goal is found; otherwise go back to try alternative paths
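These steps can be sketched with a LIFO stack; a minimal sketch with an optional depth limit L (giving DLS), assuming a hypothetical `neighbours` function over a toy tree:

```python
def dfs(start, goal, neighbours, limit=None):
    """DFS: expand the deepest (most recently generated) node via a LIFO stack.
    An optional depth limit L (in edges) gives Depth-Limited Search (DLS)."""
    stack = [[start]]
    while stack:
        path = stack.pop()                       # deepest path first
        node = path[-1]
        if node == goal:
            return path
        if limit is not None and len(path) - 1 >= limit:
            continue                             # depth limit L reached: backtrack
        for succ in reversed(neighbours(node)):  # reversed so the left-most child pops first
            if succ not in path:                 # avoid loops along the current path
                stack.append(path + [succ])
    return None

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
```

With limit=1 the goal "E" at depth 2 is never reached, illustrating why DLS is incomplete when L < d.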
36
Performance of DFS?
Completeness: Not complete if the search space is infinite or we don't avoid infinite loops; Yes, only if the search space is finite
Optimality: Not optimal, because it will expand the entire left subtree even if the goal is at the first level of the right subtree
Time: O(b^m), depends on the max length m of any path in the search space
Space: O(b^m), storing all the nodes on each path from root to leaf
37
Variants of DFS: b) what performance attribute stays the same for all of these variants?
- Less Memory Usage (LMU)
- Depth-Limited Search (DLS)
- DLS with LMU
b) Optimality is ALWAYS NO
38
What is LMU?
If you reach the leaf node of the left subtree and there is no GOAL node, you can remove the entire subtree from memory
- Space complexity becomes O(bm): store a single path, with the siblings of each node on the path
- COMPLETE IF the search space is finite
39
What is DLS?
- Has a DEPTH LIMIT L; once the limit is reached we go and find an alternate path
- If L < d it is not complete; if L > d it is not optimal
- Time complexity reduces to O(b^L), when L < m
40
What is DLS with LMU?
Once we reach our DEPTH LIMIT L, if we have found NO goal node we remove the subtree from memory
- Space complexity = O(bL)
- COMPLETE if L >= d
41
What are Informed Search Strategies?
Use problem-specific knowledge beyond the problem definition
Can find solutions more efficiently compared to uninformed search
42
General Approach for Informed?
Best-First Search:
- Determines which node to expand based on an evaluation function f(n)
- f(n) acts as a cost estimate => lowest cost expanded first
43
What is Best First Search?
- Determines which node to expand based on an evaluation function f(n)
- Most include a heuristic h(n): the estimated cost of the cheapest path from the current node to the goal
- At the goal node, h(n) = 0
- Known as GREEDY, as it will always pick the cheapest-looking path
44
What is the f(n) for A*?
f(n) = g(n) + h(n)
g(n) => cost to reach node n
h(n) => heuristic from n to goal
45
Steps for A*? | COMPLEX
- Expand the node in the frontier with the smallest f(n)
- Handle repeated states & loopy paths: if a node has already been visited, don't add it to the frontier
- If the state of a given child is already in the frontier: keep whichever has the smaller g(n), replacing the entry with the larger g(n)
- Stop when the goal is visited
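These steps can be sketched with a priority queue keyed on f(n) = g(n) + h(n); a minimal sketch on a hypothetical toy graph with a consistent heuristic:

```python
import heapq

def a_star(start, goal, neighbours, h):
    """A*: expand the frontier node with the smallest f(n) = g(n) + h(n).
    `neighbours(n)` yields (successor, step_cost) pairs; `h` is the heuristic."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}                          # smallest g(n) seen per state
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g                       # final cost = sum of step costs, no heuristic
        for succ, cost in neighbours(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):   # keep only the smaller-g entry
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), g2, succ, path + [succ]))
    return None, float("inf")

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 1)], "D": []}
h = {"A": 2, "B": 2, "C": 1, "D": 0}             # consistent: h(n) <= cost(n,n') + h(n')
path, cost = a_star("A", "D", lambda n: graph[n], lambda n: h[n])
```

Note the returned cost is the sum of g(n) along the path only — the heuristic is never part of the final cost.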
46
Performance of A*?
A* is Complete & Optimal if h(n) is consistent
- Time: A* is exponential in the length of the solution; with CONSTANT STEP COSTS it is O(b^(Σd)), where Σ = (h* - h)/h* is the RELATIVE ERROR and h* is the actual cost from root to goal
- Space: O(b^d) is the main issue with A*: keeps all generated nodes in memory (ALL expanded nodes & ALL nodes in the frontier)
- NOT suitable for LARGE-SCALE PROBLEMS
47
What does Consistent mean in terms of h(n)?
h(n) is consistent if the estimate is no greater than the cost of reaching any neighbouring node n' plus the estimated distance from n' to the goal:
h(n) <= cost(n, n') + h(n')
where cost(n, n') is just the step cost from n to n'
48
What do design variables represent?
- Candidate solutions of a problem
- Variables belonging to pre-defined domains
- There may be one or more design variables in a given optimisation problem
- They can represent the decisions to be made in the problem
49
What is the objective function?
- Takes design variables as an input
- Outputs a NUMERICAL value that the problem aims to MINIMISE or MAXIMISE
- There CAN be multiple objective functions in a formulation
- Defines the cost or quality of a solution
50
What are Constraints in Formulating Operation Problems?
- Conditions that the design variables must satisfy for the solution to be FEASIBLE
- Usually depicted by functions that take the design variables as input and output a numeric value
- They specify the values these functions are allowed to take for the solution to be feasible
- There may be zero or more constraints in a problem
- DEFINES THE FEASIBILITY OF THE SOLUTION
51
Benefits of Local Search?
- Do NOT keep track of the paths or states that have been visited
- NOT systematic, but the PROS are:
  - Uses very little memory
  - Finds reasonable solutions in large or infinite state spaces
52
What are Local Search Algorithms?
Optimisation Algorithms that operate by searching from initial state to neighbouring states
53
What is The Aim Of a MAXIMISING problem?
Reaching the HIGHEST PEAK/ Global Maximum
54
What is The Aim Of a MINIMISING problem?
Reaching the LOWEST TROUGH/ Global Minimum
55
Purpose Of Hill Climbing?
To find & reach the Global Maximum
56
Purpose of Gradient Descent
To find & reach the Global Minimum
57
Why is Hill Climbing Greedy?
Does not look beyond the immediate neighbours of the current state
58
What are the 3 Components of Hill Climbing we must design?
-Representation -Initialisation Procedure -Neighbourhood Operator
59
What is Representation?
How to store the design variables of the problem(s)
Should facilitate the application of the Initialisation Procedure
60
What is Initialisation Procedure?
How to pick initial solution. USUALLY RANDOM, Can Be Heuristic
61
What is Neighbourhood Operator?
How to generate Neighbourhood Solutions (INCREMENT/STEP SIZE)
62
Performance of Hill Climbing:
Completeness: No, depends on the problem formulation & design of the algorithm (CAN GET STUCK ON LOCAL OPTIMA)
Optimality: Not optimal (CAN GET STUCK ON LOCAL OPTIMA)
Time: O(mnp), where m = MAX no. of iterations, n = MAX no. of neighbours, and each neighbour takes O(p) to generate
Space: O(nq + r) ==> r is a constant, so ==> O(nq), where n = MAX no. of neighbours, each variable takes O(q), and r is the space to generate the neighbours sequentially (NEGLIGIBLE COMPARED TO n & q)
63
What are the 3 Types of Machine Learning?
-Supervised Learning, -Unsupervised Learning -Reinforcement Learning
64
What is Supervised Learning?
- Most Prevalent Form - Learning with a teacher - Teacher: expected output, label, class, etc - Solve 2 Types of problems: Classification & Regression
65
How can AI be used to Solve Machine Learning Problem?
- Automatically create models from data to perform certain tasks through machine learning
- Not guaranteed a perfect model, but will find a good model depending on the difficulty of the problem
- Good for problems where it is difficult to create good models manually
- Good for problems that don't require perfect answers
66
AI in Optimisation Problems:
- Solve them in a reasonable amount of time through optimisation techniques
- No guarantee of finding the optimal solution in a reasonable amount of time, but a good solution
- Good for problems where no specific technique exists that guarantees the optimal solution can be found
67
Think Humanly?
-Can machine think humanly? - Can we consider machine as human? - How to define think humanly? ==> With AI we need to define things with mathematical forms
68
Act Humanly?
- What can humans do? - What if human's action is wrong? ==> Doesn't mean we shouldn't copy the wrong actions
69
Act Rationally?
==> Rationality: Doing the right thing can be mathematically defined & General enough, linked to human behaviour
70
Think Rationally?
- Think Logically? - Logical AI? - Too Narrow?
71
Define AI using 'Rational Agents':
Rational Agents are computer programs that perceive their environments & take actions that maximize their chances of achieving best EXPECTED outcome
72
What is Machine Learning?
An agent is learning if it improves its performance after making observations about the world. It is Machine Learning when the agent is a computer
73
What is Machine Learning Problem?
Problems that require a model to be built automatically from data e.g => to make classification
74
What is Supervised Learning?
- Most popular form in Real World - Learning with a Teacher - Teacher: expected output, labels, classes, etc. - Solve 2 types of Problems : Classification & Regression Problem
75
What is Classification Problem? {Give example}
- Predict categorical class Labels ==> Spam Detection
76
What is Regression Problem? {Give example}
- Prediction of a Real Value ==> Student Grades, Stock Price Prediction
77
Define Supervised Learning:
Agents Observe input-output pairs & learns a function that maps from input to output
78
Define Unsupervised Learning:
Agent Learns patterns in the input without any explicit feedback
79
What is Unsupervised Learning?
- Learning without a teacher
- Finds hidden structures
- Clustering ==> groups inputs based on similar properties
80
What is Reinforcement Learning?
- COMBO of Supervised & Unsupervised
- Learning with (delayed) feedback / reward --> we don't have instant labels
- Learns a series of actions => sequential decision making
The agent learns from a series of reinforcements: rewards & punishments. It decides which of the actions prior to the reinforcement were most responsible for it, and alters its actions towards more reward in the future.
81
Why is Overfitting a Problem?
- Fitting the training data too well is not helpful, as the data you want to classify or predict is not the same as the training data
- Learning every irrelevant detail in the training data does not generalise
- Occurs when the model is more complex than required
82
When does Underfitting occur?
When the model is simpler than required
It is bad because it will not classify all points into the correct classes
83
What is a Parametric Model?
Parametric models are learning models that summarise data with a set of parameters. e.g Logistic & Linear Regression
84
What is a NON-Parametric Model?
Non-parametric models are learning models that do not assume a fixed set of parameters, e.g. KNN
85
Limitations of K means?
- Outliers can influence the clusters that are found & increase the WCSS
- Problems when clusters are of differing sizes
86
Applications of K means?
A key tool in image and signal compression (ESPECIALLY VECTOR QUANTIZATION)
87
K means Steps?
- Pick k random elements as the initial centroids
- Assign each element to its closest centroid (EUCLIDEAN DISTANCE IF NOT SPECIFIED)
- Recalculate each centroid as the centre of its assigned elements & repeat until the centroids don't change and you get the minimum WCSS
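The steps above can be sketched directly (Lloyd's algorithm); a minimal sketch on hypothetical 2-D tuples:

```python
import random

def k_means(points, k, iters=100):
    """k-means: pick k random elements as centroids, assign each point to its
    closest centroid (Euclidean distance), recompute centroids, repeat."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                   # assignment step
            i = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[i].append(p)
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]           # update step
        if new == centroids:                               # centroids unchanged: converged
            break
        centroids = new
    return centroids, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, clusters = k_means(pts, 2)
```

On these two well-separated pairs, any initialisation converges to two clusters of two points each; less separated data is where different initialisations give different local optima.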
88
Selecting K in K means?
Use prior information, such as the number of groups we want to cluster into
89
How to do Elbow Method?
Run k-means with different k values
See where the WCSS has its inflection point (THE KINK IN THE GRAPH BEFORE IT LEVELS OFF)
90
What is Elbow Method?
Finds the optimal K value
91
What is Agglomerative Clustering?
- Bottom-up
- Each item starts as its own cluster
- Merge the 2 most similar clusters (SMALLEST INTER-CLUSTER DISSIMILARITY) at each step, until there is one cluster
92
What is Divisive Clustering?
- Top-down
- All items start in ONE cluster
- Split a cluster into the 2 new clusters with the largest inter-cluster dissimilarity, until each item has its own cluster
93
Disadvantage of Single Linkage & Complete Linkage?
- Sensitive to outliers, which heavily affect which clusters are merged even though they are not representative of the whole cluster
94
Cons of SL?
- Can cause a 'chaining effect' where clusters are combined via close intermediate examples
- Clusters may not be as compact as required
95
Cons of CL?
- Provides clusters with a small diameter
- Clusters may end up being crowded, as items can be very close to items in other clusters
96
Pro of group average?
Attempts to produce relatively compact clusters that are relatively far apart
97
What is Constraint Handling?
Methods that ensure optimisation algorithms can effectively search the feasible regions,
avoiding or penalising INFEASIBLE solutions by modifying the Algorithm Operator OR the Objective Function
98
What is Modifying the Algorithm Operator?
- Modifying the search so that ONLY feasible solutions are generated
- The algorithm operator is a function that generates a new candidate solution based on the current solution
- AVOIDS generating invalid solutions, where the constraints dictate whether a solution fits
99
Pros of Modifying the Algorithm Operator
- Will not generate infeasible solutions, allowing the search to focus on optimal solutions
- Makes Hill Climbing & Simulated Annealing Complete
100
Cons of Modifying the Algorithm Operator
- May be difficult to design; problem-dependent
- May restrict the search space too much, making it harder to find optimal solutions
  (THE GLOBAL OPTIMUM MAY LIE BETWEEN FEASIBLE & INFEASIBLE SOLUTIONS)
101
What is Modifying the Objective Function?
- Incorporating the constraints into the objective function, often by adding a penalty term for constraint violation
- THE OBJECTIVE FUNCTION IS INCREASED/DECREASED IF A CONSTRAINT IS VIOLATED
102
What is the Death Penalty Method?
A method of Modifying the Objective Function (MIN PROBLEM)
Let x be a solution; the objective function becomes f(x) + Q(x) // Q(x) is the penalty term
- If x is FEASIBLE, Q(x) = 0
- Else Q(x) is a LARGE +VE constant C // so that feasible solutions have smaller values than infeasible ones
103
What's a Limitation/Problem with Death Penalty Method
- C is always the same constant, so EVERY INFEASIBLE SOLUTION has the same penalty
- Hard to find solutions in a region dominated by infeasible solutions
104
What is Levels of Infeasibility Approach?
A method of Modifying the Objective Function
- Distinguishes the objective values of infeasible solutions to help the algorithm find feasible ones
- The objective function still gets a penalty added/subtracted
105
Give an Example of Q(x) for Levels of Infeasibility Approach?
THE PROBLEM IS: MINIMISE f(x) + Q(x)

Q(x) = vg_1*C*g_1(x) + vg_2*C*g_2(x) + ... + vg_m*C*g_m(x)
     + vh_1*C*h_1(x) + vh_2*C*h_2(x) + ... + vh_n*C*h_n(x)

If g_i(x) / h_i(x) is VIOLATED, then vg_i / vh_i = 1; OTHERWISE vg_i / vh_i = 0
The more of the g_i & h_i that are violated, the higher the output of Q(x)
Only adds penalties corresponding to VIOLATED constraints
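This penalty scheme can be sketched directly; a minimal sketch for a MIN problem, where the example objective and constraint are hypothetical:

```python
def penalised_objective(f, gs, hs, C=1000.0):
    """Levels-of-infeasibility penalty for a MIN problem: Q(x) adds
    C * (violation amount) for each VIOLATED constraint only.
    gs: inequality constraints g_i(x) <= 0; hs: equality constraints h_j(x) = 0."""
    def F(x):
        Q = sum(C * g(x) for g in gs if g(x) > 0)          # vg_i = 1 only when violated
        Q += sum(C * abs(h(x)) for h in hs if h(x) != 0)   # vh_i = 1 only when violated
        return f(x) + Q
    return F

# Hypothetical example: minimise f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e. x >= 1)
F = penalised_objective(lambda x: x * x, gs=[lambda x: 1 - x], hs=[])
# F(2) = 4 (feasible, no penalty); F(0) = 0 + 1000*1 = 1000 (infeasible, penalised)
```

Solutions that violate the constraint by more receive a larger penalty, which is exactly what distinguishes this approach from the flat Death Penalty constant.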
106
What are g_i & h_i in the Levels of Infeasibility Approach?
The Constraints
107
What is C in the Levels of Infeasibility Approach?
A constant representing how important a specific constraint is.
e.g. vg_1*C*g_1(x) may be more important than vg_2*C*g_2(x), so C can take a different value for each
(SCALING EACH CONSTRAINT BY HOW IMPORTANT IT IS)
108
Why is Levels of Infeasibility better than Death Penalty?
It solves the issue with the Death Penalty by giving different penalties to different infeasible solutions:
- Smaller for solutions that are close to satisfying the constraints
- Larger for solutions that are further from satisfying the constraints
109
What does Squaring the values of the Constraints in the Penalty term do?
- Makes the distinction between objective values even larger
- Distinguishes different infeasible solutions more effectively
- Removes -VEs
- Gives a bigger span of solutions to distinguish effectively
110
Pros of Levels of Infeasibility?
Easy to Design
111
Cons of Levels of Infeasibility?
- The algorithm still has to search for feasible solutions, rather than the search space containing only feasible solutions
- (INFEASIBLE SOLUTIONS ARE STILL GENERATED, BUT GET IGNORED DUE TO THEIR LARGE/SMALL OBJECTIVE VALUE WHEN WE ARE MINIMISING/MAXIMISING)
- (THEY WILL JUST BE NEIGHBOURS, NOT SOLUTIONS WE MOVE TO)
112
How do Hill Climbing & Simulated Annealing become Complete?
If the strategy never enables infeasible solutions to be generated at all
113
Which of the CONSTRAINT HANDLING methods makes Hill Climbing & Simulated Annealing COMPLETE?
Modifying the Algorithm Operator, as it does not generate infeasible solutions
114
What are the Hyperparameters?
HIGH LEVEL free parameters
115
How is a Predictor Obtained?
By training the free parameters of the considered model using available data.
116
What is the Training set used for?
Estimate the free parameters
117
What is the Test set used for?
Evaluate the performance of the trained predictor before deploying it
Reserved for the evaluation of the predictor, so it can't be used for training (the model must learn nothing from it)
118
Holdout Validation Steps?
1) Randomly choose 30% of the data to form a validation set
2) The remaining data forms the training set
3) Train your model on the training set
4) Estimate the test performance on the validation set
5) Choose the model with the lowest validation error
6) Re-train the chosen model on the joined training & validation sets to obtain the predictor
7) Estimate the future performance of the obtained predictor on the TEST SET
8) Ready to deploy the predictor
119
What is the Mean Square Error (L2)?
(f(x)-y)^2
120
What are the 3 Models?
- Linear Model - Quadratic Model - Line Model (OVERFITTING)
121
For Holdout Validation how do we ESTIMATE THE TEST PERFORMANCE ON THE VALIDATION SET?
REGRESSION: compute the cost function (L2) on the examples of the validation set (INSTEAD OF THE TRAINING SET)
CLASSIFICATION: don't compute the cross-entropy cost on the validation set; COMPUTE the 0-1 error metric:
0-1 error metric = (num of wrong predictions) / (num of predictions) = 1 - accuracy
122
Steps for k-Fold Cross-Validation?
1) Split the training set randomly into k (equal-sized) disjoint sets
2) Use k-1 of those together for training
3) Use the remaining one for validation
4) Permute the k sets & repeat k times (CHANGE WHICH PARTITION IS THE VALIDATION SET & REPEAT k TIMES)
   EACH PARTITION WILL BE USED AS A VALIDATION SET
5) Average the performances on the k validation sets
6) Choose the model with the smallest average k-fold cross-validation error
7) Re-train the chosen model on the joined training & validation sets to obtain the predictor
8) Estimate the future performance of the obtained predictor on the TEST SET
9) Ready to deploy the predictor
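The splitting-and-averaging steps can be sketched generically; a minimal sketch, where the mean-predicting `train_fn` and squared-error `error_fn` are hypothetical stand-ins for a real model:

```python
import random

def k_fold_cv(data, k, train_fn, error_fn):
    """k-fold cross-validation: split into k disjoint folds, train on k-1,
    validate on the held-out fold, and average the k validation errors."""
    data = data[:]
    random.shuffle(data)                         # random disjoint partitions
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):                           # each fold is the validation set once
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)
        errors.append(error_fn(model, val))
    return sum(errors) / k                       # average validation error

# Hypothetical "model": predict the mean of the training labels
data = [(x, 2.0 * x) for x in range(12)]
avg_err = k_fold_cv(
    data, k=3,
    train_fn=lambda train: sum(y for _, y in train) / len(train),
    error_fn=lambda m, val: sum((y - m) ** 2 for _, y in val) / len(val))
```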
123
Steps for Leave-one-out Validation
- Leave out a single example & train on all the rest of the annotated data
- For a total of N examples, repeat this N times, leaving out a single example each time
- Take the average of the validation errors as measured on the left-out points
- Same as N-fold cross-validation where N is the number of labelled points (EVERY POINT WILL BE USED AS A VALIDATION POINT)
124
What do we have to do with the models for k-fold Cross Validation?
For each partition, fit a separate line/graph & then compute the validation error using the points in that partition
- Then take the mean of these errors across the graphs
125
Pros of Holdout Validation?
Computationally Cheapest (IDEAL FOR LARGE SAMPLES)
126
Cons of Holdout Validation?
Not reliable if sample size is not large enough
127
Pros of 3 fold?
Slightly more reliable than Hold out
128
Pros of 10-fold?
Only waste 10% - Fairly reliable
129
Cons of 3-fold?
- Wastes 1/3-rd annotated data - Computationally 3-times as expensive as holdout
130
Cons of 10-fold?
- Wastes annotated data - Computationally 10-times as expensive as holdout
131
Pros of Leave-one-out?
Doesn't waste data (IDEAL FOR SMALL SAMPLES)
132
Cons of Leave-one-out?
Computationally most Expensive
133
What is Storage Complexity of Dendrogram?
O(n^2)
- Storing the distance matrix requires storage for (n^2)/2 entries (don't need to store b-to-a if a-to-b is already there)
134
What is the Time Complexity of Dendrogram?
O(n^3)
- n iterations
- At every iteration the n^2-size distance matrix has to be updated and searched (GOING THROUGH ONCE IS n^2 & THROUGH n TIMES IS n^3)
- Complexity can be reduced to O(n^2 log(n)) using different algorithms
135
Downside of Dendrogram, regarding Complexity ?
- Limits the size of the dataset that can be processed
- Anything more than n^2/n^3 is hard to work with
- NOT IDEAL FOR LARGE DATASETS
136
Pros of KNN?
- VERSATILE: classification & regression, & non-parametric
- Easy to implement & interpret
- Can approximate complex functions, so it has very good accuracy
- Instance-based, so it defers all calculations until prediction time
137
Cons of KNN?
- Performance decreases as dimensionality increases
- Sensitive to noise / inaccurate data, especially when the value of k is small
- Must specify the distance function & pre-define the k value
- Needs ALL the training data, as it calculates the value of new points from the training data only
- Computationally expensive as the dataset grows
138
Pros of Linear Regression?
- Relatively low computational cost
- No k value to worry about
- Less space required: does not have to store the training set
139
Logistic Regression ?
- A Supervised Learning technique
- Takes inputs and fits them like Linear Regression
- Classifies the outputs into discrete, distinct categories using the sigmoid function
140
What is sigmoid function?
1/(1+e^-u), where u is the linear function w0 + w1*x1
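The sigmoid and the resulting class prediction can be sketched in a few lines; the 0.5 threshold and weight names below are illustrative assumptions:

```python
import math

def sigmoid(u):
    """Logistic (sigmoid) function: squashes any real u into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def predict(x, w0, w1):
    """Logistic-regression prediction: sigmoid of the linear score w0 + w1*x,
    thresholded at 0.5 to give a discrete class."""
    p = sigmoid(w0 + w1 * x)
    return 1 if p >= 0.5 else 0

# sigmoid(0) == 0.5; large positive scores -> class 1, large negative -> class 0
```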
141
What is f00
Pairs don't have same cluster & class
142
what is f10
Pairs have same cluster, but not class
143
what is f01
Pairs don't have same cluster , but same class
144
What is f11
Pairs have same cluster & class
145
BFS is not Optimal when?
The cost function is NOT non-decreasing
146
Explain hypothesis function y = w_0 + w_1*x
- Creates a line in 2D space relating x values to predictions
- w_0 is the y-intercept/offset
- w_1 is the gradient
147
Explain Cost Function : L(x,y) = Σ (y_i - h(x_i))^2
Compares the difference between the actual label data & the model's prediction
- Squares to eliminate -VEs
- Done for every example in the training set
- SUMS the values to give the cost function value
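The hypothesis and cost above can be computed directly; a small worked example on a hypothetical toy training set that lies exactly on y = 1 + 2x:

```python
def h(x, w0, w1):
    """Hypothesis: a line with y-intercept w0 and gradient w1."""
    return w0 + w1 * x

def l2_cost(data, w0, w1):
    """Sum of squared differences between labels and predictions."""
    return sum((y - h(x, w0, w1)) ** 2 for x, y in data)

data = [(0, 1), (1, 3), (2, 5)]   # exactly y = 1 + 2x
# l2_cost(data, 1, 2) == 0 (perfect fit); l2_cost(data, 0, 2) == 3 (each point off by 1)
```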
148
Explain the Hypothesis Function: y = w_0 + w_1*x + w_2*x^2
- Takes in ONE x value
- Uses it raised to different powers to get a polynomial function
- THUS NON-LINEAR
149
What is an ideal Cluster Similarity matrix?
For n examples/data points it creates an n x n matrix, then for each pair of points puts 1 if they are in the same Cluster, otherwise 0
150
What is an ideal Class Similarity matrix?
For n examples/data points it creates an n x n matrix, then for each pair of points puts 1 if they are in the same Class, otherwise 0
151
How do we Classify in KNN for Regression Problems ?
- Take the average y_i over all k neighbours (THE VALUE YOU ARE PREDICTING FOR THE NEW DATA)
- The average value becomes the y_i value of the new data
e.g. we want to find the height of a new data point p:
- Find the closest k neighbours
- Find the average height of those neighbours
- That average is the height of p
152
How do we Classify in KNN for Classification Problems ?
Look at the y_i values of all k neighbours; whichever is most prevalent becomes the y_i value of our new data point
e.g. we want to find the class of a new data point p:
- Find the closest k neighbours
- Look at the class of each neighbour
- Pick the class that shows up the most among the neighbours
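Both cases share the same neighbour lookup; a minimal sketch on hypothetical 1-D (x, y) pairs, averaging for regression and taking the majority for classification:

```python
from collections import Counter

def knn(train, query, k, mode="classify"):
    """k-nearest neighbours on (x, y) pairs with 1-D Euclidean distance:
    regression averages the neighbours' y_i, classification takes the majority."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    ys = [y for _, y in neighbours]
    if mode == "regress":
        return sum(ys) / k                        # average y_i of the k neighbours
    return Counter(ys).most_common(1)[0][0]       # most prevalent y_i

train = [(1, "A"), (2, "A"), (3, "B"), (10, "B"), (11, "B")]
label = knn(train, 2.5, k=3)                      # 3 closest: (2,A), (3,B), (1,A)
heights = [(1, 150.0), (2, 160.0), (3, 170.0), (10, 190.0)]
value = knn(heights, 2.0, k=2, mode="regress")
```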
153
What is k means ++
- First pick a random initial centroid
- Then pick the next centroid via Euclidean distance, with the probability of picking a point proportional to (its distance to the nearest chosen centroid)^2, favouring far-away points
- Repeat until you have k centroids
- Then run k-means to find the clusters
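The distance-squared weighted seeding can be sketched on its own; a minimal sketch of the initialisation step only, on hypothetical 2-D tuples:

```python
import random

def kmeans_pp_init(points, k):
    """k-means++ seeding: first centroid uniformly at random, then each next
    centroid picked with probability proportional to (distance to the
    nearest already-chosen centroid)^2."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [random.choice(points)]
    while len(centroids) < k:
        # weight each point by its squared distance to its nearest chosen centroid
        weights = [min(dist2(p, c) for c in centroids) for p in points]
        centroids.append(random.choices(points, weights=weights)[0])
    return centroids

pts = [(0.0, 0.0), (0.0, 0.1), (9.0, 9.0)]
cs = kmeans_pp_init(pts, 2)   # already-chosen points have weight 0, so cs are distinct
```

Ordinary k-means then runs from these seeds; the spread-out initialisation makes bad local optima less likely.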
154
How is k means non deterministic?
Different initialisations lead to different local optima
155
If optimisation problem in k means is not convex ?
The algorithm may not converge to the global minimum, but rather to a local minimum
156
2 things that make k mean converge
- The WCSS monotonically decreases
- There is a finite number of partitions of the data