AI Flashcards
what is an agent in a search problem?
An independent program or entity that interacts with its environment by perceiving its surroundings via sensors, then acting through actuators or effectors
what does it mean for a search problem to be observable?
The agent can perceive the complete state of the environment at any given time
what does it mean for a search problem to be discrete?
There are a finite number of actions for any state in the environment.
what does it mean for a search problem to be known?
the rules of the environment are known to the agent and the constraints and criteria are well-defined, so the agent can determine which state is reached by each action
what does it mean for a search problem to be deterministic?
Each action has exactly one outcome; there is no randomness in the transition model.
what are the main components of a search problem?
Initial state, actions, transition model (which state each action leads to from a given state), goal test (is a given state the goal or not) and path cost
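A minimal Python sketch of these components, assuming a generic interface (the class and method names are illustrative, not from the cards):

```python
class SearchProblem:
    """Illustrative container for the components of a search problem."""

    def __init__(self, initial_state):
        self.initial_state = initial_state       # initial state

    def actions(self, state):
        """The actions available in a given state."""
        raise NotImplementedError

    def result(self, state, action):
        """Transition model: the state reached by applying an action in a state."""
        raise NotImplementedError

    def is_goal(self, state):
        """Goal test: is this state the goal or not?"""
        raise NotImplementedError

    def step_cost(self, state, action, next_state):
        """One step's cost; the path cost is the sum of step costs along a path."""
        return 1
```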
what 4 characteristics help us evaluate the performance of a search algorithm?
completeness (if there is a solution, the algorithm will always find it), optimality (if there are many solutions, the algorithm will always find the best one), time complexity, space complexity
what are 2 slightly better variations of depth-first search?
DFS with less memory usage: delete fully explored subtrees from memory at runtime.
Depth-limited search: only search the top n layers of the tree (a sketch follows below).
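A minimal sketch of depth-limited search, assuming the illustrative SearchProblem interface above:

```python
def depth_limited_search(problem, limit):
    """Depth-first search that only explores the top `limit` layers of the tree."""
    def recurse(state, depth):
        if problem.is_goal(state):
            return [state]
        if depth == limit:                          # cut-off: do not go any deeper
            return None
        for action in problem.actions(state):
            child = problem.result(state, action)
            path = recurse(child, depth + 1)
            if path is not None:
                return [state] + path
        return None                                 # subtree fully explored, discard it
    return recurse(problem.initial_state, 0)
```

Because the recursion returns as soon as a subtree is fully explored, that subtree's nodes are no longer held in memory, which echoes the space-saving variation described above.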
what is best-first search?
an informed search algorithm that expands the node with the lowest estimated cost next, using an evaluation function f(n) that maps a node to a cost estimate. The evaluation function contains a heuristic, an estimate of how close the node is to the goal state.
what is a greedy best-first search?
a best-first search algorithm where f(n) = h(n): the node with the lowest heuristic value is always expanded next.
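A minimal sketch of greedy best-first search, assuming the illustrative SearchProblem interface from earlier and a heuristic function h(state):

```python
import heapq
from itertools import count

def greedy_best_first_search(problem, h):
    """Always expand the frontier node with the lowest heuristic value h(n)."""
    tie = count()                                   # tie-breaker so states are never compared
    frontier = [(h(problem.initial_state), next(tie), problem.initial_state)]
    came_from = {problem.initial_state: None}
    while frontier:
        _, _, state = heapq.heappop(frontier)       # node with the lowest h(n)
        if problem.is_goal(state):
            path = []
            while state is not None:                # trace parents back to the start
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for action in problem.actions(state):
            child = problem.result(state, action)
            if child not in came_from:
                came_from[child] = state
                heapq.heappush(frontier, (h(child), next(tie), child))
    return None
```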
what is A* search algorithm?
a best-first search algorithm that keeps an open set of nodes that still need to be explored (initially just the starting node) and a closed set of nodes that have already been explored.
While the open set is not empty:
select the node of lowest cost from the open set (the current node) and move it to the closed set. If the current node is the goal, exit the loop. Otherwise generate its non-closed neighbours, calculate their costs and put them in the open set; if a neighbour is already in the open set, update it if the new cost is lower. If the open set becomes empty before the goal is reached, there is no path; otherwise trace back from the goal to find the path.
what is the evaluation function for the A* algorithm?
the A* algorithm uses the evaluation function f(n) = g(n) + h(n), where g(n) is the cost to reach the node and h(n) is the heuristic estimate of the cost from the node to the goal.
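A minimal A* sketch along the lines of the description above, again assuming the illustrative SearchProblem interface and a heuristic h(state):

```python
import heapq
from itertools import count

def a_star(problem, h):
    """A* search with an open set (to explore) and a closed set (already explored)."""
    start = problem.initial_state
    g = {start: 0}                                   # g(n): cost to reach each node
    came_from = {start: None}
    closed = set()
    tie = count()
    open_heap = [(h(start), next(tie), start)]       # ordered by f(n) = g(n) + h(n)
    while open_heap:
        _, _, current = heapq.heappop(open_heap)
        if current in closed:
            continue                                 # stale entry: a cheaper path was found later
        if problem.is_goal(current):
            path = []
            while current is not None:               # trace back to recover the path
                path.append(current)
                current = came_from[current]
            return path[::-1]
        closed.add(current)
        for action in problem.actions(current):
            child = problem.result(current, action)
            cost = g[current] + problem.step_cost(current, action, child)
            if child not in closed and (child not in g or cost < g[child]):
                g[child] = cost                      # keep the lower cost
                came_from[child] = current
                heapq.heappush(open_heap, (cost + h(child), next(tie), child))
    return None                                      # open set empty: no path exists
```

Rather than updating entries inside the heap, this sketch pushes duplicates and skips stale ones when they are popped, which behaves the same way as the "update if the cost is lower" step described above.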
How do you formulate an optimisation problem?
Design variable: a variable that represents a candidate solution to the problem, for example a list representing a path in a shortest-path problem.
Objective function: a function that takes the candidate solution (the design variable) as input and outputs its cost or quality.
Constraints: functions that must lie within a given range for a solution to be valid, such as f(x) < 100 where f(x) calculates distance and 100 is the maximum distance a lorry driver can travel in a day.
what is hill climbing?
an optimisation algorithm that starts from a random solution, then checks candidate solutions close to the current one (its neighbours). If any of them is higher quality, take it as the new current solution. Keep repeating until the current solution is the best out of its neighbours.
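A minimal sketch of hill climbing, assuming the problem supplies three illustrative functions: random_solution(), neighbours(solution) and quality(solution):

```python
def hill_climb(random_solution, neighbours, quality):
    """Move to the best neighbour until no neighbour improves on the current solution."""
    current = random_solution()                      # start from a random solution
    while True:
        best = max(neighbours(current), key=quality)
        if quality(best) <= quality(current):        # current is the best of its neighbours
            return current
        current = best                               # take the higher-quality neighbour
```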
what is the disadvantage of hill climbing?
the algorithm may find a local maximum rather than the global maximum. It can also get stuck on plateaus.
what is a greedy algorithm?
A greedy algorithm selects the best available option at each decision point without considering the overall future consequences or examining the entire search space.
what is simulated annealing?
A hill-climbing variation that compares a random neighbour to the current solution and accepts it as the new current solution with some probability even if it is lower quality (a higher-quality neighbour is always accepted). This probability decreases over time: it equals e^(ΔE/T), where ΔE is the quality of the neighbour minus that of the current solution and T is the temperature, which decreases each iteration by a factor such as 0.95. The algorithm terminates when a minimum T is reached or the current solution stops changing.
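A minimal simulated annealing sketch, assuming the same illustrative random_solution, random_neighbour and quality functions; the starting temperature, minimum temperature and cooling factor are placeholder values:

```python
import math
import random

def simulated_annealing(random_solution, random_neighbour, quality,
                        t_start=1.0, t_min=1e-3, cooling=0.95):
    """Sometimes accept a worse neighbour, with probability e^(dE/T) that shrinks as T falls."""
    current = random_solution()
    t = t_start
    while t > t_min:
        candidate = random_neighbour(current)
        delta_e = quality(candidate) - quality(current)    # neighbour quality - current quality
        if delta_e > 0 or random.random() < math.exp(delta_e / t):
            current = candidate                            # always accept an improvement
        t *= cooling                                       # temperature decays each iteration
    return current
```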
what must be defined in a hill-climbing problem?
Representation and initialisation: define the form a candidate solution takes and the boundaries it must stay within.
Define neighbours: give an algorithm that randomly generates a neighbour that still satisfies the constraints.
how does the gradient descent function work with a function of only one parameter f(x)?
the algorithm finds a local minimum of f(x) by starting at a point x0 and finding the gradient there. If the gradient is positive, the next value x1 is lower than x0 by an amount equal to the gradient times a constant called the learning rate, i.e. x1 = x0 - learning rate * f'(x0); if the gradient is negative, x1 moves in the opposite direction (the same formula applies). This is repeated until the step size (learning rate * gradient) becomes small enough.
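A minimal one-parameter gradient descent sketch; the example function, learning rate and tolerance below are illustrative choices, not from the cards:

```python
def gradient_descent_1d(f_prime, x0, learning_rate=0.1, tolerance=1e-6, max_iters=10_000):
    """Minimise f(x) given its derivative f_prime, starting from x0."""
    x = x0
    for _ in range(max_iters):
        step = learning_rate * f_prime(x)    # positive gradient -> x decreases, and vice versa
        x -= step
        if abs(step) < tolerance:            # stop once the step size gets small enough
            break
    return x

# Example: f(x) = (x - 3)^2 has derivative 2(x - 3); the minimum is at x = 3.
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))
```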
how does the gradient descent function work with functions of more than one parameter f(x,z,…)?
the gradient is a vector calculated by differentiating the function with respect to each variable x, z, …, giving [∂f(x,z,…)/∂x, ∂f(x,z,…)/∂z, …]. For example, the gradient vector for f(x,z) = x^4 - 3z is [4x^3, -3], and the new point (x1, z1) is (x0, z0) - α[4x0^3, -3], where α is the learning rate.
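A minimal multivariate sketch of the same update rule. The example function f(x, z) = x^4 + 3z^2 is chosen here because it actually has a minimum; the card's f(x, z) = x^4 - 3z illustrates how to compute a gradient vector but is unbounded, so descent on it would not converge:

```python
def gradient_descent(grad, point, learning_rate=0.1, tolerance=1e-6, max_iters=10_000):
    """Minimise a multivariate function given a function `grad` returning its gradient vector."""
    point = list(point)
    for _ in range(max_iters):
        g = grad(point)                               # [df/dx, df/dz, ...] at the current point
        step = [learning_rate * gi for gi in g]
        point = [p - s for p, s in zip(point, step)]  # (x1, z1) = (x0, z0) - alpha * gradient
        if all(abs(s) < tolerance for s in step):
            break
    return point

# f(x, z) = x^4 + 3z^2 has gradient [4x^3, 6z] and a minimum at (0, 0).
print(gradient_descent(lambda p: [4 * p[0] ** 3, 6 * p[1]], [1.0, 1.0]))
```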
how do you find an optimal linear regression line on bivariate data?
the loss function to be minimised is the sum of squared residuals (SSR), equal to Σ(actual - predicted)^2 = Σ(y - (mx + c))^2. This can be pictured as a 3-dimensional surface with m and c on the horizontal axes and the SSR on the vertical axis. The gradient vector is found by differentiating the SSR with respect to m and c. Then m and c are initialised (for example to 1 and 0) and refined using the gradient vector and a learning rate until the step size becomes small enough, at which point the fit is complete.
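A minimal sketch of fitting m and c by gradient descent on the sum of squared residuals; the learning rate, iteration count and sample data are illustrative:

```python
def fit_line(xs, ys, learning_rate=0.01, iters=5_000):
    """Fit y = m*x + c by gradient descent on the sum of squared residuals (SSR)."""
    m, c = 1.0, 0.0                                   # initial guesses, as in the card
    n = len(xs)
    for _ in range(iters):
        # Partial derivatives of SSR = sum((y - (m*x + c))^2) with respect to m and c.
        dm = sum(-2 * x * (y - (m * x + c)) for x, y in zip(xs, ys))
        dc = sum(-2 * (y - (m * x + c)) for x, y in zip(xs, ys))
        m -= learning_rate * dm / n                   # dividing by n keeps the steps stable
        c -= learning_rate * dc / n
    return m, c

# Example: points lying roughly on y = 2x + 1.
print(fit_line([0, 1, 2, 3, 4], [1.1, 2.9, 5.2, 7.1, 8.8]))
```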
what is the sigmoid function?
σ(x) = 1/(1 + e^(-x)), where x is a combination of the input features in the same number of dimensions as the data; for example, for data with two input features, x could be a + b*x1 + c*x2. The function outputs a number from 0 to 1 interpreted as the likelihood of the input having a certain characteristic (if the output is greater than 0.5 we predict 1, otherwise we predict 0). This is used for logistic regression, where 0 represents one group and 1 represents another: for example, if the data is tumours, 0 is benign and 1 is cancerous. The training points are tumours that are known to be cancerous or not and are plotted at y = 1 or y = 0.
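A minimal sketch of using the sigmoid for a logistic-regression style prediction; the weights, bias and feature values are made up for illustration:

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def predict(features, weights, bias):
    """Predict class 1 if the sigmoid output exceeds 0.5, otherwise class 0."""
    x = bias + sum(w * f for w, f in zip(weights, features))   # e.g. a + b*x1 + c*x2
    probability = sigmoid(x)
    return (1 if probability > 0.5 else 0), probability

# Hypothetical tumour example: two features, illustrative weights and bias.
print(predict([2.5, 1.0], weights=[1.2, -0.7], bias=-1.5))
```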