350 AI Flashcards
Name a few game AI Objectives
1) Aiding human players by delegating cumbersome tasks to the AI system
(e.g., move groups of units from X to Y along the shortest path, or
attack a town with a large number of units from multiple angles)
2) Creating challenging adversaries and collaborators
* Strategic AI (e.g., where to build what, and when?)
* Tactical AI (e.g., small-scale combat)
* Pushes the state-of-the-art of AI in general by challenging human experts
* Can help designers balance game parameters and find bugs
3) Learning: the ability to improve through gameplay experience
What are the advantages and disadvantages of scripted AI systems?
Advantages:
* Does not require lots of CPU time (important because graphics still gets the majority of cycles)
* Can mimic expert behaviour directly
Disadvantages:
* Predictable
* Cannot deal well with unforeseen events
Describe some map representations commonly used in video games.
1) Grid-Based Methods: Divide the map into a grid of squares or hexagons.
- Advantages:
1. Conceptually simple representation
2. Local changes have local effects, which is well suited for dynamic environments
3. Perfectly represents tile-based environments
4. Exact paths are easy to determine
- Disadvantages:
1. Imprecise representation of arbitrary barriers
2. Increased precision in one area increases complexity everywhere, potentially causing a large memory footprint
3. Awkward for objects that are not tile-sized and -shaped
4. Need to post-process paths if the environment allows arbitrary motion angles (or tweak A*)
2) Polygon-Based Maps: Use polygons to represent areas and features, then find a path between two points that does not cross constrained edges.
Describe advantages and disadvantages of polygon-based map representations.
- Advantages:
1. Arbitrary polygonal obstacles
2. Arbitrary motion angles
3. Memory efficient
4. Finding optimal paths for circular objects is not hard
5. Topological abstractions
- Disadvantages:
1. Complex code
2. Robustness issues
3. Point localization no longer takes constant time
3) Node and Graph Maps: Utilize nodes and edges to represent locations and paths
Explain the concept of visibility graphs and their properties w.r.t path finding
Place nodes at the corners of obstacles and then place edges between nodes that are visible to each other. To find a path from one point to another:
* add the start and goal as nodes to the graph, connecting them to visible nodes
* run a pathfinding algorithm on the resulting graph
* Advantages:
1. Paths are provably optimal
* Disadvantages:
1. Adding to and changing the world can be expensive, as the graph can be dense.
2. Every node can potentially "see" every other node, leading to quadratic space and time requirements.
3. Connecting the start/end nodes to the graph can already take linear time.
4. The corner-corner visibility test is not cheap.
Explain what free-space decompositions are
Free-space decomposition decomposes the empty areas of the map into simple convex shapes. A waypoint graph is created by placing nodes on unconstrained edges and, if needed, in face interiors; the nodes are connected according to their direct reachability. To find a path between two points A and B: locate the faces in which A and B reside, connect A and B to all nodes of their faces, find a path in the graph, and then smooth the path (the raw waypoint path is generally not optimal, hence the smoothing step).
Give a formal definition of directed and undirected graphs
An undirected graph is a pair G = (V,E) composed of a non-empty vertex (or node) set V and an edge set E, such that each edge contains one or two vertices, i.e., for all e ∈ E: e ⊆ V and 1 ≤ |e| ≤ 2. The number of neighbours of v is called its degree, deg(v).
A directed graph is a pair G = (V,E) composed of a non-empty vertex set V and an edge set E ⊆ V × V. Edge e = (u,v) connects vertex u with vertex v. deg+(v) is the out-degree, deg−(v) the in-degree.
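To make the definitions concrete, here is a minimal Python sketch (not from the course material) of an undirected graph stored as an adjacency list, with the degree function from the definition above:

```python
# Undirected graph as an adjacency list: each vertex maps to its neighbour set.
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'C'},
    'C': {'A', 'B', 'D'},
    'D': {'C'},
}

def deg(v):
    """deg(v): the number of neighbours of vertex v."""
    return len(graph[v])

assert deg('C') == 3
# A directed graph can be stored the same way, with graph[u] holding the
# targets of outgoing edges; deg(v) then becomes the out-degree deg+(v).
```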
Describe the following algorithms and their purpose: BFS
BFS is a graph traversal algorithm that explores all neighbours at one depth level before moving to the next. It is implemented using a queue and is used for shortest paths in unweighted graphs and level-order tree traversal.
For G = (V, E) the runtime of BFS is Θ(|V| + |E|) because in the worst
case every node is visited once in step (1) and all edges are visited
twice in step (2). BFS uses Θ(|V|) space
This is good news: BFS's runtime is asymptotically optimal, because
the least we have to do is access all the input information.
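A minimal Python sketch of the queue-based formulation described above, where `graph` maps each vertex to an iterable of neighbours:

```python
from collections import deque

def bfs_distances(graph, start):
    """Shortest hop-count distances from start in an unweighted graph.
    Runs in Theta(|V| + |E|) time and Theta(|V|) space."""
    dist = {start: 0}            # doubles as the 'visited' set
    queue = deque([start])
    while queue:
        u = queue.popleft()      # each node is dequeued once
        for v in graph[u]:       # each edge is examined a constant number of times
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist
```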
Dijkstra's
- Purpose: Finds the shortest paths from a single source to all other nodes in a graph.
- Process:
* Iteratively expands the "reached set" S_i by selecting the closest unvisited node u with minimal distance d[u].
* Updates the distances of u's neighbours to account for potentially shorter paths through u.
- Runtime Complexity:
* With an array: Θ(|V|²), where |V| is the number of nodes (optimal for dense graphs, where |E| ∈ Θ(|V|²)).
* With a priority queue: O(|E| log |E|) (better for sparse graphs; for |E| ∈ O(|V|) this becomes O(|V| log |V|)).
- Key Points:
* Suitable for weighted graphs with non-negative edge weights.
* A priority queue enhances efficiency, especially on sparse graphs.
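A hedged Python sketch of the priority-queue variant, assuming `graph` maps each node u to (v, weight) pairs; outdated queue entries are skipped when popped ("lazy deletion") instead of being decreased in place, which keeps the O(|E| log |E|) bound:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; edge weights must be non-negative."""
    dist = {source: 0}
    pq = [(0, source)]                          # (d[u], u), lowest d first
    while pq:
        d, u = heapq.heappop(pq)                # closest unfinished node
        if d > dist.get(u, float('inf')):
            continue                            # stale entry: skip
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd                    # relax edge (u, v)
                heapq.heappush(pq, (nd, v))
    return dist
```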
A*
Main Loop:
▶ remove the lowest-f node n from OPEN
▶ is n the goal? yes => stop
▶ move n to CLOSED
▶ expand n: consider its children
▶ as far as we know, we are done with this node (but it can be re-opened later)
Consider a child:
▶ check whether its state has been seen before (in OPEN or CLOSED):
▶ if the state has been seen with the same or smaller g value, reject it
▶ otherwise, remove the child state from OPEN and CLOSED and add the corresponding node to OPEN for consideration
The OPEN/CLOSED lists act as a cache of previously seen results.
A* is used for efficient pathfinding, ordering OPEN by f = g + h.
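A minimal Python sketch of the loop above. The helpers `neighbours(n)` (yielding (child, edge_cost) pairs) and the heuristic `h(n)` are assumed, not part of the original card; re-opening is handled by comparing g values:

```python
import heapq, itertools

def astar(start, goal, neighbours, h):
    """A* following the OPEN/CLOSED scheme above (sketch)."""
    tie = itertools.count()                      # tie-breaker: never compare states
    g = {start: 0}
    open_heap = [(h(start), next(tie), start)]   # OPEN, ordered by f = g + h
    closed = set()
    while open_heap:
        _, _, n = heapq.heappop(open_heap)       # remove lowest-f node from OPEN
        if n == goal:
            return g[n]                          # goal reached: stop
        if n in closed:
            continue                             # stale OPEN entry
        closed.add(n)                            # move n to CLOSED
        for child, cost in neighbours(n):        # expand n: consider its children
            ng = g[n] + cost
            if ng < g.get(child, float('inf')):  # better path than any seen before
                g[child] = ng
                closed.discard(child)            # re-open the node if it was closed
                heapq.heappush(open_heap, (ng + h(child), next(tie), child))
    return None                                  # no path exists
```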
UCS
1. Frontier Priority Queue:
* Uses a priority queue to manage the frontier - nodes that have been seen but not yet expanded.
* Nodes are prioritized by their path cost from the start node; the lowest cumulative-cost node is selected first.
2. Cost Function:
* UCS relies solely on the cost function g(n), the total cost from the start node to node n along the best known path.
3. Path Expansion:
* The algorithm selects the node with the lowest g(n) from the priority queue and expands it, examining its adjacent nodes (successors).
* If a newly discovered node has not been seen before, or a cheaper path to it is found, it is added to the priority queue with its corresponding g(n) value.
4. Goal Test:
* Upon each node expansion, UCS checks whether the goal has been reached. If so, UCS guarantees that the path found is the least-cost path, due to the order of node expansion.
* The search stops when the goal node is selected for expansion, not when it is first seen.
UCS is ideal for scenarios where path cost is the critical factor and the goal is to minimize it. It is used in route planning, network routing, and other settings with consistent path costs. UCS is equivalent to Dijkstra's algorithm with an early goal test, or to A* with h(n) = 0.
Runtime Complexity:
Time complexity is O(b^d), where b is the branching factor and d is the depth of the least-cost solution.
Space complexity is also O(b^d), as it keeps all frontier nodes in memory.
What exactly does “f(n) ∈ O(n²)” mean?
There exist c > 0 and n0 ∈ N such that for all n ≥ n0: |f(n)| ≤ c·n² (this is the general definition f ∈ O(g), i.e. |f(n)| ≤ c·|g(n)|, with g(n) = n²).
When is a heuristic admissible/consistent?
A heuristic h is admissible if and only if h(n) >= 0 and h(n) <= h*(n) for all n, where h*(n) is the true minimal cost to the goal from node n.
A heuristic is consistent (or monotone) iff:
h(G) = 0 for all goal states G, and
h(n) <= cost(n,s) + h(s) for all n and their successors s (the triangle inequality).
Sketch the A* correctness proof
Key Mechanism: Combines actual path cost (g) and a heuristic estimate (h) to form f = g + h for each node.
Heuristic: Must be admissible
Correctness proof:
1. Applies to graphs with bounded neighbourhoods and positive edge costs (≥ ε > 0).
2. Lemma for OPEN Nodes:
* At any point before termination, there is a node n in OPEN such that:
* n is on an optimal path to the goal G,
* A* has found an optimal path to n, and
* f(n) ≤ C*, where C* is the minimal cost from the start S to G.
3. Base Case:
* Initially, S is in OPEN with an optimal (empty) path.
* f(S) ≤ C* since h is admissible.
4. Inductive Step:
* If the current node n satisfying the lemma is not the goal and is expanded, then:
* its successors include a node m on an optimal path,
* an optimal path to m has been found (otherwise a shorter path to G would exist, contradicting the optimality of the path through n), and
* f(m) ≤ C* due to the admissibility of h.
5. Termination with Success:
* If A* terminates with "success", it has found an optimal path.
* If G were expanded via a suboptimal path, a contradiction arises, since by the lemma a node n with a smaller f value exists in OPEN.
6. Termination Proof:
* A* generates acyclic paths with positive edge weights.
* Every new or promoted node in OPEN represents a new path.
* Under the given conditions only finitely many acyclic paths of cost up to C* exist, so A* terminates: the maximal distance to consider is C*, which is reached after at most ⌈C*/ε⌉ steps, and hence the algorithm stops.
Prove that if h is consistent then f is non-decreasing along any path A* considers ( i.e., f is monotone)
- Consistent Heuristic Defined:
* A heuristic h is consistent if, for every node n and every successor n′ of n, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n′ plus the estimated cost from n′ to the goal.
- Formula for f and g:
* f(n) = g(n) + h(n), where:
* g(n) is the actual cost from the start node to node n,
* h(n) is the estimated cost from n to the goal, and
* f(n) is the sum of the actual cost to n and the heuristic cost from n to the goal.
- Assumptions and Implications:
* Assume n′ is a successor of n on a path A* considers, which means:
* g(n′) = g(n) + cost(n, n′), and therefore
* f(n′) = g(n′) + h(n′) = g(n) + cost(n, n′) + h(n′).
* By the consistency of h: cost(n, n′) + h(n′) >= h(n).
* Therefore, f(n′) >= g(n) + h(n) = f(n).
- Result:
* The calculation shows that f(n′), the evaluation of any successor n′ of n, is at least as great as f(n).
* This proves that f is non-decreasing (monotone) along any path that A* considers.
As A* progresses along a path, the f values of the nodes do not decrease. This monotonicity guarantees that A* expands nodes in order of non-decreasing f values, leading to the correct solution.
Prove that h being consistent implies h is admissible ( you can assume that f is monotone)
- Assumption for Contradiction:
* Assume for some state S, from which a goal G is reachable, that h(S) > h*(S), where h*(S) is the true minimal cost from S to G.
* Consider an optimal path from S to G. By the assumed monotonicity, f is non-decreasing along this path.
- Monotonicity of f:
* Treating S as the start of this path (g(S) = 0), we have f(S) = h(S).
* At the goal, f(G) = g(G) + h(G) = g(G) = h*(S), since h(G) = 0 for goal states and the path is optimal.
* Because f is monotone along the path, f(G) >= f(S).
- Contradiction:
* Combining the above: h*(S) = f(G) >= f(S) = h(S), contradicting the assumption h(S) > h*(S).
* Therefore, h(S) <= h*(S) for all S, which means h is admissible.
What algorithm would you choose for computing all shortest distances between a fixed node and all other nodes in a graph? Why? What is the algorithm’s worst case runtime complexity on graph G=(V,E)?
Dijkstra's algorithm is ideal for this single-source shortest path problem, since it computes the distances from the fixed node to all other nodes in one run (assuming non-negative edge weights).
Complexity: O(|V|²) with an array; with a binary min-priority queue it becomes O((|V| + |E|) log |V|) (or O(|E| + |V| log |V|) with a Fibonacci heap).
What data structure is well-suited for the A* OPEN variable? Why?
A priority queue (e.g., a binary heap), which efficiently manages the nodes to be explored based on their f values: extracting the lowest-f node takes O(log n) time, which is exactly the operation A*'s main loop performs.
Describe the following algorithms: HPA*
Hierarchical Pathfinding A* (HPA*)
* Purpose: HPA* is designed to efficiently find paths in large and complex maps by abstracting the search space into a hierarchy of interconnected regions.
* Pre-Processing: Sectors are defined over the map grid to create an abstract graph.
1. Graph nodes represent sector entrances, and edges represent possible paths between these nodes.
2. Intra-sector edges connect nodes within the same sector, while inter-sector edges connect nodes across different sectors.
* Path-Finding:
1. Paths are found in the abstract graph, which is much smaller than the original map, making the search faster.
2. The abstract path is then translated back into the original map, often requiring a smoothing step to create a more direct route.
* HPA* Process:
1. Divide the Map: Break the map down into smaller, manageable sectors.
2. Build Graph: Create a graph whose nodes are the entrances/exits of sectors and whose edges represent traversable paths.
3. Compute Distances: Calculate the distances within sectors (intra-sector) and between sectors (inter-sector).
4. Find Abstract Path: Use A* to find a path in this abstract graph.
5. Smooth Path: Refine the abstract path to align with the actual map, improving path quality.
- Improvements in HPA* implementation:
1. Uses Dijkstra's algorithm (or UCS) for computing intra-sector distances, which ensures that the shortest paths within sectors are found.
2. On-demand intra-sector connections allow for dynamic updates to the graph, benefiting environments where changes occur.
3. Window-based smoothing helps create more natural paths without sharp turns or unnecessary detours.
- Performance:
1. Significantly faster than standard A* on large maps due to the reduced number of nodes in the abstract graph.
2. Smoothing is a trade-off: it improves path quality but slows the algorithm down.
- Advantages:
1. Ideal for dynamic worlds, where local changes do not require full recomputation.
2. Scalable to large maps due to the hierarchical abstraction.
3. Simple in implementation and concept.
- Disadvantages:
1. The initial setup of connecting nodes to the abstract graph can be time-consuming.
2. Without smoothing, the path may not be the most direct route.
PRA*
- Purpose: PRA* is an algorithm for efficient pathfinding in large maps using hierarchical abstraction and refinement.
- Key Components:
1. Abstraction: Utilizes "clique abstraction" to build a hierarchical map representation; abstract nodes are identified for the start and goal positions.
2. Path-Finding:
- Performs an A* search at a high abstraction level.
- Refines the path by projecting it down to lower levels.
- Employs corridor-based A* searches for refinement.
- PRA* Process:
* Abstraction Hierarchy: Create levels of map abstraction.
* High-Level Search: Use A* at a higher level for an initial path.
* Path Refinement: Project the path down to lower levels, refining it at each step.
- Performance:
1. Faster than traditional A* on large maps.
2. Maintains high path quality, nearly optimal.
3. Ideal for applications needing both speed and quality, like video games.
- Advantages:
1. Efficient for large-scale maps due to the hierarchical approach.
2. Balances speed and path optimality effectively.
- Disadvantages:
1. More complex than basic A* due to the multiple levels of abstraction and refinement.
TA*
- Purpose: TA* is a pathfinding algorithm that operates on a mesh of triangles ( a triangulation) instead of a traditional grid. It is designed to optimize pathfinding in polygonal spaces by leveraging the geometric properties of triangulations.
- Key Features:
1. Base Triangulation: TA* runs on a base layer of triangulation, using the triangles as the primary elements of the search space.
2. Search States: Each triangle represents a search state. Adjacent triangles, reachable across unconstrained edges, are considered neighbours.
3. Heuristic: The heuristic (h-value) is the Euclidean distance from the goal to the closest point on the triangle's entry edge, providing an estimate of the remaining distance to the goal.
4. Distance Estimate: The g-value, the distance travelled so far, is underestimated for efficient computation. This estimate is improved as the search progresses.
5. Anytime Algorithm: TA* is an anytime algorithm, meaning it can return a valid path even if the search is stopped before completion.
6. Multiple Paths: It considers multiple paths to a triangle, iteratively refining the search for the shortest path.
7. Search Continuation: The search continues until either a time limit is reached or no remaining path can significantly improve the shortest distance found so far.
- Algorithm Process:
* Initiate Search: Begin with the triangles of the base triangulation, identifying the start and goal triangles.
* Expand Search: Expand to adjacent triangles, updating g-values and h-values accordingly.
* Path Selection: Evaluate paths within an A* search framework, selecting paths that minimize the f-value (g + h).
* Path Update: If a shorter path is found, update the current best path.
* Termination: Conclude the search once the best path is found or time constraints are met.
Applications:
TA* is particularly well-suited for environments with complex geometric shapes, where traditional grid-based pathfinding may be inefficient or inaccurate.
TRA*
- Purpose: TRA* is an advanced pathfinding algorithm that operates on a reduced triangulation map to enhance efficiency in finding optimal paths.
- Key Concepts:
1. Triangulation Abstraction: TRA* builds on the TA* algorithm, utilizing an abstraction of the original map into a reduced, triangulation-based graph.
2. Special-Case Handling: Before engaging in a complex search, TRA* checks for special cases where a direct path can be determined without a full search.
3. Degree-3 Nodes: These are nodes within the triangulation with three corridor connections. TRA* uses them as key waypoints or decision points in the pathfinding process.
- Algorithm Process:
* Search State Selection: TRA* starts by moving from the initial and goal positions to their respective adjacent degree-3 nodes.
* Child Generation: Degree-3 nodes generate children, which are also degree-3 nodes, connected via corridors in the triangulation.
* Anytime Algorithm: TRA* functions as an anytime algorithm: it can return a valid path at any interruption point and can handle multiple potential paths to any node.
* Heuristic and Cost Values: TRA* uses the same heuristic (h-value) and cost-so-far (g-value) as TA*, based on Euclidean distances and actual travel costs within the triangulated abstraction.
- Efficiency and Performance:
- Reduced Search Space: By focusing on degree-3 nodes, TRA* effectively reduces the search space, leading to faster computation times.
- Multiple Path Consideration: The ability to consider and refine multiple paths ensures that TRA* can find highly efficient routes without exhaustive search.
- Usage:
1. TRA* is particularly useful in large maps with complex geometries, where traditional grid-based A* might be inefficient due to the high number of nodes and potential paths.
What are triangulations and how to construct them incrementally?
- Start with a Defined Area:
* Begin with a specific area, such as a rectangle, and identify a set of points within this area, including the corners of the rectangle.
- Add Edges:
* Create edges between these points to form triangles. Ensure that the edges you add do not cross each other.
- Continue Adding Edges:
* Proceed to add edges until no more can be added without crossing existing edges.
- Resulting Decomposition:
* The final result is a triangle-based decomposition that covers the convex hull of the given points, i.e., the smallest convex set that contains all the points.
- Triangulation Quality:
* For a given point set, multiple triangulations can exist. It is desirable to avoid very narrow, 'sliver-like' triangles, because they decrease the quality of the triangulation and make certain calculations, such as distance heuristics, less reliable.
Describe what Delaunay triangulations are and how to construct them
- Delaunay Triangulations are a specific type of triangulation that maximizes the minimum angle in each triangle. This tends to create more regular and equiangular triangles, avoiding sliver-like shapes.
- These can be created locally by “edge flipping,” which is the process of swapping a shared edge between two triangles to ensure that no point lies within the circumcircle of any triangle except for its vertices.
Incremental construction:
1. Initial Triangulation: Begin with a large 'helper' triangle that encompasses all the points in the set.
2. Point Insertion: Randomly select a point from the set and locate the triangle of the existing triangulation that the point falls in.
3. Subdivision: Subdivide this triangle into smaller triangles by adding the new point as a vertex.
4. Edge Flipping: Perform edge flips to restore the Delaunay property (no point lies inside the circumcircle of any triangle).
5. Repeat: Continue adding points and flipping edges until all points are included in the triangulation.
Describe the point localization problem for triangulations and three algorithms for it, also stating their runtime complexity
The point localization problem in triangulations is to find which triangle a given point falls into within a triangulated space. Three algorithms:
1. Brute-Force Method:
* Check every triangle in the triangulation to determine which one contains the point, by testing whether the point lies inside the bounds of each triangle.
* Runtime complexity: Θ(n) for n triangles.
* Consideration: straightforward, but inefficient for large numbers of triangles, as it checks each triangle one by one.
2. Decision-Tree Method:
* As the triangulation is constructed, a decision tree is maintained. The tree identifies the containing triangle based on comparisons with the x and y coordinates of the query point.
* Runtime complexity: expected Θ(log n), where n is the number of triangles, because the depth of the tree is logarithmic in the number of triangles.
* Consideration: more efficient than brute force, but maintaining the decision tree can be complex, especially when the triangulation changes and the tree must be updated accordingly.
3. Walking Algorithm:
* Start with a random triangle within the triangulation. From this initial triangle, walk towards the query point A by crossing edges: at each step, cross the edge of the current triangle that intersects the line from the current triangle towards A.
* Runtime complexity: in homogeneous triangulations with n triangles, the average case is Θ(√n). The worst case is O(n), when the walk visits every triangle.
* Extension - Jump and Walk: sample k triangles and pick the one closest to the query point A as the starting point for the walk; with a suitable k this gives roughly Θ(∛n) for sampling plus the expected walking time.
This algorithm is a practical method for point localization due to its simplicity and relatively good performance in typical cases.
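A small Python sketch of the geometric primitives these methods rely on: the brute-force method calls a point-in-triangle test on every triangle, and the walking algorithm uses the same orientation test to decide which edge to cross (triangle vertices assumed in counter-clockwise order):

```python
def orientation(a, b, p):
    """2D cross product: > 0 if p is left of the directed line a->b,
    < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def contains(tri, p):
    """True iff point p lies inside (or on) triangle tri = (a, b, c), given CCW."""
    a, b, c = tri
    return (orientation(a, b, p) >= 0 and
            orientation(b, c, p) >= 0 and
            orientation(c, a, p) >= 0)

# Walking: if one of the three orientation tests fails, the walk crosses the
# corresponding edge into the adjacent triangle and repeats the test there.
```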
Why is showing the expected runtime of video game AI algorithms insufficient for comparing them? What else do game designers need to know?
Expected runtime alone is insufficient because it does not account for other crucial aspects that affect the practical deployment of these algorithms in games. Designers also need to consider:
1. Simulation Time: how much game-simulation time the algorithm consumes per decision.
2. Matrix Size: for algorithms that require filling a payoff matrix, the size and complexity of the matrix can significantly affect performance.
3. Decision Points: the ability of the algorithm to handle subsequent choice points and make decisions based on partial information.
4. Partial Observability: many games do not provide complete information about the state of the game world; algorithms must perform well even with incomplete information.
5. Strategy Coverage: algorithms should account for a wide range of opponent strategies.
Give a formal definition of 2-player zero-sum perfect information games
A 2-player game where the amount one player gains at the end of the game the other one loses, i.e., the payoffs add up to 0.
MAX tries to maximize his score; MIN tries to minimize MAX's score. At any time each player knows the state of the game (meaning all information sets have cardinality 1).
Describe the following terms: game, strategy, mixed strategy, Nash-equilibrium, information set, imperfect information game
Game is defined by the following components:
1. P: The set of players which act.
2. S: the set of game states.
3. s0 ∈ S: the start state (including player to move)
4. M : S → P: the player-to-move function that maps states to the
unique player which acts next
5. T : S → 2^S: the move transition function which defines for each
state what states are directly reachable. During game play, in state s, player M(s) chooses the successor state from T(s)
6. E ⊆ S: the set of end (or terminal) states.
7. R : E → R^|P|: the reward (or payoff) function which assigns a
reward value to each player in terminal states
8. I: the information set structure, which essentially defines what
players know about the state of the game (details below)
Strategy: describes how a player acts in any situation in which he is to move.
Mixed strategy: a player chooses moves according to a probability distribution.
Nash equilibrium: a tuple of strategies (also known as a strategy profile) forms a Nash equilibrium iff no unilateral strategy deviation that is beneficial to a single player exists. Such equilibria exist for ALL finite n-player games, not just zero-sum 2-player games.
Information set: a set of possible game states which the player to move cannot distinguish. It is determined by the player's private knowledge of the game state and the observable move history.
Imperfect information game: a game in which there exists an information set with size > 1.
Game Tree
Game trees are used to represent the search space. They are directed acyclic graphs. The nodes are decision points at which one player (or the environment) chooses a move.
The directed edges are move decisions leading from one state to the next.
The leaves represent terminal positions, labeled with a game value for player MAX.
Describe the following algorithms and their purpose: MiniMax, NegaMax, Alpha-Beta
MiniMax
- Purpose: To determine the optimal move in a two-player game by simulating several moves ahead and choosing the move that maximizes the player's minimum payoff, under the assumption that the opponent also plays optimally.
How MiniMax Works:
1. Recursive Evaluation:
* The algorithm recursively generates all possible moves from the current state of the game, branching out until it reaches terminal states.
2. Value Backpropagation:
* Once terminal states are reached, their values are evaluated and "backed up" through the game tree. The maximizing player (MAX) chooses the move with the highest value, while the minimizing player (MIN) chooses the move with the lowest value.
3. Depth-First Search:
* MiniMax performs a depth-first search of the game tree: it explores one branch completely before moving on to the next. This is memory efficient, as it does not need to store all branches in memory at once.
4. State Value Determination:
* The value of a state for MAX is determined by assuming that both players play optimally; it is the best score MAX can guarantee if MIN also plays the best moves.
5. Height Limitation:
* In practice, it is often impossible to search the entire game tree, because games can have a vast number of possible moves. Therefore, the algorithm stops after a certain number of steps, known as the height of the search, and approximates the state value at that point.
6. Optimal Play Assumption:
* The core assumption of MiniMax is that both players always play their best possible moves; outcomes are therefore evaluated against the opponent's best responses.
NegaMax
- Purpose: The NegaMax algorithm is a variant of MiniMax used in two-player, zero-sum games. It simplifies MiniMax by exploiting the property max(a,b) = -min(-a,-b), thus always evaluating states from the perspective of the player to move.
- Algorithm:
1. State Evaluation:
* NegaMax evaluates the value of a game state from the view of the player currently making a move.
2. Base Case:
* If the current state s is terminal or the maximum search depth (height) is reached, the value of the state is returned from the perspective of the player to move.
3. Recursive Exploration:
* For each child (possible move) of the current state, NegaMax is called recursively with the height decreased by one.
* The value returned is negated, reflecting the change of perspective between the players.
4. Score Accumulation:
* The algorithm maintains a score, initially set to negative infinity. For each child, if the negated value returned is greater than the current score, the score is updated.
5. Result:
* The final score returned is the value of the state from the perspective of the player to move.
Alpha-Beta
- Purpose: The Alpha-Beta algorithm is an optimization of the MiniMax algorithm used in two-player, zero-sum games. It aims to reduce the number of nodes that are evaluated in the search tree, making the search process more efficient.
How it Works:
1. Alpha and Beta Bounds: The algorithm maintains two values, alpha and beta, representing lower and upper bounds on the final score of the subtree under consideration:
* Alpha is the best score the maximizing player can already guarantee at that point or above.
* Beta is the best score the minimizing player can already guarantee at that point or below.
Algorithm Process:
1. Start with Bounds: alpha starts at negative infinity and beta at positive infinity.
2. Recursive Search: The algorithm recursively searches the game tree, like MiniMax.
3. Pruning: If at any point alpha >= beta, the current node's value cannot affect the final decision higher up in the tree, so the subtree under this node can be pruned.
4. NegaMax Formulation: The algorithm is often written in NegaMax form, negating and swapping the alpha and beta values when moving down and up the tree.
5. Scoring: The algorithm assigns a score to each state based on the values of its children, updating alpha or beta as it progresses. If the score reaches the beta bound, the remaining children are pruned.
- Advantages:
* Efficiency: By pruning branches that cannot affect the final decision, Alpha-Beta reduces the number of nodes evaluated, leading to faster decision-making.
* Optimization: It works particularly well with good move ordering, which leads to earlier pruning and less computational overhead.
Write pseudocode for MiniMax and Alpha-Beta search
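The card leaves the answer open; here is a hedged Python-style sketch, assuming helpers `children(s)` (successor states), `is_terminal(s)`, and `evaluate(s)` (the evaluation function), none of which are given in the original notes:

```python
INF = float('inf')

def minimax(s, height, maximizing):
    """Plain MiniMax: value of state s for player MAX."""
    if height == 0 or is_terminal(s):
        return evaluate(s)                 # heuristic / game value for MAX
    values = (minimax(c, height - 1, not maximizing) for c in children(s))
    return max(values) if maximizing else min(values)

def alphabeta(s, height, alpha, beta):
    """Alpha-Beta in NegaMax form: value of s for the player to move.
    Initial call: alphabeta(root, h, -INF, INF).
    Here evaluate(s) must score s from the mover's perspective."""
    if height == 0 or is_terminal(s):
        return evaluate(s)
    score = -INF
    for c in children(s):
        score = max(score, -alphabeta(c, height - 1, -beta, -alpha))
        alpha = max(alpha, score)
        if alpha >= beta:
            break                          # cut-off: prune remaining children
    return score
```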
State the Alpha-Beta Return Value Theorem
Let V(s) be the true NegaMax value of state s and h the height of s (the maximal distance to a leaf). Let L = AlphaBeta(s, h, α, β).
Then:
1. L ≤ α ⇒ V(s) ≤ L ≤ α
2. α < L < β ⇒ V(s) = L
3. L ≥ β ⇒ β ≤ L ≤ V(s)
4. AlphaBeta(s, h, −∞, ∞) = V(s)
What is the best case number of leaves Alpha-Beta search visits in homogeneous trees of depth d with branching factor b?
b^⌈d/2⌉ + b^⌊d/2⌋ − 1
How can cycles be detected in MiniMax search?
Use a stack of states: make move → push, undo move → pop.
Before pushing, check whether the state already exists on the stack.
An optimization is possible when irreversible moves exist (such as captures and pawn moves in Chess):
→ starting from the top of the stack, only search back as far as the last irreversible move.
This requires little storage, but can be slow in deep trees.
Improvement: maintain a hash table that stores the nodes currently on the stack.
Why can iterative deepening search up to a certain depth in practice be faster than a single search to that depth?
- Move Ordering and Transposition Tables (TTs):
* Iterative deepening benefits from move ordering: moves that were good in previous, shallower searches are likely to be good in deeper searches.
* Transposition tables work effectively with iterative deepening. They store the best moves found in previous iterations; when the same positions are reached in subsequent deeper searches, the algorithm can reuse the stored values and best moves, avoiding re-evaluation and thereby speeding up the search.
- Better Pruning:
* With iterative deepening, each level of the search tree is fully explored before moving on to the next, which produces a good move ordering for the next level.
* When the search goes deeper, moves that previously led to a quick cut-off can be tried first, leading to more efficient pruning in the Alpha-Beta search.
- Anytime Algorithm:
* Iterative deepening is an anytime algorithm: it can return the best solution found so far if the search is stopped (e.g., due to time constraints). A single search to a fixed depth does not guarantee a solution until the search is complete.
- Adaptable to Time Constraints:
* In practice, such as in a game where AI computation time is limited, iterative deepening uses the available time more effectively: it starts with shallow searches and progressively deepens, ensuring the best possible result within the time limit.
When is an LP in normal form?
- Objective Function: An LP in normal form maximizes a linear function:
* Maximize c^T · x
* Here, c is a constant column vector and x is the unknown column vector we want to optimize.
- Constraints: The LP has a set of linear constraints which the solution must satisfy:
* Subject to A · x <= b
* A is a constant matrix, b is a constant column vector, and the inequality <= applies component-wise.
- Non-Negativity Constraint: Additionally, the solution vector x must be non-negative: x >= 0.
In short: an LP is in normal form when it maximizes c^T · x subject to A · x <= b and x >= 0.
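As an illustration, here is a small sketch with made-up numbers using SciPy's `linprog`. Note that `linprog` minimizes, so the objective is negated, and its default variable bounds already enforce x >= 0:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])           # objective: maximize 3*x1 + 2*x2
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])         # constraints: x1 + x2 <= 4, 2*x1 + x2 <= 6
b = np.array([4.0, 6.0])

res = linprog(-c, A_ub=A, b_ub=b)  # minimize -c^T x; default bounds give x >= 0
print(res.x, -res.fun)             # optimal x = [2, 2], objective value 10
```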
Define a 2 x 2 matrix game which forces both players to randomize their actions in order to achieve the best possible payoff in the face of optimal opposition. Justify your claims.
A game with no pure-strategy Nash equilibrium forces randomization: neither player has a dominant strategy, and any deterministic choice can be countered by the opponent. The classic example is Matching Pennies: both players simultaneously choose Heads (H) or Tails (T); MAX wins +1 if the choices match and -1 otherwise (MIN receives the negated payoff).
Payoff matrix for MAX (rows: MAX's choice, columns: MIN's choice):
H vs H: +1, H vs T: -1
T vs H: -1, T vs T: +1
Analysis and Justification:
1. No Dominant Strategy / No Pure Equilibrium:
* In every pure strategy profile one player gains by deviating: if the choices match, MIN switches; if they differ, MAX switches. Hence no pure-strategy Nash equilibrium and no dominant strategy exists.
2. Randomization for Optimal Play:
* Since every deterministic choice can be countered, each player must randomize to avoid being predictable. A predictable player can always be met with the counter-strategy for a guaranteed win.
3. Mixed-Strategy Nash Equilibrium:
* The unique equilibrium has both players choosing H and T with probability 1/2 each. Against this mix, the opponent's expected payoff is 0 for both actions, making them indifferent, so no unilateral deviation helps.
4. Best Possible Payoff:
* By randomizing 50/50, each player maximizes their minimum expected payoff (the game value is 0); any deviation from the equilibrium mix can be exploited by the opponent.
Aside - computing a best response to a fixed mixed strategy x of MAX in a larger matrix game: if x^T·A = (-1, 2, -5, 3), MIN picks y3 = 1 (all others 0), and the worst value for MAX is -5.
What are evaluation functions and why do we need them?
Evaluation functions assign values to non-terminal leaf nodes when the search algorithm decides to backtrack early due to the game’s complexity or the limit on lookahead depth.
Why?
1. Complexity of Games: Many games, like chess, have a vast number of possible states, making it impractical to search the complete game tree down to terminal states.
2. Limited Lookahead: Due to computational constraints, algorithms like MiniMax or Alpha-Beta can only look a few moves ahead; beyond this point, the outcome of the game is uncertain.
3. Guiding the Search: Evaluation functions enable informed decisions by providing a heuristic estimate of a position's potential, guiding the search algorithm (like MiniMax or Alpha-Beta) towards more promising parts of the game tree.
4. Heuristic Estimates: These functions give heuristic estimates of a state's value when the true value cannot be computed.
5. Improving AI Decision-Making: In game AI, particularly in chess, evaluation functions are vital for choosing moves that improve the AI's position or hamper the opponent, even when the direct impact of a move is not immediately evident.
Describe a simple non-trivial evaluation function for Chess
- Material Balance: The most basic part of the evaluation function is the material count. Assign a value to each piece (e.g., Pawn = 1, Knight = 3, Bishop = 3, Rook = 5, Queen = 9) and calculate the material balance by subtracting the sum of the opponent's piece values from the sum of your piece values.
- Board Control: This part of the evaluation considers the number of squares controlled by each piece. More control often translates to a better position. Assign a small value for each square controlled by a piece. For example, each controlled square might add 0.1 to the evaluation.
- Pawn structure: Evaluate pawn structure by considering doubled pawns, isolated pawns, and pawn chains. Deduct points for doubled or isolated pawns and add points for pawns that are part of a chain.
- King Safety: Add a value for the safety of the king, which might involve considering factors like the presence of pawns around the king and whether the king has castled.
- Development and Center Control: Early in the game, give a bonus for piece development (moving pieces from their original squares) and controlling the centre of the board.
- Endgame Consideration: As the game progresses to the endgame, adjust the values to prioritize king activity and pawn promotion potential.
Sample Evaluation Function Formula:
Evaluation = (Material Balance) + (0.1 * Total Controlled Squares) - (Pawn Structure Penalties) + (King Safety Bonus) + (Development and Center Control Bonuses)
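A minimal Python sketch of the material-balance term only, assuming (as an illustration, not from the notes) that the position is given as a list of piece codes, uppercase for White and lowercase for Black:

```python
# Standard piece values; the king carries no material value.
PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_balance(pieces):
    """Positive means White is ahead in material."""
    score = 0.0
    for p in pieces:
        if p.upper() in PIECE_VALUE:
            value = PIECE_VALUE[p.upper()]
            score += value if p.isupper() else -value
    return score

# Example: White has an extra pawn.
assert material_balance(['Q', 'R', 'P', 'P', 'q', 'r', 'p']) == 1.0
```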
Describe ways of optimizing evaluation function parameters
- Statistical Regression:
* Uses statistical techniques to adjust the weights of the evaluation function so that its output closely matches the real value of positions.
* Regression can fine-tune the parameters based on historical data, such as past games whose outcomes are known.
- Reinforcement Learning:
* The evaluation function is adjusted based on rewards received from the environment. This is particularly useful where the AI can play against itself or other opponents, gradually learning and improving its evaluation function.
* The parameters are adjusted to maximize the AI's performance in terms of winning or achieving specific goals in the game.
- Pattern Tables:
* Evaluation functions built on pattern tables (such as all horizontal, vertical, and diagonal configurations, or specific regional patterns like 3x3 corners) can capture feature correlations that are not apparent with manual tuning.
* These pattern tables often contain hundreds of thousands of parameters, and optimizing them can significantly improve the accuracy of the evaluation function.
- Selective Search and Opening-Book Learning:
* Combining an optimized evaluation function with deep selective search and opening-book learning can lead to significant performance improvements. This approach was successfully used by the Othello program Logistello.
Benefits:
- Accuracy: Optimized parameters lead to more accurate evaluations of game states, which is crucial for better decisions.
- Speed: Parameter optimization can also speed up the evaluation process.
- Adaptivity: Machine learning techniques like reinforcement learning let the evaluation function adapt based on the AI's experience, leading to continuous improvement over time.
What is linear regression, and how can optimal weights be found?
Linear Regression:
1. Model:
* In its simplest form (simple linear regression), the model predicts a dependent variable Y from a single independent variable X. The linear function has the form Y = aX + b, where a is the slope and b is the y-intercept.
* In multiple linear regression, the model involves several independent variables: Y = a1·X1 + a2·X2 + ... + an·Xn + b.
2. Purpose:
* The aim is to find the values of the coefficients (a and b in the simple case) that minimize the difference between the predicted values and the actual data points.
Finding Optimal Weights:
1. Least Squares Method:
* Minimizes the sum of the squares of the residuals (the differences between the observed values and the values predicted by the model).
* Mathematically, it solves min Σ(Yi − (a·Xi + b))², where the Yi are the observed values and the (a·Xi + b) are the predicted values.
2. Matrix Formulation (for multiple regression):
* In multiple regression, a matrix approach is used: solve the normal equation X^T X w = X^T Y.
3. Gradient Descent:
* Used when the number of features is large; iteratively adjusts the coefficients to minimize the cost function (the sum of squared residuals).
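A small NumPy sketch of the matrix formulation, with made-up data; `lstsq` minimizes the sum of squared residuals directly, which is equivalent to solving the normal equation:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [4.0, 3.0]])  # features
y = np.array([5.0, 4.0, 7.0, 11.0])                              # targets

Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column for b
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # minimizes ||Xb @ w - y||^2
print(w)                                     # [a1, a2, b]
```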
Find the linear function that fits the following ( x, y) dataset best w.r.t square loss: { (-1,0), (0,0), (1,2) }. Explain your steps.
Model: We use a linear model y = ax + b, where a and b are the coefficients we need to determine.
Our goal is to find a and b such that the sum of squared errors is minimized.
Sum of Squared Errors:
* The sum of squared errors for this model is e(a,b) = Σ (a·xi + b − yi)², where the (xi, yi) are the given data points.
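Completing the calculation by setting the partial derivatives of e(a,b) to zero:
* ∂e/∂a = 2·Σ xi·(a·xi + b − yi) = 2·(2a − 2) = 0 ⟹ a = 1 (using Σxi² = 2, Σxi = 0, Σxi·yi = 2)
* ∂e/∂b = 2·Σ (a·xi + b − yi) = 2·(3b − 2) = 0 ⟹ b = 2/3 (using Σxi = 0, Σyi = 2)
The best-fit line is therefore y = x + 2/3.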
What is logistic regression, and what shortcoming of linear regression does it address?
Logistic regression is a statistical method for binary classification problems; it is a form of regression analysis used when the dependent variable is categorical. Logistic regression estimates the probability that a given input point belongs to a certain category.
Linear Regression Shortcomings it addresses:
1. Outcome Variable Restriction: When the outcome is categorical, linear regression is not suitable. Logistic regression addresses this by modelling the probability of class membership.
2. Valid Probabilities: By passing the linear combination of inputs through the logistic (sigmoid) function, logistic regression ensures that the output is always a valid probability between 0 and 1, whereas linear regression can produce values outside this range.
3. Non-Linear Relationship: Logistic regression models a non-linear relationship between the independent variables and the probability of the outcome (the model is linear in the log-odds, not in the probability itself).
Describe the components of artificial neural networks
- Neurons: The basic units of an ANN. Each neuron receives input, processes it, and produces an output.
- Layers: ANNs are structured into layers:
* Input Layer: the first layer, which receives the input data.
* Hidden Layers: these layers transform the input into something the output layer can use.
* Output Layer: produces the final output of the network; its design depends on the specific task.
- Weights and Biases: Weights determine the strength of an input's influence on a neuron's output, while biases allow shifting the activation function.
- Activation Functions: Define the output of a neuron given a set of inputs. They introduce non-linearity into the network, allowing it to model complex relationships.
- Connections: Neurons are connected to each other; each connection carries the output of one neuron (modified by a weight) to another neuron.
- Learning Algorithm: Usually involves adjusting the weights and biases based on the error of the network's output compared to the expected result.
Describe an image classification task
Image classification is a task in AI where the goal is to categorize images into one of several predefined classes or labels.
Key steps:
1. Data collection: Dataset of images where each image is labeled with its corresponding class.
2. Preprocessing: Process images to make them suitable for analysis, which may include resizing, normalizing, and augmenting the data.
3. Feature Extraction: Extract relevant features from the images.
4. Model Training: Use the processed data to train a machine learning model
5. Evaluation: Validate the model’s performance using different metrics.
How can neural network weights be optimized?
- Backpropagation: a two-step process:
* Forward Pass: the input data is passed through the network, layer by layer, to calculate the output.
* Backward Pass: the error is calculated and propagated back through the network, yielding the gradient of the error with respect to each weight.
- Gradient Descent: the weights are adjusted in the direction that most reduces the error, using the gradients calculated during the backward pass.
- Learning Rate: a crucial hyperparameter that determines the size of the steps taken during each weight update.
- Regularization: techniques such as L1 and L2 regularization can be used to prevent overfitting; they add a penalty term to the loss function that constrains the magnitude of the weights.
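A compact NumPy sketch of one such update for a single linear neuron with squared-error loss (a toy illustration, not from the course material):

```python
import numpy as np

def sgd_step(w, b, x, y, lr=0.01):
    """One gradient-descent step for loss L = 0.5 * (w.x + b - y)^2."""
    err = w @ x + b - y        # forward pass: prediction error
    grad_w = err * x           # backward pass: dL/dw
    grad_b = err               # dL/db
    return w - lr * grad_w, b - lr * grad_b   # step against the gradient

w, b = np.zeros(2), 0.0
w, b = sgd_step(w, b, np.array([1.0, 2.0]), y=3.0)
```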
Give an example that illustrates that deep neural networks may use much fewer weights than shallow networks to accomplish the same task.
Image Recognition Task
* Shallow Neural Network: a shallow network with only one or two hidden layers might require a very large number of neurons per layer to capture complex features directly from raw pixels. As a result, the number of weights can become extremely large.
* Deep Neural Network: a deep network with many hidden layers can build a hierarchy of features (e.g., edges, then textures, then object parts). Because of this hierarchical feature extraction, each layer need not be as large as the layers of a shallow network, so complex representations are captured with fewer weights overall.
Describe the concepts of supervised and reinforcement learning
Supervised Learning:
1. Produce/observe m training positions si with game-result labels vi.
2. Fit the weights wi of a parameterized evaluation function Vw(s) so that the sum of squared errors is minimized, i.e., find the vector w that minimizes the error function w.r.t. the parameter vector w.
Reinforcement Learning:
1. Have the program interact with the environment automatically.
2. After playing a few episodes, modify the model weights using gradient descent.
Name advantages and disadvantages of sampling-based search methods
Advantages:
1. Conceptually a simple search algorithm.
2. Can realize complex behaviours with no explicit knowledge, so there is no dependence on expert knowledge.
3. Prefers robust positions with many winning continuations.
Disadvantages:
1. Problems in tactical situations:
* Narrow lines of play are hard to find by randomized search.
* May not converge to a winning move at all - even if one exists.
* Or may not converge to a clear winner.
2. Does not approximate mixed strategies out of the box.
3. To get useful results, may need to include opponent modeling:
* Observe opponents to determine their likely move choices.
* Opponent modeling is a hard problem.
Describe the MCTS framework
- Phases of an MCTS Iteration:
MCTS operates in four distinct phases during each iteration:
1. Selection: Starting from the root node (R), the algorithm selects successive child nodes down to a leaf node (L) or a leaf's predecessor. The selection is made so that the game tree expands towards the most promising regions.
2. Expansion: Unless the leaf node (L) ends the game with a win or loss, one or more child nodes are created. The algorithm then chooses one of these new nodes (C) to continue the search from.
3. Sampling (Rollout): From node C, the game is played out semi-randomly to the end. This "rollout" simulates one possible future continuation of the game.
4. Backpropagation: The result of the rollout is used to update the information in the nodes on the path from C back up to the root R, typically statistics like win/loss ratios.
- End of Search: The search ends when the allocated search time expires. The best move is then reported, typically determined either by the best average value or, for more stable results, by the most-visited root move.
- Application: MCTS has been successfully applied to various games, especially those with large state spaces and complex dynamics.
- Dynamic Tree Construction: A key feature of MCTS is its dynamic construction of the game tree, expanding it only in the most promising directions based on the results of numerous simulations (rollouts).
- Flexibility and Efficiency: The MCTS framework is flexible and efficient, adapting its exploration and exploitation to different game states.
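A compact Python sketch of one possible MCTS implementation. The helpers `legal_moves(s)`, `apply(s, m)`, `is_terminal(s)`, and `result(s)` are assumptions (not given in the notes), states are assumed hashable, and rewards are taken from a fixed perspective (for two-player games the sign would have to alternate between levels):

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

def mcts(root_state, iterations, c=1.4):
    root = Node(root_state)                 # assumes root_state is non-terminal
    for _ in range(iterations):
        node = root
        # 1) Selection: descend via UCB1 while the node is fully expanded
        while node.children and len(node.children) == len(legal_moves(node.state)):
            node = max(node.children,
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2) Expansion: add one untried child (unless terminal)
        if not is_terminal(node.state):
            tried = {ch.state for ch in node.children}
            untried = [s for s in (apply(node.state, m)
                                   for m in legal_moves(node.state))
                       if s not in tried]
            child = Node(random.choice(untried), parent=node)
            node.children.append(child)
            node = child
        # 3) Rollout: play out semi-randomly to the end
        s = node.state
        while not is_terminal(s):
            s = apply(s, random.choice(legal_moves(s)))
        reward = result(s)
        # 4) Backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)   # most-visited move
```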
Describe the following algorithms and their purpose: simple sampling-based search, UCB, UCT
UCB (Upper Confidence Bounds)
Purpose: The UCB algorithm is designed to solve the multi-armed bandit problem: deciding which arm of a bandit (or slot machine) to pull to maximize rewards over time.
Key Concepts:
* Multi-Armed Bandit Problem: Involves multiple slot machines (arms) with different, unknown reward probabilities. The goal is to maximize the total reward over a series of pulls.
* Regret: Measures the loss in potential reward due to not always choosing the best arm. The aim is to minimize this regret.
Algorithm Process:
1. Initialization: Initially, pull each arm once to get an initial estimate of its reward.
2. Exploration-Exploitation Trade-off:
* The algorithm balances exploiting arms that have given high rewards against exploring lesser-used arms to discover their potential.
* The decision is based on the Upper Confidence Bound calculated for each arm.
3. Upper Confidence Bound Calculation:
* For each arm i, calculate X̄i + C·sqrt(log(T) / Ti), where:
* X̄i is the average reward obtained from arm i so far,
* Ti is the number of times arm i has been pulled,
* T is the total number of trials so far, and
* C is a positive constant determining the level of exploration; higher values encourage more exploration.
4. Arm Selection:
* In each round, select the arm with the highest upper confidence bound.
Significance:
* The UCB algorithm effectively addresses the exploration-exploitation dilemma.
* It is used in scenarios where there is a need to balance between the known rewards and the potential of unexplored options.
Advantages:
1. Bounded Regret: UCB ensures that the regret grows only logarithmically with the number of trials.
2. Robustness: UCB is effective even with little prior knowledge about the reward distributions of the arms.
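A minimal Python sketch of the UCB1 selection rule described above:

```python
import math

def ucb1_pick(total_reward, counts, t, c=1.0):
    """Pick the arm maximizing X̄_i + c * sqrt(log(t) / T_i).
    total_reward[i]: summed reward of arm i; counts[i]: pulls of arm i;
    t: total number of trials so far."""
    def bound(i):
        if counts[i] == 0:
            return float('inf')   # initialization: pull every arm once first
        return total_reward[i] / counts[i] + c * math.sqrt(math.log(t) / counts[i])
    return max(range(len(counts)), key=bound)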
Simple sampling-based search
Purpose: Search technique in AI that uses random sampling to explore possible outcomes and scenarios in a problem space.
Key Features:
* Non-Uniform Sampling: Unlike uniform random sequences (as in Monte Carlo sampling for numerical integration), this algorithm biases the sampling towards more likely scenarios based on available information
* Scenario-Based Approach: It generates samples that are consistent with known information or historical data. For example, in trick-based card games, it generates card distributions that align with move history and probabilities.
Advantages:
* Simplicity: Conceptually straightforward, it doesn’t require complex mechanisms or detailed knowledge of the problem space.
* Complex Behaviors: Capable of realizing complex behaviors without explicit knowledge or dependency on expert understanding.
* Robust Positioning: Prefers robust positions offering many winning continuations, enhancing its effectiveness in uncertain or dynamic environments.
Disadvantages:
Randomness: As a random sampling method, it may not always be efficient or optimal, particularly in scenarios with a vast number of possible outcomes.
Applications: Suited for situations where the outcomes are uncertain or the problem space is too large for exhaustive search methods.
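UCT (Upper Confidence Bounds applied to Trees)
Purpose: UCT is the selection policy that connects UCB to MCTS. Each node of the search tree is treated as a multi-armed bandit whose arms are the node's children; during the selection phase, the child maximizing the UCB1 value X̄i + C·sqrt(log(T) / Ti) is followed, where T is the parent's visit count and Ti the child's. This balances exploration and exploitation inside the tree and makes the tree grow asymmetrically towards the most promising lines of play.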
State the central limit theorem and how it can explain the C/sqrt(Tᵢ) factor in the UCB rule
Statement: Given a sufficiently large sample size, the distribution of the sample mean approximates a normal distribution, regardless of the population's distribution, provided the population variance is finite.
Implication: If you repeatedly sample from any distribution, the means of these samples form an approximately normal distribution. In particular, the standard deviation of the mean of Ti samples shrinks like σ/sqrt(Ti), so a confidence interval around the empirical mean X̄i has half-width proportional to 1/sqrt(Ti) - which is exactly the C/sqrt(Ti) factor in the UCB rule.
Explain why UCB can be called “optimistic”
UCB is considered optimistic because it selects the arm with the highest potential reward based on current knowledge. The algorithm explores less frequently chosen options, considering the possibility that they might yield better rewards than what has been observed so far.
What is the purpose of the log(T) factor in UCB rule?
The log(T) factor ensures that UCB never stops pulling any particular arm: if it stopped pulling arm i, log(T) would keep growing while Ti stayed fixed, so C·sqrt(log(T)/Ti) would eventually make arm i's upper confidence bound exceed the values of the arms being pulled - a contradiction.
Prove that the variance of any random variable with range [0,1] is <= 1/4
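The card leaves the proof open; it follows directly from the definitions. For a random variable X with range [0,1] we have X² ≤ X pointwise, so E[X²] ≤ E[X] = μ. Therefore
Var(X) = E[X²] − μ² ≤ μ − μ² = μ(1 − μ) ≤ 1/4,
since μ(1 − μ) is maximized at μ = 1/2. The bound is tight: a fair coin with values 0 and 1 has variance exactly 1/4.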
Describe UCT’s RAVE and Prior Knowledge heuristics
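RAVE (Rapid Action Value Estimation): in addition to the regular per-node statistics, UCT keeps "all-moves-as-first" statistics: a move's value in a node is also updated by every simulation passing through that node in which the move is played at any later point. This yields low-variance (but biased) estimates very quickly; the RAVE value is blended with the regular UCT value using a weight that decays as the node's true visit count grows.
Prior Knowledge: instead of initializing new nodes with empty statistics, they are seeded with heuristic estimates encoded as virtual wins and visits (e.g., from an evaluation function or learned patterns). This biases early selection towards moves believed to be good until real simulation data dominates.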
Describe how three great AI ideas led to Alpha-Go’s success
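1. Deep neural networks: convolutional policy and value networks evaluate Go positions and propose promising moves, replacing handcrafted evaluation functions.
2. Reinforcement learning: the policy network, first trained by supervised learning on human expert games, was then improved through self-play reinforcement learning, and the value network was trained on the resulting games.
3. Monte Carlo Tree Search: MCTS ties everything together, using the policy network as prior knowledge to bias move selection and the value network (combined with rollouts) to evaluate leaf positions.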
Big-O Notation
When comparing runtime functions we are only interested in their growth rate, i.e., the asymptotic behaviour for large input sizes n.
O(g(n)) describes the set of functions f(n) which, as n gets large, grow no faster than a constant times g(n):
it gives us an asymptotic upper bound.
Big-Omega notation
Ω(g(n)) describes the set of functions f(n) that grow at least as fast as a constant times g(n):
it gives us an asymptotic lower bound.
Big-Theta notation
Θ(g(n)) describes the set of functions f(n) that grow at the same rate as g(n):
it gives us the exact asymptotic growth rate.