COMP6032 - AI Flashcards
What are the 5 types of Agent and what is the difference between them?
Simple Reflex Agents
- Acts solely on what it can see (the current percept).
- Cannot solve problems requiring memory.
Agents with State
- Has basic memory, so can act on previous percepts as well as current.
Goal-based Agents
- Choose an action based on the goal.
- Doesn’t just react, but also plans.
Utility-based Agents
- Extended Goal-based which looks for an optimal solution.
Learning Agents
- Behaviour changes based on feedback from the environment.
What are some examples of Informed and Uninformed Search?
Informed (uses a heuristic)
- A*
Uninformed
- BFS
- DFS
- Dijkstra's Algorithm (uniform-cost search)
- Iterative Deepening
What is the difference between Weak AI and Strong AI?
Weak AI is designed to achieve a specific goal or reproduce a specific behaviour, whereas Strong AI would actually have a mind and complete human-like autonomy.
Strong AI does not exist in the real world.
What does PEAS stand for?
Performance Measure - criteria for success.
Environment - the surroundings the agent operates in.
Actuators - the means by which the agent acts on the environment.
Sensors - the means by which the agent perceives the environment.
What is the Minimax algorithm?
Minimax is a Game Theory algorithm for games with 2 agents.
The maximiser will make decisions to maximise their score.
The minimiser will make decisions to minimise the maximiser’s score.
Both players assume that the other will play towards the opposite extreme, and make their own decisions accordingly.
It can be computed recursively.
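A minimal recursive sketch (not from the notes), assuming the game tree is given as nested lists whose leaves are scores from the maximiser's point of view:
```python
# Minimax sketch: a game tree is given as nested lists, leaves are numeric
# scores from the maximiser's perspective (an assumed representation).
def minimax(node, maximising):
    if isinstance(node, (int, float)):                 # leaf: return its score
        return node
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

# The maximiser picks between two branches that the minimiser then resolves.
tree = [[3, 4], [2, 5]]
print(minimax(tree, maximising=True))                  # -> 3 (min(3,4)=3 beats min(2,5)=2)
```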
What is Alpha-Beta Pruning?
Alpha-Beta Pruning is an optimisation over the Minimax algorithm where branches that have not been fully evaluated may be dismissed based on a partial evaluation.
When we compute the decision for a particular turn, rather than computing all of the results and then selecting the minimum/maximum, we update a running bound after each option is computed: the value of the node so far is known to be <= x or >= x.
Consider a 1-turn-per-player game with 2 choices per turn where the maximiser starts.
The 4 leaf values that the second player (minimiser) can choose between on the final turn are 3 and 4 (under the maximiser's first choice) and 2 and 5 (under the second).
The maximiser will evaluate their first choice as min(3,4) = 3.
When computing their second choice, they will see the 2 and know that the value of that branch is <= 2.
If x <= 2, we know that 3 > x; therefore the 5 (or any number of subsequent branches) does not need to be considered.
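A sketch of the same idea in code, reusing the nested-list tree representation assumed above; with the leaves [[3, 4], [2, 5]] the 5 is never evaluated:
```python
# Alpha-beta sketch: alpha = best value the maximiser can already guarantee,
# beta = best value the minimiser can already guarantee.
def alphabeta(node, maximising, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, (int, float)):
        return node
    if maximising:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:         # the minimiser will never allow this branch
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:         # the maximiser already has something better
                break                 # e.g. the 5 under [2, 5] is pruned here
        return value

print(alphabeta([[3, 4], [2, 5]], maximising=True))   # -> 3
```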
What is Propositional Logic?
The simplest type of Logic.
There are two kinds of token:
= Sentences / statements
= Connectives / operators
== Conjunction (AND ‘∧’)
== Disjunction (OR ‘∨’)
== Implication (‘⇒’)
=== If A is true, the output is true only if B is also true.
=== If A is false, the output is true regardless of B.
== Biconditional (‘⇔’)
=== Output is true if A and B are equal.
Are the following equivalent?
(P=>Q)∨(¬Q)
¬P∨Q
(Propositional Logic)
No.
The first is a Tautology (always True).
The second is False when P = True and Q = False.
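A quick brute-force check (illustrative only), writing A ⇒ B as (not A) or B:
```python
from itertools import product

def expr1(P, Q): return ((not P) or Q) or (not Q)   # (P ⇒ Q) ∨ ¬Q
def expr2(P, Q): return (not P) or Q                # ¬P ∨ Q

for P, Q in product([True, False], repeat=2):
    print(P, Q, expr1(P, Q), expr2(P, Q))
# expr1 is True on every row (a tautology); expr2 is False on the row P=True, Q=False.
```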
What is Modus Ponens?
With propositions:
- A => B
- A is true
We can conclude B is true.
What is Modus Tollens?
With propositions:
- A => B
- B is false
We can conclude that A is false.
Determine whether the following argument is valid using logical reasoning:
= Premises:
== If it rains, then the ground is wet.
== The ground is not wet.
= Conclusion:
== It did not rain.
(Propositional Logic)
Premises:
- R => W
- ¬W is true
Modus Tollens:
A => B and ¬B is true implies A is false
Therefore:
- R is false.
- The stated conclusion is correct.
Determine whether the following argument is valid using logical reasoning:
= Premises:
== If it rains the ground is wet.
== If the ground is wet the match is cancelled.
== It is raining.
= Conclusion:
== The match is not cancelled.
(Propositional Logic)
Premises:
- R => W
- W => C
- R is true
Modus Ponens:
A => B and A is true implies B is true.
Therefore:
- W is true.
- C is true.
- The stated conclusion is incorrect.
What is Validity and Satisfiability?
An expression is valid if it is true for every possible assignment of values to its variables (a tautology).
e.g. A ∨ ¬A
An expression is satisfiable if there is at least one assignment of values that makes it true.
e.g. A ∨ B
What is the Resolution Rule?
Resolution is a Propositional Logic rule that allows us to infer a new clause from two clauses with complementary literals.
Given:
- A ∨ B
- ¬A ∨ C
We can infer
- B ∨ C (‘resolving on A’)
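A tiny sketch of the rule in code, assuming clauses are represented as frozensets of (name, polarity) literals (a representation chosen just for illustration):
```python
def resolve(c1, c2, name):
    # Resolve two clauses on `name` if they contain complementary literals on it.
    pos, neg = (name, True), (name, False)
    if pos in c1 and neg in c2:
        return (c1 - {pos}) | (c2 - {neg})
    if neg in c1 and pos in c2:
        return (c1 - {neg}) | (c2 - {pos})
    return None

clause1 = frozenset({('A', True), ('B', True)})    # A ∨ B
clause2 = frozenset({('A', False), ('C', True)})   # ¬A ∨ C
print(resolve(clause1, clause2, 'A'))              # the literals of B ∨ C
```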
What is Forward Chaining?
- Given an initial set of facts (Knowledge Base), a set of inference rules, and a conclusion to be verified.
- Apply rules to Knowledge Base to infer new facts.
- Add new facts to the Knowledge Base.
- Repeat until you are able to verify or disprove the conclusion. (May not be possible).
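A minimal sketch for definite (if-then) rules, assuming each rule is a (set_of_premises, conclusion) pair; the fact and rule names below are made up:
```python
def forward_chain(facts, rules, goal):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)        # all premises known: infer the conclusion
                changed = True
    return goal in facts

rules = [({'rain'}, 'wet_ground'), ({'wet_ground'}, 'match_cancelled')]
print(forward_chain({'rain'}, rules, 'match_cancelled'))   # True
```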
What is Backward Chaining?
- Given an initial set of facts (Knowledge Base), a set of inference rules, and a conclusion to be verified.
- Starting with the conclusion, apply inference rules to create assumptions.
- Repeatedly work backwards with the assumptions until you can disprove or validate the conclusion. (May not be possible).
What is a Horn Clause?
What is its significance?
A logic clause consisting only of disjunctions (OR) and containing at most one positive literal.
The significance is that – using first-order resolution – the resolution of two horn clauses is another horn clause. This means that any number of horn clauses can be resolved down to one, increasing the efficiency in proving a theorem.
What is CNF?
What is its significance?
Conjunctive Normal Form.
A conjunction of one or more clauses, where a clause is a disjunction of literals.
They are required for Boolean Satisfiability solvers (SAT problems) as well as other applications in Resolution and beyond.
Are these Expressions in CNF? If not, why not?
- ¬(A∧B)
- (A∨¬B∨¬C)∧(¬D∨E∨F∨D∨F)
- ¬(A∨B)∧C
- A∧(B∨(D∧E))
- No, because it contains a non-literal inside of a NOT.
- Yes.
- No, because it contains a non-literal inside of a NOT.
- No, because it contains a conjunction within a disjunction.
What are the 3 main categories of Logic and their complexities?
Propositional Logic - simplest
[comprised only of statements and the operators: conjunction, disjunction, implication, and biconditional]
Predicate Logic aka Predicate Calculus aka First-order Logic
[Introduces variables, predicates, and quantifiers]
Higher-order Logic - most complex
[Allows quantification over predicates and functions]
What is Unification in First-order logic?
A procedure in Automated Reasoning that looks for a variable substitution that makes two expressions identical.
e.g. (variables are lower case)
P(A, x)
P(y, B)
Unifying substitution:
{ y↦A, x↦B }
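A minimal unification sketch (no occurs check), assuming variables are lower-case strings, constants are upper-case strings, and compound terms are tuples; this representation is chosen only for the example:
```python
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def unify(x, y, subst):
    if subst is None:
        return None                     # an earlier step already failed
    if x == y:
        return subst
    if is_var(x):
        return unify_var(x, y, subst)
    if is_var(y):
        return unify_var(y, x, subst)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):        # unify argument lists element by element
            subst = unify(xi, yi, subst)
        return subst
    return None                         # clash between different constants/functors

def unify_var(var, value, subst):
    if var in subst:
        return unify(subst[var], value, subst)
    if is_var(value) and value in subst:
        return unify(var, subst[value], subst)
    return {**subst, var: value}        # extend the substitution

# P(A, x) vs P(y, B)
print(unify(('P', 'A', 'x'), ('P', 'y', 'B'), {}))   # {'y': 'A', 'x': 'B'}
```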
What are Quantifiers in First-order logic?
Operators that specify how many individuals the statement that follows applies to.
∀ - Universal - Applies to everything
∃ - Existential - Applies to at least one
What is the goal of a Constraint Satisfaction Problem?
To assign each Variable a Value such that no restrictions are violated.
A constraint might be that the sum of two Variables must not exceed 10; thus the variables may not both have a value of 6.
What is AC-3?
(Constraint Satisfaction Problem (CSP))
Arc Consistency Algorithm 3.
Reduces the search space by eliminating values from variables that cannot result in a solution.
Looks at pairs of variables.
Runs in O(e * d^3) time
e = constraints
d = max domain size
Does not always completely solve a CSP, but is often used as a preprocessing step.
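A sketch of AC-3, assuming `domains` maps each variable to a set of values, `neighbours` maps each variable to the variables it shares a constraint with, and `constraint(a, va, b, vb)` returns True when that pair of assignments is allowed (all names are illustrative):
```python
from collections import deque

def revise(domains, xi, xj, constraint):
    removed = False
    for vx in set(domains[xi]):
        if not any(constraint(xi, vx, xj, vy) for vy in domains[xj]):
            domains[xi].discard(vx)      # vx has no supporting value in xj's domain
            removed = True
    return removed

def ac3(domains, neighbours, constraint):
    queue = deque((xi, xj) for xi in domains for xj in neighbours[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False             # a domain was emptied: no solution exists
            for xk in neighbours[xi] - {xj}:
                queue.append((xk, xi))   # re-check arcs pointing at xi
    return True

# Toy CSP: X < Y, both with domain {1, 2, 3}.
def less_than(a, va, b, vb):
    return va < vb if (a, b) == ('X', 'Y') else vb < va

domains = {'X': {1, 2, 3}, 'Y': {1, 2, 3}}
neighbours = {'X': {'Y'}, 'Y': {'X'}}
print(ac3(domains, neighbours, less_than), domains)   # True {'X': {1, 2}, 'Y': {2, 3}}
```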
What is Minimum Remaining Values (MRV)?
(Constraint Satisfaction Problem (CSP))
Heuristic to find CSP solution quicker.
When choosing which Variable to allocate a Value to next, select the one with fewest remaining Values in its domain.
By doing this, you reduce the probability of reaching a Variable that has no valid Values left in its domain.
What is Least Constraining Value (LCV)?
(Constraint Satisfaction Problem (CSP))
Heuristic to find CSP solution quicker.
When allocating a value to a variable, select the Value that removes the fewest Values from the domains of the other unallocated Variables.
By doing this, you reduce the probability of not having any valid Values left once you reach later Variables.
What is Partial-Order Planning (POP)?
As opposed to Total-Order Planning, it is a non-linear planning algorithm.
This means that there may exist a choice between two or more tasks that are not dependent on each other.
What is a Critical Path?
The Critical Path is the sequence of dependent tasks (usually in a branching plan where some tasks may be completed in parallel) whose combined duration determines the overall completion time; any delay on it increases the completion time.
Example:
You have 3 tasks: A, B, and C, of durations 1hr, 2hrs, and 3hrs respectively.
C depends on A and B, but neither A nor B have any dependencies.
Tasks A and B may be started immediately, though C will have to wait until they are both finished.
When A finishes, C will still have to wait 1 more hour (for B) before it can begin.
Thus, if A were to start 30 minutes or even 1 hour late, the completion time would not change: B followed by C (2 + 3 = 5 hours) is the Critical Path, and A has 1 hour of slack.
What is Minimum Slack Scheduling?
Where there is a selection of tasks that cannot all be run in parallel, schedule first the one with the least slack (the one whose delay would impact the Critical Path soonest).
What is the difference between Bayesian Reasoning and Frequentist Reasoning?
The Frequentist view states that probability is about long-run frequency, based on observation over a series of independent runs. What’s the probability of a coin flip landing on heads? Flip it 1000 times, count the number of heads, and divide by 1000.
It does not assign probabilities to individual, one-off events.
The Bayesian view is a belief that can be applied to an individual event, that is updated with new evidence. What’s the probability of a coin flip landing on heads? Start with a prior assumption like 50:50; as you see a cluster of many heads in a row, your probability may increase and vice versa.
What are Kolmogorov’s Axioms?
- Non-Negativity: all probabilities are >= 0.
- Normalisation: the probability of the whole sample space is 1.
- Additivity: the probability of the union of mutually exclusive events is equal to the sum of their individual probabilities.
P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …
What is Mean in probability?
The expected value of a random variable.
(average)
What is Variance in probability?
A measure of how much a random variable deviates from the Mean.
What is the relation between Variance and Standard Deviation?
What is the benefit of each?
Standard Deviation is the square root of Variance.
Variance is a better fit for theoretical distributions and statistical models.
Standard Deviation is easier to interpret because it uses the same units as the data.
What is Entropy in probability?
A measure of uncertainty.
Higher entropy = higher uncertainty.
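For a discrete random variable this is Shannon entropy: H(X) = -Σ P(x) log P(x), summed over all outcomes x.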
What is the difference between a discrete distribution and a continuous distribution?
Discrete distributions sum to 1.
(e.g. Bernoulli, Binomial, Poisson)
Continuous distributions have a density that integrates to 1.
(e.g. Normal / Gaussian)
What is KL Divergence?
Kullback-Leibler Divergence
Measures how one probability distribution differs from another (reference) distribution.
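For discrete distributions P and Q, the standard formula is: D_KL(P ‖ Q) = Σ P(x) log( P(x) / Q(x) ). Note that it is not symmetric.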
What is Bayes Theorem?
A formula (and broader philosophy) used to determine the probability of an event based on known prior information and new evidence.
What is Bayes Theorem Formula?
H - Hypothesis
E - Evidence
P(H | E) = ( P(E | H) * P(H) ) / ( P(E) )
What is Bayes Theorem Formula in terms of the terminology?
Posterior Probability = Likelihood * Prior / Marginal
Posterior = P(H | E)
Likelihood = P(E | H)
Prior = P(H)
Marginal = P(E)
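A worked example with made-up numbers (a test for a condition affecting 1% of people, with a 90% true-positive rate and a 5% false-positive rate):
```python
p_h = 0.01                    # Prior P(H): 1% of people have the condition
p_e_given_h = 0.90            # Likelihood P(E | H): test detects it 90% of the time
p_e_given_not_h = 0.05        # P(E | ¬H): 5% false-positive rate

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)   # Marginal P(E)
posterior = p_e_given_h * p_h / p_e                     # Posterior P(H | E)
print(round(posterior, 3))    # 0.154: a positive test still only gives ~15% probability
```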
What is the difference between Probability and Likelihood?
Probability is to do with using assumptions to estimate the unknown.
Did I leave my keys in the car?
Will it rain today?
(Key focus in Bayesian approaches)
Likelihood is where we have known data and we’re evaluating our assumptions.
How many times has it rained previously?
(Key focus in Frequentist approaches)
Probabilities may be supplemented with evidence and likelihoods may be supplemented with bounds:
Given that it is cloudy, will it rain today? (Probability)
How many times has it rained previously when the sky was cloudy? (Likelihood)
What is the difference between Joint probability, Marginal Probability, and Conditional Probability?
Joint - Intersection - P(A ∧ B)
Marginal - Individual - P(A), P(B)
Conditional - Influence - P(A | B)
What is a CPT?
Conditional Probability Table.
Specifies the probability of a particular event given different pieces of evidence.
Rows specify a particular condition (e.g. cloudy = true), and state the probabilities of the outcome (e.g. P(rain) = 0.8).
Rows must sum to 1.
What are Markov Blankets?
The Markov Blanket of a node in a Bayesian Network is the set of nodes that make that node conditionally independent of the rest of the network.
Given the states of all nodes in its Markov Blanket, the node is independent of every other node in the network.
Consists of all direct parents, all direct children, and all co-parents (all other parents of all children).
Why are Markov Blankets useful?
They tell us the minimal set of nodes required to update a variable.
Used at the core of many algorithms (e.g. Gibbs Sampling).
What are Transition Models?
Describe how a system changes over time.
States the probability of transitioning to each state from each other state.
Core concept in Markov Processes, Hidden Markov Models, Dynamic Bayesian Networks, and Reinforcement Learning.
What are Observation Models?
Relevant when the actual state is unknown, but we’re trying to determine what it is from the behaviour we can see.
Gives us P(observation | state) for each state.
What is a Stationary Distribution?
Probability Distribution over States of a Markov Chain that does not change over time.
What is a Markov Chain?
A sequence of states where the probability of moving to the next state is dependent on the current state, but not how you got there.
Like a Finite State Automaton with probabilistic transitions.
What is Rational Preference?
In Utility Theory, Rational Preference just means that when you have a choice between some different options, your choices make sense, are consistent, and follow a pattern.
Must satisfy:
- Completeness (every two options must be comparable with a set preference).
- Transitivity (if A > B and B > C then A > C).
- Continuity (if A > B > C, there is some probability p at which you are indifferent between getting B for certain and a lottery that gives A with probability p and C otherwise).
- Independence (when indifferent between A and B, you should be indifferent to lotteries that have A or B as an outcome with the same probability).
What is Value of Information?
In Utility Theory, Value of Information is how much you would be willing to spend to gain information before making a choice.
For example, if you are about to spend £10,000 on a new car, you might be willing to pay £200 to have a Mechanic check it over first.
What is Dominance?
In Utility Theory, Dominance is where there is an option that is better under all scenarios.
Weak Dominance is where one option is at least as good in all situations and better in at least one. Strong Dominance is where one is better than the other in all scenarios.
What is Stochastic Dominance?
In Utility Theory, Stochastic Dominance is Dominance that involves Uncertainty.
Instead of outcomes, we are choosing between lotteries.
Lottery A is Stochastically Dominant over Lottery B if A has at least as high a probability of reaching every reward threshold (and a strictly higher probability for at least one).
e.g.
A - 30% chance of winning £5
B - 20% chance of winning £5 (value <= A)
> A is Stochastically Dominant.
A - 30% chance of winning £5
B - 20% chance of winning £6 (value > A)
> A is not Stochastically Dominant because A has a 0% chance of reaching >£5.
What is Information Gain in Decision Trees?
It tells us how much we reduce uncertainty (entropy) by splitting on a particular feature.
When asking “will it rain today?”, is it more relevant to ask “did it rain yesterday?” or “is it cloudy now?”
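The standard calculation: Information Gain = Entropy(parent) - weighted average Entropy(children after the split).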
What is Precision in Confusion Matrices?
Take all of the outputs that the model predicted as Positive. How many of them are correct?
TP / (TP + FP)
What is Recall in Confusion Matrices?
Take all of the results that were actually correct. How many of them did the model find?
TP / (TP + FN)
What is F1 Score in Confusion Matrices?
The balance of Precision and Recall.
2* (Pr * Re) / (Pr + Re)
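A quick check with made-up counts:
```python
tp, fp, fn = 40, 10, 20                              # hypothetical confusion-matrix counts

precision = tp / (tp + fp)                           # 40 / 50 = 0.8
recall = tp / (tp + fn)                              # 40 / 60 ≈ 0.667
f1 = 2 * (precision * recall) / (precision + recall)
print(precision, round(recall, 3), round(f1, 3))     # 0.8 0.667 0.727
```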
What are Monte Carlo methods?
Where random sampling is used to help solve very complicated problems.
e.g. Estimation of Pi:
Inscribe a circle inside of a square.
Throw darts randomly at the square.
The ratio of darts inside the circle relative to inside the square will tend to pi/4.
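The dart-throwing estimate as a short sketch (using a quarter-circle in the unit square):
```python
import random

n = 1_000_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
print(4 * inside / n)   # tends towards pi as n grows
```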
What is Stochastic Sampling?
Random sampling according to probabilities.
e.g.
- Monte Carlo Methods
- Bayesian Inference
- Reinforcement Learning
- Genetic Algorithms
What is a Particle Filter?
Represent the probability distribution over possible model states using a set of point samples (‘particles’).
Start by randomly placing ‘particles’ (guesses) into the sample space.
Reveal new observations (measurements) that you want the model to converge on.
Check the proximity of each particle to the observation.
Increase the weights of the particles that guessed best, then resample in proportion to those weights.
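A 1-D sketch with made-up numbers: the weighting and resampling steps above, using a Gaussian-style closeness measure and no motion model between steps (both simplifications):
```python
import math
import random

def particle_filter_step(particles, measurement, noise=1.0):
    # Weight each particle by how close it is to the measurement, then resample
    # in proportion to those weights.
    weights = [math.exp(-((p - measurement) ** 2) / (2 * noise ** 2)) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]
    return random.choices(particles, weights=weights, k=len(particles))

particles = [random.uniform(0, 100) for _ in range(1000)]   # initial random guesses
for z in [42.0, 41.5, 42.3]:                                # observations to converge on
    particles = particle_filter_step(particles, z)
print(sum(particles) / len(particles))                      # close to 42, the hidden state
```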
What is Likelihood Weighting?
Technique in Probabilistic Inference.
Evidence variables are fixed to their observed values, and each generated sample is weighted by the likelihood of that evidence given the values sampled for its parents.
What are Embeddings?
In Natural Language Processing, Embeddings are a way of representing words, phrases, or whole sentences as numeric vectors.
What is Bag-of-Words?
The simplest form of NLP.
Grammar and sentence structure are omitted; you just have words and their frequencies.
+ Very simple and fast.
- Ignores word order and context.
- Cannot handle synonyms.
What is TF-IDF?
Term Frequency - Inverse Document Frequency
A method of weighting how important a word is to a document: a word scores highly if it is frequent in that document but rare across the corpus.
Words like ‘the’ or ‘is’ come up very often, so they have a reduced importance.
TF-IDF = TF * IDF
TF = (number of times word appears in document) / (total words in document)
IDF = log( (total number of documents) / (number of documents containing the word) )
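The formulas applied to a tiny made-up corpus:
```python
import math

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the dogs chased the cats".split(),
]

def tf_idf(word, doc, corpus):
    tf = doc.count(word) / len(doc)                       # term frequency
    containing = sum(1 for d in corpus if word in d)      # documents containing the word
    idf = math.log(len(corpus) / containing)              # inverse document frequency
    return tf * idf

print(tf_idf("the", docs[0], docs))   # 0.0   - appears in every document
print(tf_idf("cat", docs[0], docs))   # ~0.18 - rarer, so more informative
```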
What are Transformers?
In NLP, Transformers are Deep Learning models that process entire sentences in parallel and are able to capture context through self-attention.
What are the main components of a Transformer?
- Input Embeddings
- Positional Encoding
- Self-Attention
- Multi-Head Attention
- Feed-Forward Networks
- Layer Normalisation and Residuals
- Encoder and Decoder Blocks
What is Deontology in Ethics?
An idea from Immanuel Kant.
Judges actions themselves as right or wrong, regardless of their consequences.
Uses the Maxim of Universality: would it be okay if everyone did it?
What is Utilitarianism in Ethics?
An idea from Jeremy Bentham.
Looks at consequences, and prioritises the greatest good for the greatest number of people.
What is Consequentialism in Ethics?
The more general family of theories (of which Utilitarianism is one) that judges actions by their consequences, rather than specifically by the greatest good for the greatest number.
What is Virtue in Ethics?
It’s not about what you do but about who you are.
What is Zipf’s Law?
When a list of measured values is sorted in descending order, the value of the nth entry is inversely proportional to n.
In NLP, this means that words that appear more frequently in text are less valuable to the meaning.
What are the 5 biggest challenges in NLP?
- Ambiguity (a word or phrase can have multiple meanings).
- Anaphora (referring back to something already mentioned, e.g. pronouns).
- Indexicality (where direct meaning depends on context, e.g. the meaning of “now” depends on when it was said, and “I” depends on who said it).
- Metonymy (referring to something via a closely related concept, e.g. “the government has said” when particular officials actually spoke).
- Non-compositionality (a phrase that cannot be derived when considering each individual word, e.g. idioms).
How does the A* algorithm work at a low level?
3 Variables
- G: Cost of path so far (previous node + arc distance)
- H: Estimated distance to the target (heuristic; optimistic, i.e. never more than the true traversable distance)
- F: G + H
Calculate the F value for each option and always select the smallest one until you find the target.
If you have multiple options with the same F value, select the one with the smaller H value.
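A sketch over a small weighted graph given as an adjacency dict; the graph, node names, and heuristic values are invented for the example, and ties on F are broken on the smaller H as described above:
```python
import heapq

def a_star(graph, heuristic, start, goal):
    # Frontier entries are (F, H, G, node, path); the heap pops the smallest F,
    # breaking ties on the smaller H.
    frontier = [(heuristic[start], heuristic[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, h, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for neighbour, cost in graph[node].items():
            new_g = g + cost                              # G: cost of the path so far
            if new_g < best_g.get(neighbour, float('inf')):
                best_g[neighbour] = new_g
                new_f = new_g + heuristic[neighbour]      # F = G + H
                heapq.heappush(frontier, (new_f, heuristic[neighbour], new_g,
                                          neighbour, path + [neighbour]))
    return None

graph = {'A': {'B': 1, 'C': 4}, 'B': {'C': 2, 'D': 5}, 'C': {'D': 1}, 'D': {}}
heuristic = {'A': 3, 'B': 2, 'C': 1, 'D': 0}              # optimistic estimates to D
print(a_star(graph, heuristic, 'A', 'D'))                 # (4, ['A', 'B', 'C', 'D'])
```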