Artificial Intelligence and Natural Language Processing (CS404) Flashcards
What is subject-verb agreement and how is it enforced on a parse tree? How
does subject-verb agreement contribute to the process of parsing the
following sentences?
(i) I am a man
(ii) *I are a man
Subject-verb agreement is agreement in number and person between the subject and verb of a sentence.
The parse tree represents the subject and verb as separate nodes, so the parser can check that features such as number (singular or plural) and person (first, second, or third) are consistent between them.
(i) {1S} ∩ {1S} = {1S} (the features intersect, so the sentence parses)
(ii) {1S} ∩ {3P} = Ø (no shared features, so the sentence is rejected)
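A minimal sketch of this intersection check in Python; the per-word feature sets below are illustrative assumptions, not a full lexicon:

```python
# Hypothetical toy lexicon: each word maps to its person/number features.
FEATURES = {
    "I":   {"1S"},
    "am":  {"1S"},
    "are": {"2S", "1P", "2P", "3P"},   # 'are' agrees with you/we/they
}

def agree(subject: str, verb: str) -> bool:
    """Subject and verb agree if their feature sets intersect."""
    return bool(FEATURES[subject] & FEATURES[verb])

print(agree("I", "am"))   # True  -> "I am a man" parses
print(agree("I", "are"))  # False -> "*I are a man" is rejected
```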
What is a synset – such as those found in WordNet? Describe how WordNet
makes use of any two relations to structure its knowledge base. Make use
of examples to illustrate your answer.
A synset is a set of synonymous words that together express one distinct meaning (sense); each distinct meaning of a word has its own synset.
The hyponymy relation means “is a kind of”:
a polar bear is a kind of bear.
hypernym (general term), e.g. bear
hyponym (specific term), e.g. polar bear
The instance hyponymy relation means “is an instance of”:
“Einstein is an instance of a physicist.”
This differs from the regular hyponymy relation because an instance is a particular example, not a subcategory.
The meronymy relation means “is a member (or part) of”:
the meronym is a member of the holonym.
For example, a tree is a member of a forest.
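These relations can be explored directly with NLTK's WordNet interface; a quick sketch (assumes the WordNet data has been downloaded via nltk.download('wordnet')):

```python
from nltk.corpus import wordnet as wn

polar = wn.synsets('polar_bear')[0]     # specific term (hyponym)
print(polar.hypernyms())                # "is a kind of": more general synsets

bear = wn.synsets('bear', pos=wn.NOUN)[0]
print(bear.hyponyms())                  # more specific kinds of bear

tree = wn.synset('tree.n.01')
print(tree.member_holonyms())           # wholes a tree belongs to, e.g. forest
```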
Describe any two similarity metrics used to compare WordNet synsets.
Path similarity is based on the number of edges separating the two synsets in the taxonomy. It is calculated as 1 / (path length + 1).
The Wu & Palmer (WuP) method is based on the least common subsumer (LCS), the closest common ancestor of the two synsets. It is calculated as 2 × depth(LCS) / (depth(synset1) + depth(synset2)).
The Lin method uses information content (IC), which is based on the probability of encountering an instance of a synset in a specific corpus. It is calculated as 2 × IC(LCS) / (IC(synset1) + IC(synset2)).
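A sketch of the three metrics using NLTK (assumes nltk.download('wordnet') and nltk.download('wordnet_ic'); the dog/cat pair is just an illustrative example):

```python
from nltk.corpus import wordnet as wn, wordnet_ic

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')

print(dog.path_similarity(cat))   # 1 / (path length + 1)
print(dog.wup_similarity(cat))    # 2 * depth(LCS) / (depth(s1) + depth(s2))

# Lin needs information-content counts from a corpus (here, Brown).
brown_ic = wordnet_ic.ic('ic-brown.dat')
print(dog.lin_similarity(cat, brown_ic))
```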
Write down Google’s PageRank formula. What does each of the terms in this
formula signify?
The PageRank of a page is the chance that a random surfer will land on that page.
PR(A) = (1 - d) + d × (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.
(1-d): The probability that the random surfer does not follow an outbound link from the current page and instead jumps to a random page. With the usual damping factor of d = 0.85, this is 0.15.
d: The damping factor, a constant usually set around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models a user sticking to a chain of links.
PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.
C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.
The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.
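A minimal sketch of the formula as an iterative computation; the four-page link graph is an illustrative assumption:

```python
# Hypothetical web: each page maps to the pages it links out to.
LINKS = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A'], 'D': ['C']}

def pagerank(links, d=0.85, iterations=50):
    pr = {page: 1.0 for page in links}          # initial guess
    for _ in range(iterations):
        pr = {page: (1 - d) + d * sum(pr[t] / len(links[t])   # PR(Ti)/C(Ti)
                                      for t in links if page in links[t])
              for page in links}
    return pr

print(pagerank(LINKS))   # C gathers the most 'votes', so it ranks highest
```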
What is the relevance of a random walk to Google’s PageRank function?
Make use of an appropriate diagram to illustrate your answer.
The relevance of a random walk to PageRank is that it represents the likelihood that a person randomly clicking on links will arrive at a particular page. The pages that are more ‘reachable’ in this random walk (i.e., more frequently encountered) are deemed to be more important, and thus receive a higher PageRank score.
(add diagram)
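The connection can also be shown empirically: simulate the random surfer and count page visits, and the visit frequencies approximate the PageRank scores. A sketch, reusing the hypothetical LINKS graph from the previous answer:

```python
import random
from collections import Counter

LINKS = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A'], 'D': ['C']}

def random_walk(links, steps=100_000, d=0.85):
    visits = Counter()
    page = random.choice(list(links))
    for _ in range(steps):
        visits[page] += 1
        if random.random() < d and links[page]:
            page = random.choice(links[page])    # follow an outbound link
        else:
            page = random.choice(list(links))    # jump to a random page
    return {p: n / steps for p, n in visits.items()}

print(random_walk(LINKS))   # pages visited most often ~ highest PageRank
```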
Consider the following sentence:
“John is a pig.” Compare a literal interpretation of this sentence to a metaphoric
interpretation. What are the main differences between these two
interpretations?
A literal interpretation of “John is a pig” means John is actually the animal, a pig. A metaphoric interpretation suggests John has characteristics associated with pigs, like messiness or greed. The main difference is that the literal interpretation takes the words at face value, while the metaphoric interpretation uses “pig” symbolically to describe John’s traits.
What is meant by the “features” of a word, and what is the relation
between features and “agreement”? Briefly describe three different
categories of agreement and how they may be enforced on a parse tree.
Illustrate your answer using appropriate examples in English, Irish,
French or other natural language.
The “features” of a word are its grammatical attributes, such as part of speech, tense, number, gender, and person, which define its role and meaning in a sentence.
Agreement is the requirement that words in a sentence match on certain features, such as number, gender, or case; checking agreement therefore means checking that the relevant feature sets are consistent.
Subject-verb agreement: agreement in number and person between the subject and verb of a sentence.
The parse tree represents the subject and verb as separate nodes, so the parser can check that number (singular or plural) and person (first, second, or third) are consistent (see the grammar sketch after the examples below).
I am tired
{1S} ∩ {1S} = {1S}
He is tired
{3S} ∩ {3S} = {3S}
*He am tired
{3S} ∩ {1S} = Ø
*I is tired
{1S} ∩ {3S} = Ø
Quantifier-Noun agreement
Agreement in number between quantifier and noun
– This man
– *This men
– These men
– *These man
Determiner-Noun agreement (in French etc.)
Gender agreement between determiner and noun
– Le stylo
– *La stylo
– La femme
– *Le femme
Cardinal-number Noun agreement (in Irish)
Agreement in number between the cardinal term and the noun
– Beirt {daoine}
– *Dhá daoine
– Dhá sciain
– *Beirt {sciain}
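One way to enforce such checks on a parse tree is a feature grammar, where a rule only applies when the features unify. A sketch using NLTK's feature-grammar support; the toy grammar covers just the subject-verb examples, and the other agreement categories follow the same pattern:

```python
import nltk

# Toy grammar: S is only built when NP and VP unify on person and number.
grammar = nltk.grammar.FeatureGrammar.fromstring("""
% start S
S -> NP[PER=?p, NUM=?n] VP[PER=?p, NUM=?n]
VP[PER=?p, NUM=?n] -> V[PER=?p, NUM=?n] ADJ
NP[PER=1, NUM=sg] -> 'I'
NP[PER=3, NUM=sg] -> 'he'
V[PER=1, NUM=sg] -> 'am'
V[PER=3, NUM=sg] -> 'is'
ADJ -> 'tired'
""")

parser = nltk.parse.FeatureEarleyChartParser(grammar)
print(list(parser.parse('he is tired'.split())))   # one tree: features unify
print(list(parser.parse('I is tired'.split())))    # []: {1S} ∩ {3S} = Ø
```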
Discuss the use of WordNet to represent semantic information. Describe
any two relationships used by WordNet to organize its taxonomy.
WordNet is a widely-used lexical database that represents semantic information of English words. It organizes words into sets of synonyms (synsets), each representing a unique concept. These synsets are linked by various types of semantic relationships, creating a rich network that reflects the structure of human language.
The hyponymy relation means “is a kind of”:
a polar bear is a kind of bear.
hypernym (general term), e.g. bear
hyponym (specific term), e.g. polar bear
The instance hyponymy relation means “is an instance of”:
“Einstein is an instance of a physicist.”
This differs from the regular hyponymy relation because an instance is a particular example, not a subcategory.
The meronymy relation means “is a member (or part) of”:
the meronym is a member of the holonym.
For example, a tree is a member of a forest.
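The taxonomy can be made visible by walking the hypernym (“is a kind of”) chain from a specific synset up to the root. A sketch with NLTK (the exact chain printed depends on the installed WordNet version):

```python
from nltk.corpus import wordnet as wn

syn = wn.synsets('polar_bear')[0]
while syn.hypernyms():              # follow "is a kind of" links upward
    print(syn.name())
    syn = syn.hypernyms()[0]
print(syn.name())                   # root of the noun taxonomy: entity.n.01
```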
Describe in detail any one similarity metric for comparing two terms,
which is based on the WordNet lexical database.
Path similarity is based on the number of edges separating the two synsets in the taxonomy. It is calculated as 1 / (path length + 1).
The Wu & Palmer (WuP) method is based on the least common subsumer (LCS), the closest common ancestor of the two synsets. It is calculated as 2 × depth(LCS) / (depth(synset1) + depth(synset2)).
The Lin method uses information content (IC), which is based on the probability of encountering an instance of a synset in a specific corpus. It is calculated as 2 × IC(LCS) / (IC(synset1) + IC(synset2)).
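As a worked example, the WuP formula can be computed by hand from synset depths and compared with the library's value (NLTK applies some edge-case adjustments, so the two numbers may differ slightly):

```python
from nltk.corpus import wordnet as wn

dog, cat = wn.synset('dog.n.01'), wn.synset('cat.n.01')
lcs = dog.lowest_common_hypernyms(cat)[0]   # least common subsumer

depth = lambda s: s.max_depth() + 1         # count the root as depth 1
print(2 * depth(lcs) / (depth(dog) + depth(cat)))   # hand-rolled WuP
print(dog.wup_similarity(cat))                      # library value
```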
What is meant by metaphoric language and how does metaphoric
language differ from more standard (literal) language? Illustrate your
answer with suitable examples.
Metaphoric language uses words symbolically, unlike literal language which uses words in their standard, direct sense. For example, saying “He has a heart of stone” metaphorically suggests he is unemotional or unsympathetic, while literally it would mean his heart is made of stone. In contrast, a literal statement like “He is unkind” uses words in their usual, direct meaning.
Write down the PageRank function, explaining each of the terms.
Describe the distinction between inlinks and outlink, using an appropriate
diagram to illustrate your answer.
The PageRank of a page is the chance that a random surfer will land on that page.
PR(A) = (1 - d) + d × (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.
(1-d): The probability that the random surfer does not follow an outbound link from the current page and instead jumps to a random page. With the usual damping factor of d = 0.85, this is 0.15.
d: The damping factor, a constant usually set around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models a user sticking to a chain of links.
PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.
C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.
The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.
In PageRank, inlinks are links from other pages to a given page, contributing to its rank. Outlinks are links from that page to others, distributing its PageRank among them. Inlinks boost a page’s rank, while outlinks share it.
(add a diagram)
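A small sketch of the distinction, building inlink and outlink sets from a hypothetical edge list:

```python
from collections import defaultdict

# Hypothetical directed links: (source page, target page).
EDGES = [('A', 'C'), ('B', 'C'), ('C', 'A'), ('D', 'C')]

inlinks, outlinks = defaultdict(set), defaultdict(set)
for src, dst in EDGES:
    outlinks[src].add(dst)   # links leaving src: its PageRank is split here
    inlinks[dst].add(src)    # links arriving at dst: 'votes' that boost it

print(dict(inlinks))    # C has three inlinks, so it collects the most votes
print(dict(outlinks))
```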
What is heuristic search? Describe a simple heuristic that might be useful in
solving the Farmer-Fox-Goose problem (or the 8-tile puzzle). Briefly
describe what problems are presented by a local minimum and a plateau.
Heuristic search is search guided by a heuristic: an estimate of how promising a state is, or how close it is to the goal, used to decide which state to explore next rather than exploring blindly.
A simple heuristic for the 8-tile puzzle is the number of misplaced tiles; for the Farmer-Fox-Goose problem, the number of characters still on the starting bank. In both cases, the lower the value, the closer the state appears to be to the goal.
A local minimum is a state that scores better than all of its neighbours without being the goal, so a greedy (hill-climbing) search gets stuck there. A plateau is a region where neighbouring states all score the same, so the heuristic offers no gradient to follow.
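A minimal sketch of the misplaced-tiles heuristic; boards are tuples read row by row, with 0 standing for the blank, and the goal layout is an assumption:

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout, 0 = blank

def misplaced_tiles(board):
    """Heuristic: count the tiles (ignoring the blank) out of position."""
    return sum(1 for tile, goal in zip(board, GOAL)
               if tile != 0 and tile != goal)

print(misplaced_tiles((1, 2, 3, 4, 5, 6, 7, 0, 8)))  # one move away -> h = 1
```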
What is meant by a zero-sum game? Describe how the MiniMax algorithm
can be used to solve adversary-search problems, such as chess or the OXO
game.
A zero-sum game is a situation in which one participant’s gain or loss is exactly balanced by the losses or gains of the other participants. The MiniMax algorithm is used in zero-sum games like chess or tic-tac-toe to find the optimal move: it evaluates the possible moves and their outcomes to maximize the player’s advantage while minimizing the opponent’s, assuming both players play optimally.
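A minimal sketch of MiniMax over a hypothetical hand-built game tree; leaves hold scores from the maximizer's point of view:

```python
# Hypothetical game tree: inner nodes are lists of children, leaves are scores.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

def minimax(node, maximizing):
    if isinstance(node, int):        # leaf: return its score directly
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

print(minimax(TREE, True))   # 3: the best outcome the maximizer can force
```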
Describe each of the steps in the A* search algorithm.
Initialization: Start with an open list containing only the initial node and a closed list that is empty.
Loop: Repeat the following steps until the open list is empty:
Choose Node: Select the node with the lowest score from the open list. The score is calculated as f(n) = g(n) + h(n), where g(n) is the cost to reach the node and h(n) is the estimated cost from the node to the goal (the heuristic).
Goal Check: If the chosen node is the goal, backtrace the path from the goal to the start and return this path.
Generate Successors: Identify all the neighbors of the current node and calculate their scores.
Update Lists: For each neighbor, if it is not in the open list, add it. If it is already in the open or closed list with a higher score, update it with the lower score and change its parent to the current node.
Move Node: Move the current node from the open list to the closed list.
Completion: The algorithm terminates when either the goal is found (and the path is returned) or the open list is empty (indicating no path).
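A compact sketch of these steps in Python, run on a hypothetical 4×4 grid with a wall and a Manhattan-distance heuristic:

```python
import heapq
from itertools import count

def astar(start, goal, neighbours, h):
    tie = count()                             # tie-breaker for equal f scores
    open_list = [(h(start), next(tie), start)]
    came_from, g_cost, closed = {start: None}, {start: 0}, set()
    while open_list:
        _, _, node = heapq.heappop(open_list)      # lowest f(n) = g(n) + h(n)
        if node == goal:                           # goal check: backtrace
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in neighbours(node):         # generate successors
            ng = g_cost[node] + cost
            if nxt not in g_cost or ng < g_cost[nxt]:  # keep cheaper route
                g_cost[nxt], came_from[nxt] = ng, node
                heapq.heappush(open_list, (ng + h(nxt), next(tie), nxt))
    return None                                    # open list empty: no path

WALLS = {(1, 0), (1, 1)}                           # assumed obstacle cells
def neighbours(p):
    x, y = p
    return [((nx, ny), 1)
            for nx, ny in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            if (nx, ny) not in WALLS and 0 <= nx <= 3 and 0 <= ny <= 3]

print(astar((0, 0), (3, 0), neighbours, lambda p: abs(p[0] - 3) + abs(p[1])))
```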
What is WordNet and how does it represent information? Make reference to the relationships that are used by WordNet in your answer.
WordNet is a large lexical database of English, linking words into semantic relations including synonyms, antonyms, hypernyms, hyponyms, meronyms, and holonyms. It groups English words into sets of synonyms called synsets, providing short definitions and usage examples, and records the various semantic relations between these synonym sets. The most common relations are:
Synonyms: Words that have the same or similar meanings are grouped into synsets.
Hypernyms: Terms that are more generic or abstract than the given word.
Hyponyms: More specific terms derived from a generic term.
Meronyms: Words that denote a part of something.
Holonyms: Words that denote a whole of which the given word is a part.
Antonyms: Words that have opposite meanings.
WordNet’s structure makes it a useful tool for computational linguistics and natural language processing, providing a rich network of meaningfully related words.
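A brief sketch of querying a few of these relations through NLTK's WordNet interface (note that antonymy is stored at the lemma level rather than between whole synsets):

```python
from nltk.corpus import wordnet as wn

good = wn.synset('good.a.01')
print(good.lemma_names())            # synonyms grouped into the synset
print(good.lemmas()[0].antonyms())   # antonyms, a lemma-level relation

car = wn.synset('car.n.01')
print(car.part_meronyms()[:3])       # parts of a car (meronyms)
print(car.hypernyms())               # more generic terms (hypernyms)
```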
Describe any two similarity metrics used to compare WordNet synsets.
Why would one attempt to quantify the similarity (or difference) between
any two concepts?
Two common similarity metrics used to compare WordNet synsets are:
Path Similarity: Measures the similarity based on the shortest path that connects the synsets in the hypernym/hyponym taxonomy. A shorter path implies higher similarity.
Wu-Palmer Similarity (WUP): Measures the similarity by considering the depths of the two synsets in the WordNet taxonomies, along with the depth of their least common subsumer (most specific ancestor node).
Quantifying the similarity (or difference) between two concepts is important for various natural language processing tasks, such as word sense disambiguation, semantic similarity assessment, information retrieval, and machine translation. It helps in understanding the semantic and contextual relationships between words, thereby improving the performance of these tasks.
Define the terms precision and recall, explaining their relevance to
information retrieval.
In information retrieval, precision and recall are two critical metrics:
Precision: Measures the fraction of retrieved documents that are relevant to the query. It answers the question, “Of all the documents the system retrieved, how many were actually relevant?”
Recall: Measures the fraction of relevant documents that were successfully retrieved. It answers the question, “Of all the relevant documents that exist, how many did the system successfully retrieve?”
These metrics are crucial in evaluating the effectiveness of information retrieval systems, balancing the need for retrieving all relevant documents (recall) while minimizing the retrieval of irrelevant ones (precision).
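A minimal sketch of both definitions over sets of document IDs (the IDs are illustrative):

```python
retrieved = {1, 2, 3, 4}        # documents the system returned
relevant  = {2, 4, 5, 6, 7}     # documents that are actually relevant

true_positives = retrieved & relevant
precision = len(true_positives) / len(retrieved)   # 2/4 = 0.50
recall    = len(true_positives) / len(relevant)    # 2/5 = 0.40

print(precision, recall)
```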
Write down the PageRank function, explaining each of the terms. Describe
the distinction between inlinks and outlinks, using an appropriate diagram to
illustrate your answer.
The PageRank of a page is the chance that a random surfer will land on that page.
PR(A) = (1 - d) + d × (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
PR(A): PageRank of page A. This is what the algorithm is trying to calculate - the ‘importance’ or ‘authority’ of page A on the web.
(1-d): The probability that the random surfer does not follow an outbound link from the current page and instead jumps to a random page. With the usual damping factor of d = 0.85, this is 0.15.
d: The damping factor, a constant usually set around 0.85. It is the probability that the surfer follows an outbound link from the current page; it models a user sticking to a chain of links.
PR(Ti): PageRank of pages Ti which link to page A. These are the pages that have outbound links pointing to page A. Their PageRank contributes to the PageRank of page A.
C(Ti): The number of outbound links on page Ti. This is used to divide the PageRank of Ti. The idea is that the ‘vote’ of a page is split between all the links it contains; the more links, the less ‘voting power’ each link has.
The sum: The formula sums the contributions of all pages linking to page A. Each contribution is the PageRank of the linking page divided by the number of links on that page.
Inlinks: Links from other pages pointing to a page pi. They contribute to the PageRank of pi, since other pages ‘vote’ for or ‘endorse’ it.
Outlinks: Links from a page pj pointing to other pages. The PageRank value of pj is shared among these outlinks.
What problem is presented by encountering a novel word-pair when
calculating the joint probability of a word sequence? How can these
problems be overcome?
Encountering a novel word-pair when calculating the joint probability of a word sequence presents the problem of data sparsity, where the word-pair has not been seen in the training data, leading to a probability estimate of zero. This makes the model unable to handle unseen word combinations effectively.
To overcome these problems, techniques like smoothing and back-off models are used. Smoothing assigns a small probability to unseen word-pairs by redistributing some probability mass from more frequent pairs. Back-off models, on the other hand, fall back to calculating the probability of smaller units (like single words) when the data for larger units (like word pairs) is insufficient or unavailable. These methods help in dealing with the zero-probability issue in sparse data scenarios.
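A minimal sketch of add-one (Laplace) smoothing for bigram probabilities; the tiny corpus is an illustrative assumption:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)                    # vocabulary size

def p_laplace(w1, w2):
    """P(w2 | w1) with add-one smoothing: novel pairs get non-zero mass."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_laplace('the', 'cat'))   # seen pair: relatively high probability
print(p_laplace('cat', 'on'))    # novel pair: small but non-zero
```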
Describe how a search mechanism can be adapted to solve problems like the
8-tile puzzle or the Farmer-Fox-Goose problem.
To solve problems like the 8-tile puzzle or the Farmer-Fox-Goose problem, a search mechanism typically involves defining a state space, where each state represents a possible configuration of the problem. The search algorithm then systematically explores this space, moving from one state to another via legal moves or actions, until it reaches a goal state.
For example, in the 8-tile puzzle, states represent different tile arrangements, and actions are tile moves. In the Farmer-Fox-Goose problem, states are different distributions of the characters across the river, and actions are the farmer’s choices of who to ferry across the river.
Common search algorithms used include depth-first search, breadth-first search, or more advanced strategies like A* search, which can be guided by heuristics to find the optimal solution efficiently.
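A compact sketch of such a state-space search for the Farmer-Fox-Goose(-Grain) puzzle, using breadth-first search; a state records which bank each character is on, and the unsafe states are the usual ones (fox with goose, or goose with grain, left unattended):

```python
from collections import deque

# State: (farmer, fox, goose, grain); 0 = left bank, 1 = right bank.
START, GOAL = (0, 0, 0, 0), (1, 1, 1, 1)

def safe(s):
    farmer, fox, goose, grain = s
    if fox == goose and farmer != goose:      # fox eats goose unattended
        return False
    if goose == grain and farmer != grain:    # goose eats grain unattended
        return False
    return True

def moves(s):
    for i in (None, 1, 2, 3):        # farmer crosses alone or with one item
        if i is not None and s[i] != s[0]:
            continue                 # passenger must be on the farmer's bank
        nxt = list(s)
        nxt[0] ^= 1                  # the farmer always crosses
        if i is not None:
            nxt[i] ^= 1
        if safe(tuple(nxt)):
            yield tuple(nxt)

def bfs(start, goal):
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:                # backtrace the sequence of states
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for nxt in moves(s):
            if nxt not in parent:
                parent[nxt] = s
                frontier.append(nxt)
    return None

print(bfs(START, GOAL))              # seven crossings solve the puzzle
```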
Describe how the MiniMax algorithm can be used to solve adversary-search
problems, such as chess or the OXO game. What are the main limitations of
the MiniMax algorithm?
The MiniMax algorithm solves adversarial search problems like chess or OXO by simulating all possible moves and their outcomes for both players. It assumes that both players play optimally and alternates between maximizing the player’s advantage and minimizing the opponent’s advantage, thereby choosing the best possible move.
Main limitations of the MiniMax algorithm include:
Computational Complexity: The number of possible moves in games like chess is enormous, leading to a vast search tree that is computationally expensive to explore fully.
Requirement for Effective Heuristics: For deeper search trees, effective heuristics are needed to evaluate game positions and make practical decisions without exploring the entire tree.
Describe how α−β pruning improves the performance of adversary search
algorithms. You may use an appropriate example to illustrate your answer.
Alpha-Beta (α-β) pruning improves the performance of adversarial search algorithms by reducing the number of nodes evaluated in the search tree. It works by eliminating branches that don’t influence the final decision.
In this process, two values, alpha (α) and beta (β), are maintained. Alpha is the best value that the maximizer currently guarantees at that level or above, and beta is the best value that the minimizer currently guarantees. If at any point alpha becomes greater than or equal to beta (α ≥ β), the branch is pruned, as it won’t affect the final outcome.
For example, in a game like chess, if the maximizer has found a move with a value of +3, and on another branch, the minimizer can force a move with a value of -2, the maximizer will never choose this branch, so further exploration of it is unnecessary. This way, α-β pruning skips evaluating many nodes, significantly speeding up the search process.
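A minimal sketch of α-β pruning over the same style of hand-built tree as the MiniMax example; leaves hold scores from the maximizer's point of view:

```python
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # hypothetical game tree

def alphabeta(node, maximizing, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, int):                # leaf: return its score
        return node
    best = float('-inf') if maximizing else float('inf')
    for child in node:
        val = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, val)
            alpha = max(alpha, best)
        else:
            best = min(best, val)
            beta = min(beta, best)
        if alpha >= beta:        # prune: this branch cannot change the result
            break
    return best

print(alphabeta(TREE, True))     # 3, same as plain MiniMax but fewer leaves
```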