Week 1 Flashcards
What does a computer have to do in order to understand a natural language sentence?
- Lexical analysis (POS tagging)
- Syntactic analysis: Determining the grammatical structure of the sentence
- Semantic analysis: Representing the meaning of the words and sentence in some formal, machine-readable representation
- Inference: Deriving additional knowledge that is implied but not explicitly stated in the original text
- Pragmatic analysis: Every human-generated text is produced for some purpose, and that underlying objective can be analyzed
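As a rough illustration of the first step (lexical analysis / POS tagging), here is a minimal sketch. The tiny hand-made lexicon is an assumption purely for illustration; real taggers use trained statistical or neural models.

```python
# Toy sketch of lexical analysis: tokenize a sentence and assign
# part-of-speech tags from a tiny hand-made lexicon (illustrative
# only; real POS taggers are learned from annotated corpora).
TOY_LEXICON = {
    "the": "DET", "dog": "NOUN", "chased": "VERB",
    "a": "DET", "cat": "NOUN",
}

def pos_tag(sentence: str) -> list[tuple[str, str]]:
    tokens = sentence.lower().split()  # tokenization
    return [(t, TOY_LEXICON.get(t, "UNK")) for t in tokens]

print(pos_tag("The dog chased a cat"))
# [('the', 'DET'), ('dog', 'NOUN'), ('chased', 'VERB'), ('a', 'DET'), ('cat', 'NOUN')]
```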
What is ambiguity?
The quality of being open to more than one interpretation; inexactness. For example, "I saw her duck" can mean seeing a bird or seeing someone crouch.
Why is natural language processing (NLP) difficult for computers?
Computers don’t have knowledge bases the way humans do. Natural language was not designed for computers, and it omits information because humans assume some things are unnecessary to state explicitly.
What is bag-of-words representation? Why do modern search engines use this simple representation of text?
Representing a document as an unordered collection (multiset) of the words it contains, ignoring grammar and word order. Modern search engines use this simple representation because it is efficient and often sufficient for most search tasks.
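A bag-of-words representation can be sketched in a few lines with Python's standard library; the `Counter` multiset keeps word counts and discards order.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Represent text as an unordered multiset of word counts."""
    return Counter(text.lower().split())

d = bag_of_words("the cat sat on the mat")
print(d)
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# Word order is discarded, so reorderings produce the same bag:
print(bag_of_words("cat the") == bag_of_words("the cat"))  # True
```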
What are the two modes of text information access? Which mode does a web search engine such as Google support?
When is browsing more useful than querying to help a user find relevant information?
Why is a text retrieval task defined as a ranking task?
What is a retrieval model?
What are the two assumptions made by the Probability Ranking Principle?
What is the Vector Space Retrieval Model? How does it work?
How do we define the dimensions of the Vector Space Model? What does “bag of words” representation mean?
What does the retrieval function intuitively capture when we instantiate a vector space model with bag of words representation and bit representation for documents and queries?
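A minimal sketch of the intuition this question points at: with bag-of-words and bit (0/1) weights, the dot product of the query and document vectors reduces to counting how many distinct query words appear in the document. The function name and example strings are illustrative, not from the source.

```python
def bit_vector_score(query: str, doc: str) -> int:
    # With 0/1 term weights, the vector space dot product equals
    # the number of distinct query terms that occur in the document.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms)

print(bit_vector_score("news about presidential campaign",
                       "campaign news from yesterday"))  # 2
```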
“Bag of words” representation
Push, pull, querying, browsing