Chapter 13 Flashcards
Watson on Jeopardy!
contained several natural-language processing methods for parsing the clue, identifying which words were important, and classifying the clue by the type of response it required
ran on specialized parallel computers in order to search rapidly through huge databases of knowledge.
For a given clue, the program produced multiple possible responses and had algorithms for assigning a confidence value to each response. If the highest-confidence response exceeded a threshold, the program buzzed in to give that response.
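The buzz-in rule above can be sketched in a few lines. This is a minimal illustration, assuming candidate generation and confidence scoring have already happened; the `Candidate` type, the `decide_buzz` function, the 0.5 default threshold, and the example candidates are all hypothetical, not Watson's actual code or scale.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    response: str
    confidence: float  # hypothetical score in [0, 1]; Watson's real scale is not given here


def decide_buzz(candidates: List[Candidate], threshold: float = 0.5) -> Optional[str]:
    """Return the highest-confidence response if it exceeds the threshold,
    otherwise None (meaning: do not buzz in)."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.response if best.confidence > threshold else None


# Hypothetical candidates for a single clue
candidates = [
    Candidate("What is Mercury?", 0.82),
    Candidate("What is Venus?", 0.11),
    Candidate("What is Mars?", 0.07),
]
print(decide_buzz(candidates))                 # What is Mercury?
print(decide_buzz(candidates, threshold=0.9))  # None
```

Raising the threshold makes the program more cautious: it answers less often but is wrong less often when it does answer.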
Fortunately for the Watson team, Jeopardy! fans had long been archiving the complete set of categories, clues, and correct responses from all Jeopardy! games ever broadcast: an invaluable source of examples for the supervised-learning methods used to train many of the system’s components.
The Stanford Question Answering Dataset, or SQuAD
“reading comprehension” test
The SQuAD test is easier than typical reading-comprehension tests given to human readers
No reading between the lines or actual reasoning is necessary: the answer to each question is a span of text taken directly from the accompanying passage. Rather than reading comprehension, this task might more accurately be called answer extraction.
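To make the point concrete, here is a minimal sketch of the crudest form of answer extraction: pick the passage sentence that shares the most words with the question. The passage, the question, and the `words`/`best_sentence` helpers are hypothetical; the systems that scored well on SQuAD were trained neural networks that go on to select an exact span, so this only shows how far surface word matching gets without any reasoning.

```python
import re


def words(text):
    """Lowercase word tokens as a set; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_sentence(passage, question):
    """Hypothetical baseline: return the passage sentence with the largest
    word overlap with the question (ties go to the earliest sentence)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]
    q = words(question)
    return max(sentences, key=lambda s: len(words(s) & q))


passage = ("The Eiffel Tower stands in Paris. "
           "Construction of the tower was completed in 1889. "
           "It is made of wrought iron.")
question = "In what year was the tower completed?"
print(best_sentence(passage, question))
# Construction of the tower was completed in 1889.
```

The sentence containing the answer ("1889") wins purely because it repeats the question's words; nothing in the code knows what a year or a tower is.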
The Allen Institute for Artificial Intelligence
developed a collection of elementary- and middle-school multiple-choice science questions.
Correctly answering these questions requires skill that goes beyond mere answer extraction; it also requires an integration of natural-language processing, background knowledge, and commonsense reasoning
neural networks that had outscored humans on the SQuAD questions performed no better than random guessing on these new questions, even after training on a set of about 8,000 examples
Winograd schemas
are designed precisely to be easy for humans but tricky for computers.
unlike the Turing test, a test that consists of Winograd schemas forestalls the possibility of a machine giving the correct answer without actually understanding anything about the sentence
> does not allow a subject to hide behind a smokescreen of verbal tricks, playfulness, or canned responses
program that performs best on the Winograd schemas
decides on its answer to a Winograd schema puzzle not by understanding the sentences but by examining statistics of subphrases
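A toy sketch of the statistics-based strategy, using the well-known trophy/suitcase Winograd schema ("The trophy doesn't fit in the brown suitcase because it's too big/small"): substitute each candidate for the pronoun and count how often the resulting phrase's n-grams occur in a text collection. The three-sentence corpus and the `ngram_counts`/`resolve_pronoun` helpers are hypothetical stand-ins, not the program the text describes; a real system would draw its statistics from a vastly larger corpus.

```python
from collections import Counter


def ngram_counts(corpus, n):
    """Count every n-gram (tuple of n lowercase tokens) in the corpus."""
    counts = Counter()
    for sentence in corpus:
        toks = sentence.lower().split()
        for i in range(len(toks) - n + 1):
            counts[tuple(toks[i:i + n])] += 1
    return counts


def resolve_pronoun(candidates, continuation, corpus, n=4):
    """Substitute each candidate for the pronoun and score the resulting
    phrase by summing corpus counts of its n-grams. Pure statistics: no
    representation of what the words mean. Ties go to the first candidate."""
    counts = ngram_counts(corpus, n)

    def score(candidate):
        toks = f"{candidate} {continuation}".lower().split()
        return sum(counts[tuple(toks[i:i + n])] for i in range(len(toks) - n + 1))

    return max(candidates, key=score)


# Hypothetical toy corpus standing in for a large text collection
corpus = [
    "the trophy is too big for the shelf",
    "the suitcase is too small for the trophy",
    "this suitcase is too small",
]
print(resolve_pronoun(["trophy", "suitcase"], "is too big", corpus))    # trophy
print(resolve_pronoun(["trophy", "suitcase"], "is too small", corpus))  # suitcase
```

With n=4 each substituted phrase is scored as a whole. The program gets both versions right only because the toy corpus happens to contain the matching phrases, not because it knows anything about trophies, suitcases, or fitting.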