Chapter 11 Flashcards

1
Q

Natural-language processing (NLP)

A

getting computers to deal with human language

2
Q

In the 1990s, rule-based NLP approaches were overshadowed by more successful …

A

statistical approaches, in which massive data sets were employed to train machine-learning algorithms.

3
Q

Most recently, this statistical data-driven NLP approach has focused on

A

deep learning

4
Q

deep learning’s first major success in NLP

A

automated speech recognition

5
Q

Why is automated speech recognition still not at “human level”?

A
  1. Background noise can significantly hurt the accuracy of these systems.
  2. These systems are occasionally thrown off by unusual words or phrases, in a way that highlights their lack of understanding of the speech they are transcribing.
6
Q

sentiment classification

A

An AI system that could accurately classify a sentence (or longer passage) by its sentiment: positive, negative, or some other degree of opinion.

7
Q

Some early NLP systems looked for the presence of individual words or short sequences of words as indications of the sentiment of a text.

A

Looking at single words or short sequences in isolation is generally not sufficient to glean the overall sentiment;

it’s necessary to capture the semantics of words in the context of the whole sentence.

8
Q

recurrent neural networks (RNNs)

A

inspired by ideas on how the brain interprets sequences

9
Q

key differences between a traditional neural network and a recurrent neural network

A

The most important difference is that the RNN’s hidden units have additional “recurrent” connections: each hidden unit has a connection to itself and to the other hidden units.

Unlike a traditional neural network, an RNN operates over a series of time steps.

At each time step, the RNN is fed an input and computes the activations of its hidden and output units, just as a traditional neural network does. But in an RNN, each hidden unit computes its activation based on both the current input and the activations of the hidden units from the previous time step. This gives the network a way to interpret the words it “reads” while remembering the context of what it has already “read.”
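
A minimal NumPy sketch of one time step of this recurrence; the layer sizes and random weights are invented for illustration, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 3                      # toy sizes
W_xh = rng.normal(size=(hidden_size, vocab_size))   # input -> hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # "recurrent" connections:
                                                    # hidden -> hidden weights

def rnn_step(x, h_prev):
    """Hidden activations depend on BOTH the current input and the
    hidden activations from the previous time step."""
    return np.tanh(W_xh @ x + W_hh @ h_prev)

x = np.zeros(vocab_size); x[2] = 1.0    # one-hot input for word #2
h = rnn_step(x, np.zeros(hidden_size))  # first step: no prior context
h = rnn_step(x, h)                      # later steps carry context via h
```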

10
Q

At each time step, the hidden units’ activations constitute:

A

the network’s encoding of the partial sentence it has seen so far.

The network keeps refining that encoding as it continues to process words.

11
Q

END symbol

A

After the last word in the sentence, the network is given a special END symbol, which tells the network that the sentence is finished. The END symbol is appended by humans to each sentence before the text is fed to the network.

Because the network stops encoding the sentence only when it encounters the END symbol, the system can in principle encode sentences of any length into a fixed-length set of numbers.
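
A self-contained sketch of how such a network encodes a whole sentence, one word per time step, stopping at END. Sizes, weights, and word indices are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 3
W_xh = rng.normal(size=(hidden_size, vocab_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))

END = 7                               # a reserved vocabulary slot for END
sentence = [2, 5, 1, END]             # word indices; END appended in advance

h = np.zeros(hidden_size)             # the evolving encoding
for word in sentence:
    x = np.zeros(vocab_size)
    x[word] = 1.0                     # one-hot input for this word
    h = np.tanh(W_xh @ x + W_hh @ h)  # refine the encoding at each time step
# h is now a fixed-length (hidden_size) encoding of the whole sentence,
# no matter how many words the sentence contained
print(h)
```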

12
Q

NLP output

A

the output unit in this network processes the hidden units’ activations (the “encoding”) to give the network’s confidence that the input sentence has a positive sentiment.
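
A toy illustration of this output unit, assuming a sigmoid squashing function; the encoding and weight values below are placeholders, not from the source.

```python
import numpy as np

h = np.array([0.3, -0.8, 0.5])      # final hidden activations (the encoding)
w_out = np.array([1.2, -0.7, 0.4])  # the output unit's weights (illustrative)

# Weighted sum squashed to (0, 1): confidence the sentence is positive.
confidence_positive = 1.0 / (1.0 + np.exp(-(w_out @ h)))
print(confidence_positive)          # ~0.75 -> leans positive
```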

13
Q

backpropagation

A

Given a set of sentences that humans have labeled as “positive” or “negative” in sentiment, the encoder network can be trained from these examples via backpropagation.
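
A hedged training sketch, assuming PyTorch and its built-in nn.RNN rather than the book’s exact setup; the vocabulary size, hidden size, and the single labeled example are invented.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size = 10, 4
rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, 1)            # the single output unit
opt = torch.optim.SGD(
    list(rnn.parameters()) + list(readout.parameters()), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# One toy "sentence" of three one-hot words, labeled positive (1.0).
x = torch.eye(vocab_size)[[2, 5, 7]].unsqueeze(0)  # shape (1, 3, vocab_size)
y = torch.tensor([[1.0]])

for _ in range(100):
    _, h_final = rnn(x)              # final hidden state = sentence encoding
    loss = loss_fn(readout(h_final[-1]), y)
    opt.zero_grad()
    loss.backward()                  # backpropagation (through time)
    opt.step()
```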

14
Q

vocabulary of a network

A

the set of all words that the network will be able to accept as inputs.

15
Q

scheme for encoding words as numbers

A
  1. Assign each word in the vocabulary an arbitrary number between 1 and 20,000.
  2. Give the neural network 20,000 inputs, one per word in the vocabulary.
  3. At each time step, only one of those inputs, the one corresponding to the actual input word, will be “switched on” (one-hot encoding); see the sketch after this list.
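
A toy version of this scheme with a 5-word vocabulary instead of 20,000; the words are arbitrary examples.

```python
import numpy as np

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "awful": 4}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0     # only the input for this word is switched on
    return v

print(one_hot("great"))      # [0. 0. 0. 1. 0.]
```
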
16
Q

one-hot encoding

A

at each time step, only one of the inputs (the one corresponding to the word being fed to the network) is “hot” (nonzero).

17
Q

problem with one-hot encoding

A

an arbitrary assignment of numbers to words doesn’t capture any relationships among words

The inability to capture semantic relationships among words and phrases is a major reason why neural networks using one-hot encodings often don’t work very well.
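
A quick demonstration of why: any two distinct one-hot vectors are orthogonal, so the encoding says nothing about which words are related (the example words are arbitrary).

```python
import numpy as np

great = np.array([0., 0., 0., 1., 0.])
awful = np.array([0., 0., 0., 0., 1.])

# Dot product is 0 for ANY pair of distinct words, so "great" is exactly as
# far from "awful" as from every other word: no relationships are captured.
print(great @ awful)   # 0.0
```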

18
Q

methods for encoding words in a way that would capture such semantic relationships

A

distributional semantics

19
Q

distributional semantics

A

the degree of semantic similarity between two linguistic expressions A and B is a function of the similarity of the linguistic contexts in which A and B can appear

That is, the meaning of a word can be defined in terms of the other words it tends to occur with, the words that tend to occur with those words, and so on.

“You shall know a word by the company it keeps.” (J. R. Firth)
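
A small sketch of the distributional idea, characterizing each word by counts of its neighbors; the mini-corpus and window size are invented for the demo.

```python
from collections import Counter

corpus = "the movie was great . the film was great . the movie was awful".split()

def context_counts(target, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == target:
            counts.update(corpus[max(0, i - window):i] +
                          corpus[i + 1:i + 1 + window])
    return counts

# "movie" and "film" keep similar company, so their context profiles overlap.
print(context_counts("movie"))
print(context_counts("film"))
```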

20
Q

semantic space

A

a semantic space of words, in which words with similar meanings are located closer to one another.

Because words can have many dimensions of meaning, their semantic space must have many dimensions as well.

NLP practitioners often frame the “meaning” of words in terms of geometric concepts, e.g., a three-dimensional space with x-, y-, and z-axes.

The idea here is that once all the words in the vocabulary are properly placed in the semantic space, the meaning of a word can be represented by its location in this space, that is, by the coordinates defining its word vector.
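
A toy three-dimensional example of this geometry; the coordinates are invented purely to illustrate that nearby vectors mean similar words.

```python
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearby vectors = similar meanings: king is closer to queen than to apple.
print(cosine(vectors["king"], vectors["queen"]))  # high
print(cosine(vectors["king"], vectors["apple"]))  # low
```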

21
Q

word2vec

A

a method that uses a traditional neural network to automatically learn word vectors for all the words in a vocabulary.

The idea is to train the word2vec network to predict which words are likely to be paired with a given input word.

There are 700,000 input units, each corresponding to a word in the vocabulary; similarly, there are 700,000 output units, each corresponding to a word in the vocabulary, with a relatively small hidden layer of 300 units.

Each input has a weighted connection to each hidden unit, and each hidden unit has a weighted connection to each output unit.
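
A minimal sketch of this architecture, shrunk from 700,000 words and 300 hidden units to toy sizes; the weights are random, so it shows only the shape of the computation, not learned vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 4                        # vocabulary size, hidden-layer size
W_in = rng.normal(size=(V, H))      # input -> hidden weights
W_out = rng.normal(size=(H, V))     # hidden -> output weights

def predict_context(word_index):
    """Predict how likely each vocabulary word is to pair with the input."""
    h = W_in[word_index]            # a one-hot input just selects one row
    scores = h @ W_out              # one score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()          # softmax over the 'output units'

# After training, W_in[word_index] (length H) serves as the word's vector.
print(predict_context(3).round(3))
```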