Chapter 11 Flashcards

1
Q

Natural-language processing (NLP)

A

getting computers to deal with human language

2
Q

In the 1990s, rule-based NLP approaches were overshadowed by more successful …

A

statistical approaches, in which massive data sets were employed to train machine-learning algorithms.

3
Q

Most recently, this statistical data-driven NLP approach has focused on

A

deep learning

4
Q

deep learning’s first major success in NLP

A

automated speech recognition

5
Q

Why is automated speech recognition still not at “human level”?

A
  1. Background noise can significantly hurt the accuracy of these systems.
  2. These systems are occasionally thrown off by unusual words or phrases, in a way that highlights their lack of understanding of the speech they are transcribing.
6
Q

sentiment classification

A

An AI system that could accurately classify a sentence (or longer passage) by its sentiment: positive, negative, or some other degree of opinion.

7
Q

Some early NLP systems looked for the presence of individual words or short sequences of words as indications of the sentiment of a text.

A

Looking at single words or short sequences in isolation is generally not sufficient to glean the overall sentiment;

it’s necessary to capture the semantics of words in the context of the whole sentence.

8
Q

recurrent neural networks (RNNs)

A

inspired by ideas on how the brain interprets sequences

9
Q

key differences between a traditional neural network and a recurrent neural network

A

The most important difference is that the RNN’s hidden units have additional “recurrent” connections: each hidden unit has a connection to itself and to the other hidden units.

Unlike a traditional neural network, an RNN operates over a series of time steps.

At each time step, the RNN is fed an input and computes the activations of its hidden and output units, just as a traditional neural network does. But in an RNN, each hidden unit computes its activation based on both the current input and the activations of the hidden units from the previous time step. This gives the network a way to interpret the words it “reads” while remembering the context of what it has already “read.”
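
A minimal NumPy sketch of one time step of this recurrence; the layer sizes and random weights are invented for illustration, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 3                      # toy sizes
W_xh = rng.normal(size=(hidden_size, vocab_size))   # input -> hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size))  # "recurrent" connections:
                                                    # hidden -> hidden weights

def rnn_step(x, h_prev):
    """Hidden activations depend on BOTH the current input and the
    hidden activations from the previous time step."""
    return np.tanh(W_xh @ x + W_hh @ h_prev)

x = np.zeros(vocab_size); x[2] = 1.0    # one-hot input for word #2
h = rnn_step(x, np.zeros(hidden_size))  # first step: no prior context
h = rnn_step(x, h)                      # later steps carry context via h
```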

10
Q

At each time step, the hidden units’ activations constitute:

A

the network’s encoding of the partial sentence it has seen so far.

The network keeps refining that encoding as it continues to process words.

11
Q

END symbol

A

After the last word in the sentence, the network is given a special END symbol, which tells the network that the sentence is finished. The END symbol is appended by humans to each sentence before the text is fed to the network.

Because the network stops encoding the sentence only when it encounters the END symbol, the system can in principle encode sentences of any length into a fixed-length set of numbers.
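
A self-contained sketch of how such a network encodes a whole sentence, one word per time step, stopping at END. Sizes, weights, and word indices are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 8, 3
W_xh = rng.normal(size=(hidden_size, vocab_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))

END = 7                               # a reserved vocabulary slot for END
sentence = [2, 5, 1, END]             # word indices; END appended in advance

h = np.zeros(hidden_size)             # the evolving encoding
for word in sentence:
    x = np.zeros(vocab_size)
    x[word] = 1.0                     # one-hot input for this word
    h = np.tanh(W_xh @ x + W_hh @ h)  # refine the encoding at each time step
# h is now a fixed-length (hidden_size) encoding of the whole sentence,
# no matter how many words the sentence contained
print(h)
```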

12
Q

NLP output

A

the output unit in this network processes the hidden units’ activations (the “encoding”) to give the network’s confidence that the input sentence has a positive sentiment.
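
A toy illustration of this output unit, assuming a sigmoid squashing function; the encoding and weight values below are placeholders, not from the source.

```python
import numpy as np

h = np.array([0.3, -0.8, 0.5])      # final hidden activations (the encoding)
w_out = np.array([1.2, -0.7, 0.4])  # the output unit's weights (illustrative)

# Weighted sum squashed to (0, 1): confidence the sentence is positive.
confidence_positive = 1.0 / (1.0 + np.exp(-(w_out @ h)))
print(confidence_positive)          # ~0.75 -> leans positive
```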

13
Q

backpropagation

A

Given a set of sentences that humans have labeled as “positive” or “negative” in sentiment, the encoder network can be trained from these examples via backpropagation.
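
A hedged training sketch, assuming PyTorch and its built-in nn.RNN rather than the book’s exact setup; the vocabulary size, hidden size, and the single labeled example are invented.

```python
import torch
import torch.nn as nn

vocab_size, hidden_size = 10, 4
rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)
readout = nn.Linear(hidden_size, 1)            # the single output unit
opt = torch.optim.SGD(
    list(rnn.parameters()) + list(readout.parameters()), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

# One toy "sentence" of three one-hot words, labeled positive (1.0).
x = torch.eye(vocab_size)[[2, 5, 7]].unsqueeze(0)  # shape (1, 3, vocab_size)
y = torch.tensor([[1.0]])

for _ in range(100):
    _, h_final = rnn(x)              # final hidden state = sentence encoding
    loss = loss_fn(readout(h_final[-1]), y)
    opt.zero_grad()
    loss.backward()                  # backpropagation (through time)
    opt.step()
```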

14
Q

vocabulary of a network

A

the set of all words that the network will be able to accept as inputs.

15
Q

scheme for encoding words as numbers

A
  1. Assign each word in the vocabulary an arbitrary number between 1 and 20,000.
  2. Give the neural network 20,000 inputs, one per word in the vocabulary.
  3. At each time step, only one of those inputs, the one corresponding to the actual input word, will be “switched on” (one-hot encoding); see the sketch after this list.
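
A toy version of this scheme with a 5-word vocabulary instead of 20,000; the words are arbitrary examples.

```python
import numpy as np

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "awful": 4}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0     # only the input for this word is switched on
    return v

print(one_hot("great"))      # [0. 0. 0. 1. 0.]
```
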
16
Q

one-hot encoding

A

at each time step, only one of the inputs (the one corresponding to the word being fed to the network) is “hot” (nonzero).

17
Q

problem with one-hot encoding

A

an arbitrary assignment of numbers to words doesn’t capture any relationships among words

The inability to capture semantic relationships among words and phrases is a major reason why neural networks using one-hot encodings often don’t work very well.
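
A quick demonstration of why: any two distinct one-hot vectors are orthogonal, so the encoding says nothing about which words are related (the example words are arbitrary).

```python
import numpy as np

great = np.array([0., 0., 0., 1., 0.])
awful = np.array([0., 0., 0., 0., 1.])

# Dot product is 0 for ANY pair of distinct words, so "great" is exactly as
# far from "awful" as from every other word: no relationships are captured.
print(great @ awful)   # 0.0
```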

18
Q

methods for encoding words in a way that would capture such semantic relationships

A

distributional semantics

19
Q

distributional semantics

A

the degree of semantic similarity between two linguistic expressions A and B is a function of the similarity of the linguistic contexts in which A and B can appear

That is, the meaning of a word can be defined in terms of the other words it tends to occur with, the words that tend to occur with those words, and so on.

“You shall know a word by the company it keeps.” (J. R. Firth)
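
A small sketch of the distributional idea, characterizing each word by counts of its neighbors; the mini-corpus and window size are invented for the demo.

```python
from collections import Counter

corpus = "the movie was great . the film was great . the movie was awful".split()

def context_counts(target, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == target:
            counts.update(corpus[max(0, i - window):i] +
                          corpus[i + 1:i + 1 + window])
    return counts

# "movie" and "film" keep similar company, so their context profiles overlap.
print(context_counts("movie"))
print(context_counts("film"))
```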

20
Q

semantic space

A

a semantic space of words, in which words with similar meanings are located closer to one another.

Because words can have many dimensions of meaning, their semantic space must have many dimensions as well.

NLP practitioners often frame the “meaning” of words in terms of geometric concepts, e.g., a three-dimensional space with x-, y-, and z-axes.

The idea here is that once all the words in the vocabulary are properly placed in the semantic space, the meaning of a word can be represented by its location in this space, that is, by the coordinates defining its word vector.
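
A toy three-dimensional example of this geometry; the coordinates are invented purely to illustrate that nearby vectors mean similar words.

```python
import numpy as np

vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.1]),
    "apple": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Nearby vectors = similar meanings: king is closer to queen than to apple.
print(cosine(vectors["king"], vectors["queen"]))  # high
print(cosine(vectors["king"], vectors["apple"]))  # low
```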

21
Q

word2vec

A

a method that uses a traditional neural network to automatically learn word vectors for all the words in a vocabulary.

The idea is to train the word2vec network to predict which words are likely to be paired with a given input word.

There are 700,000 input units, each corresponding to a word in the vocabulary; similarly, there are 700,000 output units, each corresponding to a word in the vocabulary, with a relatively small hidden layer of 300 units.

Each input has a weighted connection to each hidden unit, and each hidden unit has a weighted connection to each output unit.
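
A minimal sketch of this architecture, shrunk from 700,000 words and 300 hidden units to toy sizes; the weights are random, so it shows only the shape of the computation, not learned vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 4                        # vocabulary size, hidden-layer size
W_in = rng.normal(size=(V, H))      # input -> hidden weights
W_out = rng.normal(size=(H, V))     # hidden -> output weights

def predict_context(word_index):
    """Predict how likely each vocabulary word is to pair with the input."""
    h = W_in[word_index]            # a one-hot input just selects one row
    scores = h @ W_out              # one score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()          # softmax over the 'output units'

# After training, W_in[word_index] (length H) serves as the word's vector.
print(predict_context(3).round(3))
```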