Learning from Data: Text Flashcards

1
Q

Explain the basic concept of Bag of Words representation. What are its main limitations when dealing with large vocabularies?

A

Bag of Words represents a text as a vector of word counts over a fixed vocabulary, ignoring word order. With a large vocabulary the vectors become very high-dimensional and sparse (mostly zeros), which is inefficient and captures no similarity between related words.
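
A minimal sketch of this representation, assuming scikit-learn is available; the matrix has one column per vocabulary word, which is why a large vocabulary gives very wide, mostly-zero vectors:

    # Bag of Words sketch (assumes scikit-learn is installed).
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog sat on the log"]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)   # sparse matrix: documents x vocabulary

    print(vectorizer.get_feature_names_out())  # one column per word in the vocabulary
    print(counts.toarray())                    # word counts; mostly zeros as the vocabulary grows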

2
Q

How would you represent the sentence ‘John likes to watch movies. Mary likes movies too’ using a Bag of Words representation?

A

For the vocabulary {John, likes, to, watch, movies, Mary, too}
The vector is: [1, 2, 1, 1, 2, 1, 1].
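
A small pure-Python check of this count vector, assuming the period is stripped and the vocabulary order is fixed as above:

    # Rebuild the count vector for the fixed vocabulary order (illustrative sketch).
    sentence = "John likes to watch movies. Mary likes movies too"
    vocab = ["John", "likes", "to", "watch", "movies", "Mary", "too"]

    tokens = sentence.replace(".", "").split()
    vector = [tokens.count(word) for word in vocab]
    print(vector)  # [1, 2, 1, 1, 2, 1, 1]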

3
Q

How do word embeddings improve upon the Bag of Words approach? What advantages do they offer?

A

Word embeddings map each word to a dense, low-dimensional vector learned from data. Compared with Bag of Words they greatly reduce dimensionality, and semantically related words end up close together in the vector space, so word similarity can be measured directly.
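
A toy sketch with invented 4-dimensional vectors (not taken from any trained model) showing how similarity falls out of dense embeddings:

    import numpy as np

    # Made-up low-dimensional embeddings, purely for illustration.
    emb = {
        "cat":   np.array([0.8, 0.1, 0.0, 0.3]),
        "dog":   np.array([0.7, 0.2, 0.1, 0.3]),
        "piano": np.array([0.0, 0.9, 0.8, 0.1]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(emb["cat"], emb["dog"]))    # high: related words lie close together
    print(cosine(emb["cat"], emb["piano"]))  # much lower: unrelated words are far apart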

4
Q

Explain the concept of the hidden state in RNNs. What is its purpose?

A

The hidden state is a vector that the RNN updates at every time step; it carries a summary of the inputs seen so far, giving the network memory so that each output can depend on earlier context in the sequence.
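
A minimal numpy sketch of the hidden-state update in a vanilla RNN (randomly initialized, untrained weights, purely illustrative): the same h is read and overwritten at every step, so it accumulates context from the inputs seen so far.

    import numpy as np

    rng = np.random.default_rng(0)
    input_dim, hidden_dim = 3, 4

    # Randomly initialized weights, purely for illustration (an untrained RNN).
    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)

    h = np.zeros(hidden_dim)                    # hidden state starts empty
    sequence = rng.normal(size=(5, input_dim))  # 5 time steps of dummy input

    for x_t in sequence:
        # Vanilla RNN update: the new state mixes the current input with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

    print(h)  # final hidden state: a running summary of the whole sequence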

5
Q

What is encoder-decoder architecture, and why is it important for NLP tasks?

A

An encoder-decoder architecture uses one network (the encoder) to compress the input sequence into a context vector and a second network (the decoder) to generate the output sequence from it. This allows variable-length inputs to be mapped to variable-length outputs, which is what tasks like machine translation and summarization require.
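
A very reduced numpy sketch of the data flow (random, untrained weights, purely illustrative): the encoder folds the input sequence into one vector, and the decoder unrolls output steps from that vector.

    import numpy as np

    rng = np.random.default_rng(1)
    dim = 4

    # Untrained, randomly initialized weights: this only illustrates the data flow.
    W_enc = rng.normal(scale=0.3, size=(dim, 2 * dim))
    W_dec = rng.normal(scale=0.3, size=(dim, dim))

    def encode(inputs):
        # Fold a variable-length input sequence into a single context vector.
        state = np.zeros(dim)
        for x in inputs:
            state = np.tanh(W_enc @ np.concatenate([x, state]))
        return state

    def decode(context, steps):
        # Unroll a fixed number of output steps from the context vector.
        state, outputs = context, []
        for _ in range(steps):
            state = np.tanh(W_dec @ state)
            outputs.append(state)
        return outputs

    source = rng.normal(size=(6, dim))    # e.g. embeddings of a source sentence
    context = encode(source)              # single vector summarizing the input
    print(len(decode(context, steps=3)))  # decoder produces the output sequence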

6
Q

What is the basic idea behind attention mechanisms in neural networks?

A

Attention focuses on the most relevant parts of the input sequence for better predictions.
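
A small numpy sketch of the core computation (dot-product scores over made-up vectors): attention turns similarity scores into weights and uses them to mix the inputs.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(2)
    query = rng.normal(size=4)         # what the model is currently looking for
    inputs = rng.normal(size=(5, 4))   # five input positions (e.g. encoder states)

    scores = inputs @ query            # similarity of each input position to the query
    weights = softmax(scores)          # weights sum to 1, largest on the relevant parts
    context = weights @ inputs         # weighted mix of the inputs

    print(weights)   # which input positions the model attends to
    print(context)   # the attended summary used for the prediction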

7
Q

How does the attention mechanism help address the limitations of traditional RNNs?

A

Attention allows the decoder to directly access all encoder states rather than relying on a single fixed-length context vector, removing the bottleneck that makes plain RNNs struggle with long sequences.

8
Q

Compare and contrast the numerical representations created by Bag of Words versus word embeddings.

A

Bag of Words is sparse and high-dimensional. Word embeddings are dense, low-dimensional, and capture semantic relationships.

9
Q

How do word vectors represent semantic relationships between words?

A

Word vectors place similar words close in space, capturing relationships like ‘king - man + woman ≈ queen.’
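
A toy sketch of the analogy arithmetic with tiny hand-made vectors (real models such as word2vec or GloVe learn these regularities from data; these numbers are invented so the example works):

    import numpy as np

    # Hand-made embeddings, purely for illustration.
    emb = {
        "king":  np.array([0.9, 0.8, 0.1]),
        "queen": np.array([0.9, 0.2, 0.8]),
        "man":   np.array([0.1, 0.8, 0.1]),
        "woman": np.array([0.1, 0.2, 0.8]),
        "apple": np.array([0.5, 0.5, 0.1]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    target = emb["king"] - emb["man"] + emb["woman"]

    # Nearest remaining word to the analogy vector.
    best = max((w for w in emb if w not in {"king", "man", "woman"}),
               key=lambda w: cosine(emb[w], target))
    print(best)  # queen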

10
Q

Explain why feed-forward (dense) layers aren’t well-suited for processing text data.

A

Dense layers expect a fixed-size input and treat every input dimension independently, so they cannot handle variable-length sequences and they ignore word order and context, losing the relationships between words that carry much of the meaning.
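
A short sketch of why order is lost: feeding a dense layer an order-insensitive (bag-of-words-style) input gives the identical output for two sentences with opposite meanings (random, untrained weights, illustrative only).

    import numpy as np

    rng = np.random.default_rng(3)

    # Made-up word vectors and an untrained dense layer, purely for illustration.
    emb = {w: rng.normal(size=4) for w in ["the", "dog", "bit", "man"]}
    W, bias = rng.normal(size=(2, 4)), np.zeros(2)

    def dense_on_bag(words):
        # Order-insensitive input: the word vectors are simply summed.
        x = np.sum([emb[w] for w in words], axis=0)
        return W @ x + bias

    out1 = dense_on_bag("the dog bit the man".split())
    out2 = dense_on_bag("the man bit the dog".split())
    print(np.allclose(out1, out2))  # True: the layer cannot tell the two sentences apart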

11
Q

The attention mechanisms allow models to ‘correctly associate certain words with other words in a sentence.’ Explain what this means and why it’s important.

A

Attention links related words, ensuring context-aware predictions, crucial for tasks like translation.

12
Q

Explain how the vectors passed from the encoder to the decoder are created when using attention and when not using attention.

A

Without attention: a single context vector (typically the encoder's final hidden state) must summarize the whole input. With attention: a new context vector is computed at every decoder step as a weighted sum of all encoder states, dynamically highlighting the relevant parts.
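
A compact numpy sketch contrasting the two cases (made-up encoder states and decoder query): without attention the decoder only ever sees the final encoder state; with attention it gets a freshly weighted mix of all of them at each step.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(4)
    encoder_states = rng.normal(size=(6, 4))   # one hidden state per input position

    # Without attention: one fixed context vector, typically the final encoder state.
    context_no_attention = encoder_states[-1]

    # With attention: each decoder step builds its own context from ALL encoder states.
    decoder_query = rng.normal(size=4)                  # current decoder state
    weights = softmax(encoder_states @ decoder_query)   # relevance of each input position
    context_with_attention = weights @ encoder_states   # weighted sum, specific to this step

    print(context_no_attention)
    print(context_with_attention)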
