NLP-4 Flashcards

1
Q

What is the Bag of Words (BoW) algorithm?

A

Bag of Words is a Natural Language Processing model that extracts features from text so they can be used by machine learning algorithms. It counts the occurrences of each word and builds the vocabulary for the corpus.

Bag of Words produces a set of vectors containing the count of each word's occurrences in every document (for example, a review). These vectors are easy to interpret.

2
Q

What does the BoW algorithm give us?

A

Bag of Words gives us two things:
1. A vocabulary of unique words for the corpus.
2. The frequency of these words (the number of times each has occurred in the whole corpus).
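
A minimal sketch of these two outputs, assuming a hypothetical two-document corpus and plain whitespace tokenisation (neither comes from the card itself):

```python
from collections import Counter

# Hypothetical toy corpus; the documents are illustrative only.
corpus = [
    "aman and anil are stressed",
    "aman went to a therapist",
]

# Whitespace tokenisation is an assumption; real pipelines normalise text first.
tokens = [word for doc in corpus for word in doc.split()]

# Output 2: frequency of each word across the whole corpus.
frequency = Counter(tokens)

# Output 1: the vocabulary, i.e. the unique words, sorted for a stable order.
vocabulary = sorted(frequency)

print(vocabulary)
print(frequency["aman"])
```

Here `Counter` is used only as a convenient way to count; any dictionary of word counts would serve the same purpose.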

3
Q

Why is it called a ‘bag’ of words algorithm?

A

The word “bag” signifies that the order of sentences or tokens does not matter: all we need are the unique words and their frequencies.
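
As an illustration of this point, the sketch below compares two made-up sentences that use the same words in a different order. The helper name `bag_of_words` is hypothetical, and lower-casing plus whitespace splitting is an assumed, simplified tokenisation.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts for a lower-cased, whitespace-split text."""
    return Counter(text.lower().split())

# Same words, different order (and different meaning).
a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

# The two bags are equal because word order is discarded.
print(a == b)
```

The comparison shows both what the model keeps (words and counts) and what it loses (who chased whom).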

4
Q

What are the steps to implement the BoW algorithm?

A

1. Text Normalisation: Collect data and pre-process it.
2. Create Dictionary: Make a list of all the unique words occurring in the corpus (the vocabulary).
3. Create document vectors: For each document in the corpus, find out how many times each word from the list of unique words has occurred.
4. Create document vectors for all the documents.
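
The steps above can be sketched as follows. This is one possible reading, not a definitive implementation: the function names, the toy corpus, and the choice to normalise by lower-casing and keeping only alphabetic tokens are all assumptions made for illustration.

```python
import re
from collections import Counter

def normalise(text):
    """Step 1: lower-case and keep only alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(tokenised_docs):
    """Step 2: sorted list of the unique words in the corpus."""
    return sorted({word for doc in tokenised_docs for word in doc})

def document_vector(tokens, vocabulary):
    """Step 3: count of each vocabulary word in one document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical corpus used only for this example.
corpus = ["Aman and Anil are stressed.", "Aman went to a therapist."]

tokenised = [normalise(doc) for doc in corpus]
vocabulary = build_vocabulary(tokenised)

# Step 4: repeat step 3 for every document.
vectors = [document_vector(doc, vocabulary) for doc in tokenised]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Every vector has the same length as the vocabulary, so the documents can be compared position by position.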
5
Q

Define a document vector table.

A

It is a table containing the frequency of each unique word in the vocabulary for each document.

In the simplest (binary) form, a word that is present in the document is represented by 1 and an absent word by 0.

In a document vector table, the header row contains the vocabulary of the corpus and each of the other rows corresponds to a different document.
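
A small sketch of such a table, assuming two hypothetical documents named `doc1` and `doc2`. It builds both the count version and the binary (1/0) version, then prints the count table with the vocabulary as the header row.

```python
from collections import Counter

# Hypothetical documents; names and text are illustrative only.
docs = {
    "doc1": "good movie good plot",
    "doc2": "bad plot",
}

# Header row: the vocabulary of the corpus.
vocabulary = sorted({word for text in docs.values() for word in text.split()})

# One row per document, holding the count of each vocabulary word.
table = {
    name: [Counter(text.split())[word] for word in vocabulary]
    for name, text in docs.items()
}

# Binary variant: 1 if the word is present in the document, 0 if absent.
binary = {name: [1 if count > 0 else 0 for count in row] for name, row in table.items()}

# Print the count table with fixed-width columns.
print("doc".ljust(6) + " ".join(word.ljust(6) for word in vocabulary))
for name, row in table.items():
    print(name.ljust(6) + " ".join(str(count).ljust(6) for count in row))
```

The count table and the binary table differ only where a word occurs more than once in a document, as "good" does in `doc1`.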

6
Q

Define vocabulary.

A

It is the collection of all unique words in the corpus, along with their frequencies.
